Execution Engine Configuration

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)

MapReduce is the default execution engine for Hive. Kyvos also supports Spark for running queries on Hive. Configure the execution engine in this area according to your requirements.

Note

From Kyvos 2025.3.2 onwards, Kyvos supports Apache Iceberg to enhance compatibility and performance when working with external data sources on AWS EMR with Spark.
The fields shown in the following figure appear only if you select the Spark option.

Note

For Azure (Databricks) deployments, only the Spark Version parameter is available for selection; the other fields are not displayed.

To configure execution engine properties for the cluster:

Enter the details as follows:

Area	Parameter/Field	Comments/Description
	Execution Engine Name	Select the execution engine from the list.
	Deployment Mode	Select the yarn-cluster option if your Spark deployment mode is YARN cluster; otherwise, select the yarn-client option.
Node and Authentication	Spark Source Node	To use the Hive Source Node, select the Same As Name Node option. Otherwise, select the Other Node option.
	Spark Node Host Name	If you selected the Other Node option above, enter the IP address of your Spark node here.
	Use different user account for accessing Spark Node	Select the checkbox to use a user account other than the Hadoop Node authentication user for accessing the Spark node. NOTE: If you select this option, you will be prompted to provide the Username, Authentication Type, and Password/Shared Key for authentication.
Paths and Version	Spark Version	Select the Spark version from the list.
	Spark Home Directory	Provide the Spark home directory.
	Spark Library Path	Enter the path to the Spark library files. Refer to the Appendix for details.
	Spark Configuration Path	Enter the configuration files path for Spark. Refer to the Appendix for details.
	Configure Iceberg	Click this link to configure Iceberg in Kyvos.
Spark Parameters	Spark Parameters	Use this to add custom Spark parameters for your cluster. NOTE: You must provide the spark.yarn.historyServer.address parameter.

Click Validate Spark File Paths. The system validates user authentication and connectivity for the configured paths.

Note

The Validate Hive File Paths button is not displayed for the Azure (Databricks) environment.

Click the Save button at the top-right of the page to save your changes.
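
The Spark Parameters requirement above can be checked before you enter values in the form. The following is a minimal sketch, not part of Kyvos: the function name `missing_required_params`, the history server host, and the extra `spark.executor.memory` entry are hypothetical and used only for illustration. The only name taken from this page is `spark.yarn.historyServer.address`.

```python
# Required Spark parameter named on this page.
REQUIRED_SPARK_PARAMS = ("spark.yarn.historyServer.address",)


def missing_required_params(params: dict[str, str]) -> list[str]:
    """Return the required Spark parameter names that are absent or blank."""
    return [key for key in REQUIRED_SPARK_PARAMS if not params.get(key, "").strip()]


# Hypothetical values; replace them with the settings for your own cluster.
custom_params = {
    "spark.yarn.historyServer.address": "historyserver.example.com:18080",
    "spark.executor.memory": "4g",
}

missing = missing_required_params(custom_params)
if missing:
    print("Add these parameters before saving:", ", ".join(missing))
else:
    print("All required Spark parameters are present.")
```

An empty result means the parameter set includes everything this page marks as mandatory; it does not verify that the values themselves are reachable or correct.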

Post-Deployment Steps to Enable Iceberg in Kyvos

Kyvos has introduced support for Apache Iceberg to enhance compatibility and performance when working with external data sources on AWS EMR with Spark.

Previously, non-Iceberg semantic model tables registered in AWS Glue had limited capabilities, especially for scalable analytics and advanced metadata management. These limitations prevented Kyvos from fully leveraging features such as time travel, schema evolution, and efficient snapshot isolation, which are critical for modern data operations. To address this issue, existing datasets must be converted to the Iceberg table format. Iceberg support modifies the table structure to ensure compatibility with Iceberg standards. Once enabled, Kyvos can manage and query large-scale data more effectively, with improved consistency and performance.

This strategic integration allows Kyvos to leverage Spark’s high-performance query engine alongside Iceberg’s robust features, delivering better query performance, data governance, and scalability for enterprise-scale analytics.

Prerequisites to enable Iceberg in Kyvos

Perform the following steps before you enable Iceberg in Kyvos Manager.

After completing the deployment, clone EMR 6.15.0 and add the following configuration in the EMR settings:
{ "Classification": "iceberg-defaults", "Properties": { "iceberg.enabled": "true" } }
Then, synchronize AWS (EMR) via Kyvos Manager to apply the updated EMR version and configuration.
Download the required JAR files from an EMR 6.15.0 cluster to your local machine:
- /usr/share/aws/iceberg/lib/iceberg-spark-runtime-3.4_2.12-1.4.0-amzn-0.jar
- /usr/share/aws/aws-java-sdk-v2/aws-sdk-java-bundle-2.20.160-SNAPSHOT.jar
  NOTE: These JAR files will be required when enabling Iceberg through Kyvos Manager.
Create an S3 location. In your AWS S3 bucket, create a folder to serve as the Iceberg warehouse location.
Example: s3://kyvos-output-769691/iceberg_location/
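
The EMR configuration from the first prerequisite can be generated rather than typed by hand. The sketch below assumes the EMR settings accept configurations as a JSON array of classification objects, which is the usual form for EMR software settings; the helper name `emr_configurations` is hypothetical. The classification and property values are the ones given above.

```python
import json

# Classification object exactly as listed in the prerequisites.
iceberg_classification = {
    "Classification": "iceberg-defaults",
    "Properties": {"iceberg.enabled": "true"},
}


def emr_configurations(*classifications: dict) -> str:
    """Serialize one or more classification objects as a JSON array."""
    return json.dumps(list(classifications), indent=2)


# Paste the printed JSON into the EMR configuration settings of the cloned cluster.
print(emr_configurations(iceberg_classification))
```

If the cloned cluster already has other classifications, pass them to the same call so that the array contains all of them.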

Configuring Iceberg Support through Kyvos Manager

To configure Iceberg support, perform the following steps.

In the Paths and Version area, click the Configure Iceberg link. The Configure Iceberg dialog box is displayed.
Select the Enable Iceberg checkbox. The fields in the Configure Iceberg dialog box become editable.
For Warehouse Location, specify the S3 path that you created in the prerequisites.
For example, s3://kyvos-output-769691/iceberg_location/
Provide a Catalog Name for the Iceberg catalog. For example, Catalog Name = iceberg_catalog
Upload the aws-sdk-java-bundle-2.20.160-SNAPSHOT.jar file.
Upload the Iceberg JAR file, iceberg-spark-runtime-3.4_2.12-1.4.0-amzn-0.jar.
Click Save to apply the configuration. This enables Iceberg in Kyvos.
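
For orientation, the Warehouse Location and Catalog Name entered above correspond to the kind of Spark catalog properties that Apache Iceberg documents for an AWS Glue catalog. The sketch below shows that typical mapping only; it is an assumption that Kyvos Manager applies equivalent settings, and the properties it actually writes may differ. The helper names `split_s3_uri` and `iceberg_catalog_properties` are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key prefix); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def iceberg_catalog_properties(catalog_name: str, warehouse: str) -> dict[str, str]:
    """Build typical Iceberg-on-Glue Spark properties (illustrative, not Kyvos output)."""
    split_s3_uri(warehouse)  # validate the warehouse location before using it
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


# Example values taken from this page.
props = iceberg_catalog_properties(
    "iceberg_catalog", "s3://kyvos-output-769691/iceberg_location/"
)
for key, value in props.items():
    print(f"{key}={value}")
```

Validating the S3 URI first catches a common mistake, such as entering a bucket name without the s3:// scheme, before the value reaches the dialog box.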