Working with Apache Impala
Kyvos supports Apache Impala as a data source, enabling users to leverage Impala data for building semantic models. You can configure an Impala connection using JDBC details and use this connection to create datasets from Impala tables or custom SQL queries.
Kyvos also allows flexibility in data modeling by enabling users to modify dataset metadata, such as column names, data types, and field formats, without impacting the underlying source data.
With Impala integration, you can perform the following tasks:
- Connect to Impala by selecting IMPALA as the provider.
- Create datasets using SQL queries or tables from the Impala connection.
- Design the Data Relationship Diagram (DRD) and build the semantic model using the selected data.
Note
During the model build process, Kyvos exports data directly from Impala to HDFS in Parquet format for optimized performance.
This export mechanism supports both partitioned and non-partitioned models.
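For illustration, partitioned exports on HDFS typically follow the Hive-style directory layout, with one subdirectory per partition value, while non-partitioned exports write Parquet files directly under a single base path. The sketch below composes such paths; the base path and column names are hypothetical examples, not Kyvos defaults.

```python
def parquet_path(base, partitions=None):
    """Compose an HDFS output directory in Hive-style partition layout.

    partitions: ordered (column, value) pairs, e.g. [("year", 2024)];
    None or an empty list yields a non-partitioned layout.
    """
    parts = [base.rstrip("/")]
    for col, val in (partitions or []):
        parts.append(f"{col}={val}")  # Hive-style partition directory
    return "/".join(parts)

# Partitioned model: one directory per partition value
print(parquet_path("/warehouse/models/sales", [("year", 2024), ("month", 6)]))
# -> /warehouse/models/sales/year=2024/month=6

# Non-partitioned model: files land directly under the base path
print(parquet_path("/warehouse/models/sales"))
# -> /warehouse/models/sales
```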
Creating Apache Impala connection
To create an Apache Impala connection for processing semantic models, perform the steps below.
1. From the Toolbox, click Connections.
2. From the Actions menu (⋮), click Add Connection.
3. Enter a name for the Apache Impala connection.
4. Select Warehouse as the Category.
5. Select Impala as the Provider from the list. The Driver Class field is displayed by default.
6. Provide the JDBC URL in the format jdbc:hive2://<host>:<port>/<db>
7. Leave the username and password fields blank.
8. Keep JAVA_JDBC as the Spark Read Method; it is selected by default.
9. Click the Test button at the top left to validate the connection settings.
10. If the connection is valid, click the Save button.
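The JDBC URL format above can be assembled and sanity-checked before pasting it into the connection form. A minimal sketch follows; the host name, port, and database are placeholder values, not Kyvos defaults.

```python
import re

def impala_jdbc_url(host, port, db):
    """Build a URL in the jdbc:hive2://<host>:<port>/<db> form shown above."""
    return f"jdbc:hive2://{host}:{port}/{db}"

def looks_valid(url):
    """Cheap structural check: scheme, host, numeric port, database name."""
    return re.fullmatch(r"jdbc:hive2://[^:/]+:\d+/\w+", url) is not None

# Placeholder connection details for illustration
url = impala_jdbc_url("impala.example.com", 21050, "sales")
print(url)               # jdbc:hive2://impala.example.com:21050/sales
print(looks_valid(url))  # True
```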