Creating a dataset

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)


This section details how to create a dataset in Kyvos by selecting a data connection, defining the input type (such as HDFS, table, or SQL), and preparing the data for use in semantic models and analysis.

When you create a dataset, you choose the connection from which the data is fetched. As you apply settings to sort and filter the columns, you can preview the results to confirm that they match what you expect.

The total number of columns is shown above the list of their names. If the column count has changed, for example, if any columns are hidden, the column total includes additional information and a link. For example, "Total 29 columns out of which 5 columns are hidden." When you click the link, the information dialog informs you that column details have been updated and lists the number of columns added and deleted, and recommends that you review the updated file. Click Dismiss to close the dialog.

Points to know: 

  • From Kyvos 2026.2 onwards, Kyvos Lakehouse can directly read Parquet and Apache Iceberg data stored in an Amazon S3 bucket. Because the data is read directly from Amazon S3, external catalogs or SQL engines are not required.

  • Kyvos supports global parameters for datasets, making it easier to manage parameter values (such as dbname and tablename) during environment migrations (for example, from UAT to Production). You can set these parameters in the connection settings, and Kyvos automatically uses the correct values based on the target environment. This reduces manual work, improves deployment flexibility, and simplifies environment management. When you define parameters as connection properties, the parameter names must be prefixed with kyvos.param.

    Example:

    Suppose the query is:
    SELECT * FROM <dbname>.EMPLOYEE

    • On the dataset, you can define a parameter named dbname and assign the required value.

    • On the connection, users can add a property named kyvos.param.dbname and set its corresponding value.

    If Kyvos resolves the parameter value as devdb, the final query becomes:
    SELECT * FROM devdb.EMPLOYEE

  • You can preview the entire dataset, replacing the previous limitation of a partial dataset preview.
    NOTE: Opting to preview the full dataset may result in higher-than-expected costs. Additionally, the execution time required to generate the dataset preview could increase, depending on the size of the data.

  • When a column in a dataset is changed, it is marked as "Modified", and you can view the details of the changes. This helps you quickly identify modified columns and review what changed.

  • You can mark a file as a Fact to use it as a fact table in relationships. You can also hide columns that are not required for analysis.

  • Use the Actions menu (...) to validate the dataset, share it, add a note, or show related entities.

  • If your instance of Kyvos has been configured via the portal.properties file to support it, you can register a file with a Presto connection. You can format columns and preview the result.

  • To register a file with a Presto connection, see Creating a file with a Presto connection. You can also register files by following Writing SQL queries using data in Hive.

  • To learn more about the effects of some of the settings you can use while registering the file, see Logic for creating relationships and semantic model.
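The global parameter example above (resolving <dbname> from a dataset parameter or a kyvos.param.dbname connection property) can be illustrated with a minimal Python sketch. This is not Kyvos code. The function name resolve_query, the value uatdb, and the rule that a connection property overrides a dataset-level value are assumptions made only for illustration; the documentation above does not state the precedence.

```python
import re

# Prefix required on connection properties, as described in the text above.
PARAM_PREFIX = "kyvos.param."
# Matches placeholders written as <name>, for example <dbname>.
PLACEHOLDER = re.compile(r"<(\w+)>")


def resolve_query(query, dataset_params, connection_props):
    """Substitute <name> placeholders in a query (hypothetical helper)."""
    # Collect connection-level parameters by stripping the kyvos.param. prefix.
    connection_params = {
        key[len(PARAM_PREFIX):]: value
        for key, value in connection_props.items()
        if key.startswith(PARAM_PREFIX)
    }
    # ASSUMPTION: connection values override dataset values.
    merged = {**dataset_params, **connection_params}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"no value defined for parameter '{name}'")
        return merged[name]

    return PLACEHOLDER.sub(substitute, query)


resolved = resolve_query(
    "SELECT * FROM <dbname>.EMPLOYEE",
    dataset_params={"dbname": "uatdb"},  # hypothetical dataset-level value
    connection_props={"kyvos.param.dbname": "devdb"},
)
print(resolved)  # SELECT * FROM devdb.EMPLOYEE
```

In this sketch, moving the dataset to another environment only requires a different kyvos.param.dbname value on the target connection; the query text itself stays unchanged.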

To create a dataset, perform the following steps.

  1. From the Toolbox, click Datasets.

  2. Click the Actions menu (...) and click Add Dataset.

  3. Select the Connection from the list.

  4. Select any one of the following input types: 

    • HDFS

    • Table

    • SQL

Previewing dataset

You can preview the entire dataset rather than only a partial sample. Previewing the full dataset may result in higher-than-expected costs, and the time required to generate the preview can increase with the size of the data.

See Formatting, sorting, filtering file data while registering datasets to learn additional ways to customize the data while you register datasets.


Copyright Kyvos, Inc. 2025. All rights reserved.