Best Practices for Working with Kyvos

Kyvos is a cloud-native semantic layer for both AI and BI. Using it effectively involves both technical and strategic practices. The following best practices help you achieve optimal performance, scalability, and usability:

  1. Understanding the Use Case: A clear understanding of the use case is essential for designing optimal solutions within the Kyvos Semantic Model.

  2. Dataset Preparation: Identify and compile a list of datasets (tables/views), along with the corresponding dimensions, attributes, and measures derived from these datasets.

  3. Dataset Classification: Classify datasets as Dimensions if they only contain dimension/attribute fields, and as Facts if they include measure fields or both dimensions and measures.

  4. Data Modeling Best Practices

    1. Star Schema Preference: Design your data in a star schema (fact and dimension tables), as Kyvos performs best with this structure.

    2. Flatten Hierarchies Where Possible: Use flattened dimension tables to reduce joins and improve semantic model build performance.

  5. Dataset Design Best Practices

    1. SQL-Based Datasets: It is recommended to use SQL-based datasets for the following benefits:

      1. Allows you to perform calculations at the SQL level.

      2. Enables the use of partition columns to efficiently filter data.

    2. Selective Column Inclusion: Only include the necessary columns in the SELECT clause of your SQL query to optimize performance.

    3. Date Column Input Format: Ensure that the input format for date columns is clearly specified.

    4. Handling Decimal Data: If exact precision in the higher decimal places is not required, use the double data type for decimal columns in the Kyvos dataset.

    5. Row Selection: Fetch only the required rows from the SQL query to reduce unnecessary data retrieval.

  6. Relationship Design Best Practices

    1. Single-Direction Relationships: When defining relationships, ensure that they are created in a single direction, such as from Fact to Dimension or Dimension to Fact.

    2. Fact-to-Fact Joins: Fact-to-Fact joins are not directly supported, so avoid creating relationships between fact tables.

  7. Semantic Model Best Practices

    1. Start Small, Scale Gradually: Begin with fewer dimensions and measures; scale as needed. This keeps builds fast and manageable and also helps in quick debugging in case of any issues.

    2. Partitioning: Use intelligent partitioning (like time-based) on large fact tables for faster processing and querying.

      1. Identify and choose the right fields for partitioning the semantic model. Proper selection of partition fields helps filter data during queries, thereby enhancing query performance.

      2. Choosing the correct partition field also ensures that only the relevant data is processed during incremental builds when the replace partition option is set to “Auto”.

    3. Include Only Required Fields: Add only the necessary fields to the semantic model to ensure efficiency and avoid unnecessary complexity.

  8. Best Practices for Performance Optimization

    1. Build Scheduling: Schedule semantic model builds during off-peak hours or in a staggered manner to reduce resource contention.

    2. Incremental Builds: Use incremental builds to update only new or changed data, which saves time and resources.

    3. Caching Strategy: Tune caching based on usage patterns. Use warm-up queries post-build to populate the cache for faster user response.

  9. Deployment Best Practices

    1. Scaling: Use Kyvos' elastic architecture to dynamically scale based on workload demand in cloud environments.

    2. Security Compliance: Leverage Kyvos’ support for cloud IAM (like AWS IAM, Azure AD) for secure access management.

  10. Security and Governance Best Practices

    1. Role-Based Access Control (RBAC): Assign data access and visibility rules based on user roles.

    2. Data Masking and Row-Level Security: Use these features to ensure sensitive information is protected.

    3. Audit Logs: Enable auditing to track user activity and system performance.

  11. Integration with BI Tools Best Practices

    1. Optimize SQL Queries: Understand how your BI tool (e.g., Tableau, Power BI, Excel) generates SQL, because Kyvos translates those BI tool queries into optimized semantic model queries for efficient execution.

    2. Live Connections: For real-time insights, live connections to the Kyvos SQL interface are preferred over data extraction from BI tools.

    3. Tool-Specific Tuning: Tune the dashboard and report design (e.g., filters and visuals) to minimize query complexity.

  12. Monitoring and Maintenance Best Practices

    1. Use Kyvos Monitoring Tools: Monitor job status, semantic model usage, and query performance through the Kyvos Manager and web portal.

    2. Log Rotation and Housekeeping: Regularly clean up old logs and monitor storage usage to prevent bloat.

    3. Version Upgrades: Stay current with Kyvos releases to leverage performance improvements and new features.

  13. Collaboration and Documentation

    1. Document Semantic Model Definitions: Maintain thorough documentation for all dimensions, hierarchies, and measures.

    2. Team Collaboration: Use Git or other version control tools to collaborate on semantic model design and configuration in development environments.
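The dataset classification rule (item 3) and the restriction on fact-to-fact relationships (item 6) can be checked mechanically before modeling starts. The following Python sketch is illustrative only: the `Dataset` class, the sample datasets, and the helper names are hypothetical and are not part of any Kyvos API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical description of a dataset (table or view)."""
    name: str
    dimension_fields: list = field(default_factory=list)
    measure_fields: list = field(default_factory=list)


def classify(dataset):
    """Fact if the dataset has any measure field, otherwise Dimension."""
    return "Fact" if dataset.measure_fields else "Dimension"


def find_fact_to_fact(datasets, relationships):
    """Return the relationships whose two ends are both classified as Fact."""
    kinds = {d.name: classify(d) for d in datasets}
    return [
        (left, right)
        for left, right in relationships
        if kinds[left] == "Fact" and kinds[right] == "Fact"
    ]


# Hypothetical sample datasets and relationships.
sales = Dataset("sales", ["date_key", "store_key"], ["amount"])
returns = Dataset("returns", ["date_key"], ["refund"])
store = Dataset("store", ["store_key", "city"])
relationships = [("sales", "store"), ("sales", "returns")]

violations = find_fact_to_fact([sales, returns, store], relationships)
```

In this sketch, `violations` lists the relationships that should be redesigned, for example by joining the two facts through a shared dimension.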

Best Practices for Handling Calculations

| Calculated Field Type | Calculation Type | Pros | Cons |
|---|---|---|---|
| Attributes | SQL | Pre-calculated, so it gives better performance. | You need to process the semantic model each time you change the calculation. |
| Attributes | Tableau | It can be changed at runtime; there is no need to process the semantic model. | Calculated at runtime, hence it would impact performance. |
| Measure | SQL | Pre-calculated, so it gives better performance. | You need to process the semantic model each time you change the calculation. |
| Measure | MDX | It can be changed at runtime; there is no need to process the semantic model. Complex calculations can be done here, like time series calculations. | Calculated at runtime, hence it would impact performance. |
| Measure | Tableau | It can be changed at runtime; there is no need to process the semantic model. Suitable for users who are more familiar with Tableau than MDX calculations. | Calculated at runtime, hence it would impact performance. |

Best Practices for Multidimensional (MDM), Relational Multidimensional (RMDM), and Hybrid Multidimensional (HMDM) Data Models

This section summarizes the key considerations, recommendations, and best practices for selecting and implementing semantic models within the Kyvos environment. The focus is on performance optimization, cost efficiency, and suitability for various schemas and business requirements.

| Aspect | Multidimensional Models | Relational Multidimensional Models | Hybrid Multidimensional Models |
|---|---|---|---|
| Query Performance (Aggregates) | Very fast: precomputed aggregates and optimized semantic model operations | Slower: aggregates computed on the fly via SQL | Fast for summary queries using the semantic model; slower for non-materialized columns |
| Scalability / Capacity | Highly scalable | Highly scalable | Highly scalable |
| Flexibility / Ad-hoc Queries | Flexible | Highly flexible | Moderate: fast if data exists in the semantic model; otherwise, queries access the underlying data source |
| Storage Overhead | High: stores many aggregates | Lower: primarily base tables, indexes, and materialized views | Medium: aggregates stored in the semantic model; detailed data remains in the source |
| ETL / Preprocessing Cost | High: semantic model building and aggregation computation; supports full or incremental processing | Low: minimal precomputation, more on-demand computation | Medium: requires maintaining the semantic model and synchronizing with relational data |
| Maintenance & Complexity | High: semantic model schema design, rebuilds, and tuning required | High: semantic model schema design, relational DB tuning, indexing, and query optimization | High: integration of the semantic model with relational systems, synchronization, and orchestration is needed |
| Cost (Setup, Hardware, Licensing, Human Resources) | High: semantic model creation and initial infrastructure | Low: leverages existing RDBMS and staff familiar with SQL/DB | Medium to high: requires both semantic model system and relational DB setup |

Multidimensional Model

Kyvos aggregates data into a multidimensional semantic model, which is stored within its datastore. These semantic models contain pre-computed summaries of the data across various dimensions and hierarchies. Queries are served directly from these semantic models, which allows for sub-second response times even on very large datasets.
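The idea of serving queries from pre-computed summaries can be illustrated with a toy example. The Python sketch below is only an analogy under simplifying assumptions: it precomputes a SUM for every combination of two dimensions in memory, and it does not reflect how Kyvos actually builds or stores semantic models. The sample rows and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows.
rows = [
    {"region": "East", "product": "A", "amount": 10.0},
    {"region": "East", "product": "B", "amount": 5.0},
    {"region": "West", "product": "A", "amount": 7.0},
]
DIMENSIONS = ("region", "product")


def build_aggregates(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of the dimensions."""
    aggregates = {}
    for size in range(len(dimensions) + 1):
        for group in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            aggregates[group] = dict(totals)
    return aggregates


def query(aggregates, group, key):
    """Answer a query with a lookup instead of scanning the fact rows."""
    return aggregates[group][key]


cube = build_aggregates(rows, DIMENSIONS, "amount")
total_east = query(cube, ("region",), ("East",))
grand_total = query(cube, (), ())
```

The build step scans the data once per dimension combination; afterwards each query is a lookup whose cost does not depend on the number of fact rows. The trade-off is the extra storage for the precomputed totals.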

Key Features

  • Eliminates dependency on the source system by pre-aggregating and storing data in semantic models.

  • Delivers high performance through result caching and efficient query execution.

  • Flexible semantic model allows runtime modifications (e.g., switching from count to average).

  • Well-suited for complex schemas and environments requiring high query performance.

  • Minimal processing cost compared to roll-up methods.

  • Ensures data security and reduces query complexity.

Recommendations

Multidimensional Model is highly recommended and appropriate when:

  • Schema complexity is high.

  • Query flexibility and runtime aggregation are required.

  • Cost efficiency and predictable performance are priorities.

Relational Multidimensional Model

In the Relational Multidimensional Model, analytical queries are executed directly against the underlying relational data sources, such as Snowflake, BigQuery, Redshift, or other cloud data warehouses. Unlike the Multidimensional Model, data is not pre-aggregated or stored in semantic models. Instead, the system dynamically generates queries that fetch the required results in real time.

Key Features

  • Relies directly on the external data source for query execution.

  • Provides flexibility but lacks the performance and efficiency of Multidimensional Model or Hybrid Multidimensional Model.

Recommendations

This model is generally not recommended as a default option because it relies on source performance. Use it only in the following cases:

  • When cardinality is very low.

  • When the computation cost in Multidimensional Model/Hybrid Multidimensional Model becomes prohibitively high.

Performance and Cost Considerations

  • Query execution costs scale with the amount of data scanned (e.g., high costs observed in BigQuery and Snowflake environments).

  • Quota limits are applied to manage query costs and prevent runaway scans.

  • Multidimensional Model offers the optimal balance of cost efficiency and performance, featuring one-time model processing and minimal recurring computational costs.

  • Kyvos’ scaling engine enhances Multidimensional Model’s ability to deliver fast performance even at scale.
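The cost trade-off described above can be expressed as simple arithmetic. The Python sketch below is a rough model under stated assumptions: on-demand cost grows linearly with the data scanned per query, while a pre-aggregated model pays a one-time build cost plus a small per-query cost. All parameter names are hypothetical, and real pricing depends on the platform and contract.

```python
import math


def on_demand_cost(queries, tb_scanned_per_query, price_per_tb):
    """Total cost when every query scans the source (linear in data scanned)."""
    return queries * tb_scanned_per_query * price_per_tb


def preaggregated_cost(build_cost, queries, cost_per_query):
    """One-time model processing cost plus a recurring per-query cost."""
    return build_cost + queries * cost_per_query


def break_even_queries(build_cost, tb_scanned_per_query, price_per_tb, cost_per_query):
    """Smallest number of queries at which pre-aggregation is no more expensive.

    Returns None when a pre-aggregated query is not cheaper than a source scan.
    """
    saving_per_query = tb_scanned_per_query * price_per_tb - cost_per_query
    if saving_per_query <= 0:
        return None
    return math.ceil(build_cost / saving_per_query)
```

Under this model, the more data each source query scans, the fewer queries are needed before the one-time processing cost pays for itself.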

Hybrid Multidimensional Model

The Hybrid Multidimensional Model combines the strengths of the Multidimensional Model and the Relational Multidimensional Model. Frequently used aggregations are pre-calculated and stored in semantic models (Multidimensional Model), while less common or ad-hoc queries are executed directly against the underlying data source (Relational Multidimensional Model). This approach balances performance with flexibility and storage efficiency.
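The routing behavior described above can be sketched as a simple decision: if every requested field is materialized in the semantic model, the query is answered there; otherwise it goes to the source. The Python example below is a conceptual illustration only. The field names and the `route_query` helper are hypothetical, and the actual routing logic in Kyvos is not documented here.

```python
# Hypothetical set of fields that are pre-aggregated in the semantic model.
MATERIALIZED_FIELDS = frozenset({"region", "product", "order_date"})


def route_query(requested_fields, materialized=MATERIALIZED_FIELDS):
    """Decide where a query would be served from.

    Returns a (target, missing_fields) pair, where target is
    "semantic_model" when all requested fields are materialized
    and "source" otherwise.
    """
    missing = set(requested_fields) - set(materialized)
    if missing:
        return "source", missing
    return "semantic_model", set()
```

For example, a query on `region` and `product` would be served from the semantic model, while a query that also needs a non-materialized field such as a hypothetical `customer_id` would fall through to the source.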

Key Features

  • Combines pre-aggregation with direct queries to the source.

  • Allows inclusion of high-cardinality attributes in the semantic model.

  • Offers immediate refresh capabilities where data latency is critical.

Recommendations

Should be used when:

  • Dimensions have high cardinality and therefore incur a high computation cost.

  • The business requires these dimensions in the semantic model for completeness.

  • Immediate refresh and lower semantic model storage are desired.
    Note: Large queries may still be sent directly to the source, which can impact performance for very large datasets.

Decision Framework

| Cases | Recommended |
|---|---|
| Complex schema with high query demands | Multidimensional Model |
| High-cardinality dimensions and compute costs | Hybrid Multidimensional Model |
| Need for immediate refresh / near real-time updates with a smaller source size | Hybrid Multidimensional Model |
| Low-cardinality attributes | Relational Multidimensional Model |
| Extremely high computation costs | Relational Multidimensional Model |
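The decision framework can also be captured as a lookup, for example in a checklist script used during model design. The Python sketch below encodes only the cases listed above; the `Case` enum and the `recommend` function are hypothetical names, and falling back to the Multidimensional Model when no listed case applies is based on its status as the default recommendation in this document.

```python
from enum import Enum
from typing import Optional


class Case(Enum):
    COMPLEX_SCHEMA_HIGH_QUERY_DEMAND = "Complex schema with high query demands"
    HIGH_CARDINALITY_AND_COMPUTE_COST = "High-cardinality dimensions and compute costs"
    IMMEDIATE_REFRESH_SMALL_SOURCE = "Immediate refresh with a smaller source size"
    LOW_CARDINALITY = "Low-cardinality attributes"
    EXTREME_COMPUTE_COST = "Extremely high computation costs"


MDM = "Multidimensional Model"
RMDM = "Relational Multidimensional Model"
HMDM = "Hybrid Multidimensional Model"

RECOMMENDED = {
    Case.COMPLEX_SCHEMA_HIGH_QUERY_DEMAND: MDM,
    Case.HIGH_CARDINALITY_AND_COMPUTE_COST: HMDM,
    Case.IMMEDIATE_REFRESH_SMALL_SOURCE: HMDM,
    Case.LOW_CARDINALITY: RMDM,
    Case.EXTREME_COMPUTE_COST: RMDM,
}


def recommend(case: Optional[Case] = None) -> str:
    """Return the recommended model for a case; default to MDM otherwise."""
    if case is None:
        return MDM
    return RECOMMENDED[case]
```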

Summary of Recommendations

  • Multidimensional Model: Highly recommended for complex schemas, high performance, and cost efficiency.

  • Relational Multidimensional Model: Not recommended as a default; use only in specific edge cases with low cardinality or excessive computation costs.

  • Hybrid Multidimensional Model: Recommended selectively for high-cardinality attributes not directly driving reporting but needed in the semantic model.

Why Multidimensional Model is Highly Recommended

Multidimensional Model is the preferred and default recommendation in nearly all deployment scenarios within Kyvos.

Performance Advantages

  • Pre-aggregated semantic models ensure queries are resolved quickly without repeatedly hitting the source system.

  • Built-in result caching dramatically improves response times, even under heavy query loads.

  • Kyvos’ scaling engine enhances Multidimensional Model performance, enabling sub-second query responses on large datasets.

Cost Efficiency

  • Since queries are executed against pre-computed semantic models, data scan costs are minimized.

  • A one-time model processing cost is incurred, but recurring query execution costs remain negligible compared to Relational Multidimensional Model or direct source queries.

  • Prevents runaway query costs often seen in cloud platforms, such as BigQuery and Snowflake.

Flexibility

  • Supports runtime query modifications (e.g., changing aggregations from count to average without reprocessing the semantic model).

  • Allows schema adjustments and complex modeling without degrading performance.

  • Handles both simple and highly complex schema requirements seamlessly.

Simplicity and Reliability

  • Reduces reliance on source systems, lowering risks of performance bottlenecks.

  • Avoids the complexity of managing direct queries against multiple sources.

  • Provides built-in security and governance mechanisms, making it enterprise-ready.

Copyright Kyvos, Inc. 2025. All rights reserved.