Best Practices for Working with Kyvos
Working with Kyvos, a cloud-native semantic layer for AI and BI, involves both technical and strategic best practices. Below is a breakdown of best practices for using Kyvos effectively, with a focus on performance, scalability, and usability:
Understanding the Use Case: A clear understanding of the use case is essential for designing optimal solutions within the Kyvos Semantic Model.
Dataset Preparation: Identify and compile a list of datasets (tables/views), along with the corresponding dimensions, attributes, and measures derived from these datasets.
Dataset Classification: Classify datasets as Dimensions if they contain only dimension/attribute fields, and as Facts if they include measure fields, with or without dimension fields.
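Once the dataset inventory exists, the classification rule can be applied mechanically. The sketch below assumes a hypothetical `Dataset` record that lists each dataset's dimension and measure fields; it illustrates the rule only and is not a Kyvos API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical inventory record for one source table or view."""
    name: str
    dimension_fields: list[str] = field(default_factory=list)
    measure_fields: list[str] = field(default_factory=list)


def classify(dataset: Dataset) -> str:
    """Fact if the dataset has any measure fields (with or without dimensions), else Dimension."""
    return "Fact" if dataset.measure_fields else "Dimension"


datasets = [
    Dataset("customer", dimension_fields=["customer_id", "region", "segment"]),
    Dataset("sales", dimension_fields=["customer_id", "order_date"],
            measure_fields=["amount", "quantity"]),
]

for ds in datasets:
    print(f"{ds.name}: {classify(ds)}")
```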
Data Modeling Best Practices
Star Schema Preference: Design your data in a star schema (fact and dimension tables), as Kyvos performs best with this structure.
Flatten Hierarchies Where Possible: Use flattened dimension tables to reduce joins and improve semantic model build performance.
Dataset Design Best Practices
SQL-Based Datasets: It is recommended to use SQL-based datasets for the following benefits:
Allows you to perform calculations at the SQL level.
Enables the use of partition columns to efficiently filter data.
Selective Column Inclusion: Only include the necessary columns in the SELECT clause of your SQL query to optimize performance.
Date Column Input Format: Ensure that the input format for date columns is clearly specified.
Handling Decimal Data: For decimal data, use the double data type in the Kyvos dataset unless you need exact accuracy in the higher decimal places.
Row Selection: Filter the SQL query (for example, with a WHERE clause) so that it fetches only the required rows, reducing unnecessary data retrieval.
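The double-versus-decimal trade-off comes from general floating-point behavior rather than anything Kyvos-specific. As a rough illustration, the sketch below compares Python's `float` (an IEEE 754 double) with exact `decimal.Decimal` arithmetic: the double total is off only in its last significant digits, which rarely matters for reporting but can matter when figures must reconcile to the last decimal place.

```python
from decimal import Decimal

# Python's float is an IEEE 754 double, the same representation a "double" column uses.
double_total = sum([0.1] * 10)
# Decimal keeps exact base-10 digits, like a fixed-precision decimal type.
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

print(double_total)                    # 0.9999999999999999
print(decimal_total)                   # 1.0
print(abs(double_total - 1.0) < 1e-9)  # True: the error sits far below reporting precision
```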
Relationship Design Best Practices
Single-Direction Relationships: Define each relationship in a single direction (for example, from Fact to Dimension or from Dimension to Fact) rather than in both directions.
Fact-to-Fact Joins: Fact-to-Fact joins are not directly supported, so avoid creating relationships between fact tables.
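Both relationship rules can be checked before a model is built. The sketch below is a hypothetical pre-build check, not a Kyvos feature: it takes relationships as (source, target) pairs plus each dataset's Fact/Dimension classification, and flags fact-to-fact joins and pairs defined in both directions.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """Hypothetical relationship, followed from source to target."""
    source: str
    target: str


def validate(relationships: list[Relationship], kinds: dict[str, str]) -> list[str]:
    """Return problems: fact-to-fact joins and relationships defined in both directions."""
    problems = []
    defined = set(relationships)
    for rel in relationships:
        if kinds[rel.source] == "Fact" and kinds[rel.target] == "Fact":
            problems.append(f"fact-to-fact join: {rel.source} -> {rel.target}")
        reverse = Relationship(rel.target, rel.source)
        # Report each bidirectional pair once by only flagging the alphabetically smaller side.
        if reverse in defined and rel.source < rel.target:
            problems.append(f"bidirectional relationship: {rel.source} <-> {rel.target}")
    return problems


kinds = {"sales": "Fact", "returns": "Fact", "customer": "Dimension", "product": "Dimension"}
relationships = [
    Relationship("sales", "customer"),
    Relationship("sales", "product"),
    Relationship("product", "sales"),  # reverse of an existing relationship
    Relationship("sales", "returns"),  # joins two fact tables
]
for problem in validate(relationships, kinds):
    print(problem)
```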
Semantic Model Best Practices
Start Small, Scale Gradually: Begin with a small set of dimensions and measures and add more as needed. This keeps builds fast and manageable and makes issues quicker to debug.
Partitioning: Use a suitable partitioning scheme (for example, time-based partitioning) on large fact tables for faster processing and querying.
Identify and choose the right fields for partitioning the semantic model. Proper selection of partition fields helps filter data during queries, thereby enhancing query performance.
Choosing the correct partition field also ensures efficient processing of relevant data during incremental builds when the replace partition option is set to “Auto”.
Include Only Required Fields: Add only the necessary fields to the semantic model to ensure efficiency and avoid unnecessary complexity.
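The effect of a good partition field can be pictured with a small in-memory model. The sketch below assumes hypothetical monthly partitions on an `order_date` column held in plain Python lists. It shows the two benefits described above: a date-range query reads only the overlapping partitions, and an incremental batch touches only the partitions its rows fall into. It is a conceptual model, not how Kyvos stores partitions or implements the “Auto” replace option.

```python
from collections import defaultdict
from datetime import date


def partition_key(d: date) -> str:
    """Monthly partition key on a hypothetical order_date column, e.g. 2024-03-20 -> '2024-03'."""
    return f"{d.year:04d}-{d.month:02d}"


def build_partitions(rows: list[dict]) -> dict[str, list[dict]]:
    parts = defaultdict(list)
    for row in rows:
        parts[partition_key(row["order_date"])].append(row)
    return dict(parts)


def partitions_to_rebuild(changed_rows: list[dict]) -> list[str]:
    """Only partitions that contain new or changed rows need to be reprocessed."""
    return sorted({partition_key(r["order_date"]) for r in changed_rows})


def scan(partitions: dict[str, list[dict]], start: date, end: date):
    """Read only partitions whose month overlaps [start, end]; return (rows, partitions read)."""
    first, last = partition_key(start), partition_key(end)
    # Zero-padded "YYYY-MM" keys sort chronologically, so string comparison works.
    read = [k for k in sorted(partitions) if first <= k <= last]
    rows = [r for k in read for r in partitions[k] if start <= r["order_date"] <= end]
    return rows, read


history = [
    {"order_date": date(2024, 1, 10), "amount": 100},
    {"order_date": date(2024, 2, 5), "amount": 250},
    {"order_date": date(2024, 3, 20), "amount": 75},
]
partitions = build_partitions(history)

# Incremental batch: a late-arriving February order and a new April order.
batch = [
    {"order_date": date(2024, 2, 28), "amount": 40},
    {"order_date": date(2024, 4, 2), "amount": 310},
]
print(partitions_to_rebuild(batch))  # ['2024-02', '2024-04']

rows, read = scan(partitions, date(2024, 3, 1), date(2024, 3, 31))
print(read, sum(r["amount"] for r in rows))  # ['2024-03'] 75
```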
Best Practices for Performance Optimization
Build Scheduling: Schedule semantic model builds during off-peak hours or in a staggered manner to reduce resource contention.
Incremental Builds: Use incremental builds to update only new or changed data, which saves time and resources.
Caching Strategy: Tune caching based on usage patterns. Use warm-up queries post-build to populate the cache for faster user response.
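Warm-up works because the first execution of a query pays the full cost and later identical queries are served from the cache. The class below is a hypothetical, in-process illustration of that pattern (clear results after a build, then replay common dashboard queries); it does not use or describe Kyvos's own cache configuration, and the query strings are placeholders.

```python
from typing import Callable


class ResultCache:
    """Hypothetical in-process result cache keyed by query text."""

    def __init__(self, execute: Callable[[str], object]):
        self._execute = execute
        self._results: dict[str, object] = {}
        self.hits = 0
        self.misses = 0

    def query(self, text: str) -> object:
        if text in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[text] = self._execute(text)
        return self._results[text]

    def invalidate(self) -> None:
        """Drop cached results, e.g. after a build changes the underlying data."""
        self._results.clear()

    def warm_up(self, queries: list[str]) -> None:
        """Run common queries ahead of users so their first request is a cache hit."""
        for text in queries:
            self.query(text)


executed = []


def run_against_model(text: str) -> str:
    """Placeholder for a real query call; records each execution."""
    executed.append(text)
    return f"result of {text}"


cache = ResultCache(run_against_model)
cache.invalidate()  # a new build has just completed
cache.warm_up(["sales by region", "sales by month"])
cache.query("sales by region")  # first user request, already cached
print(cache.hits, cache.misses, len(executed))  # 1 2 2
```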
Deployment Best Practices
Scaling: Use Kyvos' elastic architecture to dynamically scale based on workload demand in cloud environments.
Security Compliance: Leverage Kyvos’ support for cloud IAM (like AWS IAM, Azure AD) for secure access management.
Security and Governance Best Practices
Role-Based Access Control (RBAC): Assign data access and visibility rules based on user roles.
Data Masking and Row-Level Security: Use these features to ensure sensitive information is protected.
Audit Logs: Enable auditing to track user activity and system performance.
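Role-based access, row-level security, and masking compose in a fixed order: the role first decides which rows are visible, then which columns are shown in clear. The sketch below models that order of operations in plain Python with a hypothetical `Role` type, region-based row filter, and email-masking rule chosen for illustration; it is not the Kyvos security API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: which regions it may see and whether it may see raw PII."""
    name: str
    allowed_regions: frozenset[str]
    can_see_pii: bool


def mask_email(email: str) -> str:
    """Keep the first character and the domain, e.g. 'alice@example.com' -> 'a****@example.com'."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def secure_view(rows: list[dict], role: Role) -> list[dict]:
    """Apply row-level security (region filter), then mask the email column if required."""
    visible = [r for r in rows if r["region"] in role.allowed_regions]
    if role.can_see_pii:
        return visible
    # Build new dicts so the underlying rows are never modified.
    return [{**r, "email": mask_email(r["email"])} for r in visible]


rows = [
    {"customer": "Alice", "region": "EMEA", "email": "alice@example.com", "sales": 120},
    {"customer": "Bob", "region": "APAC", "email": "bob@example.com", "sales": 80},
]
analyst = Role("emea_analyst", frozenset({"EMEA"}), can_see_pii=False)
print(secure_view(rows, analyst))
```

Here the analyst sees only the EMEA row, with the email masked to `a****@example.com`.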
Integration with BI Tools Best Practices
Optimize Queries: Kyvos translates BI tool queries into optimized semantic model queries, ensuring efficient execution. Understand how your tool (e.g., Tableau, Power BI, Excel) generates its queries (SQL or, for Excel PivotTables, MDX), because their structure affects performance.
Live Connections: For real-time insights, live connections to the Kyvos SQL interface are preferred over extracting data into BI tools.
Tool-Specific Tuning: Tune the dashboard and report design (e.g., filters and visuals) to minimize query complexity.
Monitoring and Maintenance Best Practices
Use Kyvos Monitoring Tools: Monitor job status, semantic model usage, and query performance through the Kyvos Manager and web portal.
Log Rotation and Housekeeping: Regularly clean up old logs and monitor storage usage to prevent bloat.
Version Upgrades: Stay current with Kyvos releases to leverage performance improvements and new features.
Collaboration and Documentation
Document Semantic Model Definitions: Maintain thorough documentation for all dimensions, hierarchies, and measures.
Team Collaboration: Use Git or other version control tools to collaborate on semantic model design and configuration in development environments.
Best Practices for Handling Calculations
Calculated Field Type | Calculation Type | Pros | Cons |
|---|---|---|---|
Attribute | SQL | Pre-calculated, so it gives better performance. | You need to process the semantic model each time you change the calculation. |
Attribute | Tableau | It can be changed at runtime; there is no need to process the semantic model. | Calculated at runtime, hence it would impact performance. |
Measure | SQL | Pre-calculated, so it gives better performance. | You need to process the semantic model each time you change the calculation. |
Measure | MDX | It can be changed at runtime; there is no need to process the semantic model. | Calculated at runtime, hence it would impact performance. |
Measure | Tableau | It can be changed at runtime; there is no need to process the semantic model. | Calculated at runtime, hence it would impact performance. |
Best Practices for Multidimensional (MDM), Relational Multidimensional (RMDM), and Hybrid Multidimensional (HMDM) Data Models
This section summarizes the key considerations, recommendations, and best practices for selecting and implementing semantic models within the Kyvos environment. The focus is on performance optimization, cost efficiency, and suitability for various schemas and business requirements.
Aspect | Multidimensional Models | Relational Multidimensional Models | Hybrid Multidimensional Models |
|---|---|---|---|
Query Performance (Aggregates) | Very fast—precomputed aggregates and optimized semantic model operations | Slower—aggregates computed on-the-fly via SQL | Fast for summary queries using the semantic model; slower for non-materialized columns |
Scalability / Capacity | Highly scalable | Highly scalable | Highly scalable |
Flexibility / Ad-hoc Queries | Flexible | Highly flexible | Moderate — fast if data exists in the semantic model; otherwise, queries access the underlying data source. |
Storage Overhead | High — stores many aggregates | Lower — primarily base tables, indexes, and materialized views | Medium — aggregates stored in the semantic model; detailed data remains in the source |
ETL / Preprocessing Cost | High — semantic model building and aggregation computation; supports full or incremental processing. | Low — minimal precomputation, more on-demand computation | Medium — requires maintaining the semantic model and synchronizing with relational data |
Maintenance & Complexity | High — semantic model schema design, rebuilds, and tuning required | High — semantic model schema design, relational DB tuning, indexing, and query optimization | High: requires integrating the semantic model with relational systems, along with synchronization and orchestration |
Cost (Setup, Hardware, Licensing, Human Resources) | High costs for semantic model creation and initial infrastructure | Low — leverages existing RDBMS and staff familiar with SQL/DB | Medium to high — requires both semantic model system and relational DB setup |
Multidimensional Model
Kyvos aggregates data into a multidimensional semantic model, which is stored within its datastore. These semantic models contain pre-computed summaries of the data across various dimensions and hierarchies. Queries are served directly from these semantic models, which allows for sub-second response times even on very large datasets.
Key Features
Eliminates dependency on the source system by pre-aggregating and storing data in semantic models.
Delivers high performance through result caching and efficient query execution.
Flexible semantic model allows runtime modifications (e.g., switching from count to average).
Well-suited for complex schemas and environments requiring high query performance.
Minimal processing cost compared to roll-up methods.
Ensures data security and reduces query complexity.
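The pre-aggregation idea can be shown with a toy, in-memory model; this is not Kyvos's storage format or query engine. The sketch below precomputes a sum and a count for every combination of two hypothetical dimensions, so queries become lookups rather than scans. Because both sum and count are stored, an average can be derived at query time without rebuilding, which is one way to picture the count-to-average switch listed above.

```python
from itertools import combinations

DIMENSIONS = ("region", "product")  # hypothetical dimensions

facts = [
    {"region": "EMEA", "product": "A", "amount": 100.0},
    {"region": "EMEA", "product": "B", "amount": 50.0},
    {"region": "APAC", "product": "A", "amount": 30.0},
]


def build_aggregates(rows: list[dict]) -> dict:
    """Precompute (sum, count) for every combination of dimensions, including the grand total."""
    aggregates = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                total, count = aggregates.get(key, (0.0, 0))
                aggregates[key] = (total + row["amount"], count + 1)
    return aggregates


def query(aggregates: dict, measure: str, **filters):
    """Answer by lookup only; the average is derived at query time from the stored sum and count."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    total, count = aggregates[(dims, tuple(filters[d] for d in dims))]
    if measure == "sum":
        return total
    if measure == "count":
        return count
    if measure == "average":
        return total / count
    raise ValueError(f"unsupported measure: {measure}")


cube = build_aggregates(facts)
print(query(cube, "sum", region="EMEA"))      # 150.0
print(query(cube, "count", region="EMEA"))    # 2
print(query(cube, "average", region="EMEA"))  # 75.0
print(query(cube, "sum"))                     # 180.0
```

Every added dimension multiplies the number of stored combinations, which is the storage overhead noted for multidimensional models in the comparison table.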
Recommendations
Multidimensional Model is highly recommended and appropriate when:
Schema complexity is high.
Query flexibility and runtime aggregation are required.
Cost efficiency and predictable performance are priorities.
Relational Multidimensional Model
In the Relational Multidimensional Model, analytical queries are executed directly against the underlying relational data sources, such as Snowflake, BigQuery, Redshift, or other cloud data warehouses. Unlike the Multidimensional Model, data is not pre-aggregated or stored in semantic models. Instead, the system dynamically generates queries that fetch the required results at query time.
Key Features
Relies directly on the external data source for query execution.
Provides flexibility but lacks the performance and efficiency of Multidimensional Model or Hybrid Multidimensional Model.
Recommendations
Should be used:
When cardinality is very low.
When the computation cost in Multidimensional Model/Hybrid Multidimensional Model becomes prohibitively high.
Generally not recommended as the default option because it depends on source performance.
Performance and Cost Considerations
Query execution costs scale with the amount of data scanned (e.g., high costs observed in BigQuery and Snowflake environments).
Quota limits are applied to manage query costs and prevent runaway scans.
Multidimensional Model offers the optimal balance of cost efficiency and performance, featuring one-time model processing and minimal recurring computational costs.
Kyvos’ scaling engine enhances Multidimensional Model’s ability to deliver fast performance even at scale.
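To make the cost point concrete, the sketch below models a relational source that computes each aggregate on demand. It meters rows scanned (a stand-in for bytes billed) and enforces a per-query scan quota. The class, quota, and row counts are hypothetical and are not BigQuery, Snowflake, or Kyvos APIs. Repeating the same query ten times scans the table ten times, whereas a pre-aggregated model pays the processing cost once and then answers from stored results.

```python
class QuotaExceeded(Exception):
    """Raised when a query would scan more rows than the per-query quota allows."""


class RelationalSource:
    """Hypothetical source that aggregates on every query and meters rows scanned."""

    def __init__(self, rows: list[dict], scan_quota: int):
        self.rows = rows
        self.scan_quota = scan_quota  # maximum rows a single query may scan
        self.rows_scanned = 0         # running total, a stand-in for bytes billed

    def sum_by(self, measure: str, **filters) -> float:
        # No stored aggregates or partitions here, so every query scans the whole table.
        if len(self.rows) > self.scan_quota:
            raise QuotaExceeded(
                f"query would scan {len(self.rows)} rows (quota {self.scan_quota})")
        self.rows_scanned += len(self.rows)
        return sum(r[measure] for r in self.rows
                   if all(r[k] == v for k, v in filters.items()))


rows = [{"region": "EMEA" if i % 2 else "APAC", "amount": 1.0} for i in range(1000)]

source = RelationalSource(rows, scan_quota=5000)
for _ in range(10):  # e.g. ten dashboard refreshes issuing the same query
    source.sum_by("amount", region="EMEA")
print(source.rows_scanned)  # 10000: scan cost grows with every repeated query

blocked = False
try:
    RelationalSource(rows, scan_quota=500).sum_by("amount")
except QuotaExceeded as err:
    blocked = True
    print("blocked:", err)
```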
Hybrid Multidimensional Model
The Hybrid Multidimensional Model combines the strengths of the Multidimensional Model and the Relational Multidimensional Model. Frequently used aggregations are pre-calculated and stored in semantic models (as in the Multidimensional Model), while less common or ad-hoc queries are executed directly against the underlying data source (as in the Relational Multidimensional Model). This approach balances performance with flexibility and storage efficiency.
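The split between the two execution paths can be pictured as a subset check: if every dimension a query needs is materialized in the semantic model, it is answered there; otherwise it goes to the source. The rule and dimension names below are hypothetical simplifications for illustration; Kyvos's actual query routing may weigh other factors.

```python
# Hypothetical set of dimensions that have been materialized in the semantic model.
MATERIALIZED = {"region", "product", "month"}


def route(requested: set[str], materialized: set[str] = MATERIALIZED) -> tuple[str, set[str]]:
    """Return where a query would run and which requested dimensions are not materialized."""
    missing = requested - materialized
    if not missing:
        return "semantic model", set()
    return "source", missing


print(route({"region", "month"}))        # ('semantic model', set())
print(route({"region", "customer_id"}))  # ('source', {'customer_id'})
```

In this picture, a high-cardinality attribute such as a customer ID stays out of the materialized set, so only the queries that actually need it pay the cost of going to the source.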
Key Features
Combines pre-aggregation with direct queries to the source.
Allows inclusion of high-cardinality attributes in the semantic model.
Offers immediate refresh capabilities where data latency is critical.
Recommendations
Should be used when:
Dimensions have high cardinality and would therefore incur a high computation cost if fully pre-aggregated.
The business requires these dimensions in the semantic model for completeness.
Immediate refresh and lower semantic model storage are desired.
Note: Large queries may still be sent directly to the source, which can impact performance for very large datasets.
Decision Framework
Cases | Recommended |
|---|---|
Complex schema with high query demands | Multidimensional Model |
High-cardinality dimensions with high compute costs | Hybrid Multidimensional Model |
Need for immediate refresh / near real-time updates with a smaller source size | Hybrid Multidimensional Model |
Low-cardinality attributes | Relational Multidimensional Model |
Extremely high computation costs | Relational Multidimensional Model |
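The decision table can be read as an ordered set of rules. The function below is one hypothetical encoding: the flag names are invented, and the precedence (relational edge cases first, then hybrid cases, then the Multidimensional Model as the default) is an assumption drawn from this section's recommendation that the Multidimensional Model is the default choice, not a documented Kyvos rule.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical yes/no answers to the questions in the decision table."""
    complex_schema: bool = False
    high_cardinality_dimensions: bool = False
    needs_near_real_time: bool = False
    small_source: bool = False
    low_cardinality_only: bool = False
    extreme_computation_cost: bool = False


def recommend(w: Workload) -> str:
    # Assumed precedence: relational edge cases first, then hybrid cases, then the default.
    if w.extreme_computation_cost or w.low_cardinality_only:
        return "Relational Multidimensional Model"
    if w.high_cardinality_dimensions or (w.needs_near_real_time and w.small_source):
        return "Hybrid Multidimensional Model"
    # Complex schemas with high query demands, and all remaining cases.
    return "Multidimensional Model"


print(recommend(Workload(complex_schema=True)))               # Multidimensional Model
print(recommend(Workload(high_cardinality_dimensions=True)))  # Hybrid Multidimensional Model
print(recommend(Workload(low_cardinality_only=True)))         # Relational Multidimensional Model
```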
Summary of Recommendations
Multidimensional Model: Highly recommended for complex schemas, high performance, and cost efficiency.
Relational Multidimensional Model: Recommended only for specific edge cases with low cardinality or excessive computation costs.
Hybrid Multidimensional Model: Recommended selectively for high-cardinality attributes not directly driving reporting but needed in the semantic model.
Why Multidimensional Model is Highly Recommended
Multidimensional Model is the preferred and default recommendation in nearly all deployment scenarios within Kyvos.
Performance Advantages
Pre-aggregated semantic models ensure that queries are resolved quickly without repeatedly hitting the source system.
Built-in result caching dramatically improves response times, even under heavy query loads.
Kyvos’ scaling engine enhances Multidimensional Model performance, enabling sub-second query responses on large datasets.
Cost Efficiency
Since queries are executed against pre-computed semantic models, data scan costs are minimized.
A one-time model processing cost is incurred, but recurring query execution costs remain negligible compared to Relational Multidimensional Model or direct source queries.
Prevents runaway query costs often seen in cloud platforms, such as BigQuery and Snowflake.
Flexibility
Supports runtime query modifications (e.g., changing aggregations from count to average without reprocessing the semantic model).
Allows schema adjustments and complex modeling without degrading performance.
Handles both simple and highly complex schema requirements seamlessly.
Simplicity and Reliability
Reduces reliance on source systems, lowering risks of performance bottlenecks.
Avoids the complexity of managing direct queries against multiple sources.
Provides built-in security and governance mechanisms, making it enterprise-ready.