Best Practices for Working with Kyvos

Working with Kyvos, a cloud-native semantic layer for both AI and BI, involves both technical and strategic best practices. The following breakdown covers how to use Kyvos effectively for optimal performance, scalability, and usability:

Understanding the Use Case: A clear understanding of the use case is essential for designing optimal solutions within the Kyvos Semantic Model.
Dataset Preparation: Identify and compile a list of datasets (tables/views), along with the corresponding dimensions, attributes, and measures derived from these datasets.
Dataset Classification: Classify datasets as Dimensions if they only contain dimension/attribute fields, and as Facts if they include measure fields or both dimensions and measures.
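The classification rule above can be expressed as a small function. This is a minimal sketch that assumes each dataset is described by a list of fields tagged with a hypothetical `role` of either "attribute" or "measure". Kyvos does not expose this exact structure; the names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    role: str  # hypothetical tag: "attribute" or "measure"


def classify_dataset(fields: list[Field]) -> str:
    """Return "Fact" if any field is a measure, otherwise "Dimension"."""
    if any(f.role == "measure" for f in fields):
        return "Fact"
    return "Dimension"


# Hypothetical datasets.
customers = [Field("customer_id", "attribute"), Field("region", "attribute")]
sales = [Field("customer_id", "attribute"), Field("amount", "measure")]

print(classify_dataset(customers))  # Dimension
print(classify_dataset(sales))      # Fact
```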
Data Modeling Best Practices
1. Star Schema Preference: Design your data in a star schema (fact and dimension tables), as Kyvos performs best with this structure.
2. Flatten Hierarchies Where Possible: Use flattened dimension tables to reduce joins and improve semantic model build performance.
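Flattening is normally done in the source SQL or an ETL step. The sketch below only illustrates the idea in Python: a hypothetical snowflaked product hierarchy (product, subcategory, category) is joined once so that each row carries its full hierarchy, and the semantic model no longer needs those joins. All table and column names are assumptions.

```python
# Hypothetical snowflaked product dimension: three tables linked by keys.
categories = {1: "Electronics", 2: "Furniture"}
subcategories = {10: ("Phones", 1), 20: ("Chairs", 2)}  # id -> (name, category_id)
products = [
    {"product_id": 100, "product": "Phone X", "subcategory_id": 10},
    {"product_id": 200, "product": "Desk Chair", "subcategory_id": 20},
]


def flatten_products(products, subcategories, categories):
    """Resolve the joins once so each row carries the full hierarchy."""
    flat = []
    for p in products:
        sub_name, cat_id = subcategories[p["subcategory_id"]]
        flat.append({
            "product_id": p["product_id"],
            "product": p["product"],
            "subcategory": sub_name,
            "category": categories[cat_id],
        })
    return flat


for row in flatten_products(products, subcategories, categories):
    print(row)
```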
Dataset Design Best Practices
1. SQL-Based Datasets: Prefer SQL-based datasets, which offer the following benefits:
  1. Allows you to perform calculations at the SQL level.
  2. Enables the use of partition columns to efficiently filter data.
2. Selective Column Inclusion: Only include the necessary columns in the SELECT clause of your SQL query to optimize performance.
3. Date Column Input Format: Explicitly specify the input format of each date column so that date values are parsed correctly.
4. Handling Decimal Data: For decimal data, use the double data type in the Kyvos dataset unless you need exact precision across many decimal places.
5. Row Selection: Fetch only the required rows from the SQL query to reduce unnecessary data retrieval.
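The trade-off in item 4 can be seen with Python's `float`, which is an IEEE 754 double, compared against the exact `decimal.Decimal` type. This sketch only illustrates double-precision behavior in general; it does not show how Kyvos stores or computes values internally.

```python
from decimal import Decimal

# Ten amounts of 0.10, summed as binary doubles and as exact decimals.
as_double = sum([0.1] * 10)
as_decimal = sum([Decimal("0.1")] * 10)

print(as_double)            # 0.9999999999999999  (tiny binary rounding error)
print(as_decimal)           # 1.0                 (exact)
print(round(as_double, 2))  # 1.0  (error vanishes at typical reporting precision)
```

When results are reported to a few decimal places, the double's error is invisible; it only matters when exact values at many decimal places are required.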
Relationship Design Best Practices
1. Single-Direction Relationships: Define each relationship in a single direction, either from Fact to Dimension or from Dimension to Fact, rather than in both directions.
2. Fact-to-Fact Joins: Fact-to-Fact joins are not directly supported, so avoid creating relationships between fact tables.
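Both rules can be checked mechanically before a model is built. The following is a minimal sketch over a hypothetical model description (dataset types plus a list of `(from, to)` relationship pairs); it is not a Kyvos API, only an illustration of the validation logic.

```python
# Hypothetical model description: dataset name -> "Fact" or "Dimension".
dataset_types = {"sales": "Fact", "returns": "Fact", "customer": "Dimension"}

# Each relationship is a (from_dataset, to_dataset) pair.
relationships = [
    ("sales", "customer"),
    ("customer", "sales"),  # same pair in the opposite direction
    ("sales", "returns"),   # fact-to-fact join
]


def check_relationships(relationships, dataset_types):
    """Return human-readable problems found in the relationship list."""
    problems = []
    seen = set()
    for src, dst in relationships:
        if dataset_types[src] == "Fact" and dataset_types[dst] == "Fact":
            problems.append(f"fact-to-fact join: {src} -> {dst}")
        if (dst, src) in seen:
            problems.append(f"bidirectional relationship: {src} <-> {dst}")
        seen.add((src, dst))
    return problems


for problem in check_relationships(relationships, dataset_types):
    print(problem)
```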
Semantic Model Best Practices
1. Start Small, Scale Gradually: Begin with a small set of dimensions and measures and add more as needed. This keeps builds fast and manageable and makes issues quicker to debug.
2. Partitioning: Partition large fact tables on a suitable key (for example, time-based partitioning) for faster processing and querying.
  1. Identify and choose the right fields for partitioning the semantic model. Well-chosen partition fields let queries filter out irrelevant data, which improves query performance.
  2. The correct partition field also ensures that incremental builds with replace partition set to “Auto” process only the relevant data.
3. Include Only Required Fields: Add only the necessary fields to the semantic model to ensure efficiency and avoid unnecessary complexity.
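The effect of a good partition field can be sketched in a few lines. This is a simplified illustration, not the way Kyvos implements the “Auto” replace partition setting: it assumes monthly partitions on a hypothetical `order_date` column, rebuilds only the partitions that received new rows, and then answers a month-filtered query from a single partition.

```python
from collections import defaultdict
from datetime import date


def month_key(d: date) -> str:
    """Partition key: one partition per calendar month (an assumed granularity)."""
    return f"{d.year}-{d.month:02d}"


def partition(rows):
    """Group (order_date, amount) rows by their month partition key."""
    parts = defaultdict(list)
    for order_date, amount in rows:
        parts[month_key(order_date)].append((order_date, amount))
    return parts


def incremental_build(parts, source_rows, changed_keys):
    """Re-read only the changed partitions from the source; keep all others as-is."""
    for key in changed_keys:
        parts[key] = [r for r in source_rows if month_key(r[0]) == key]
    return parts


# Hypothetical fact data: the previous build saw old_rows; new_rows arrived since.
old_rows = [(date(2024, 1, 5), 100.0), (date(2024, 2, 9), 50.0)]
new_rows = [(date(2024, 2, 20), 25.0), (date(2024, 3, 1), 10.0)]
source = old_rows + new_rows

parts = partition(old_rows)                        # result of the last full build
changed = {month_key(d) for d, _ in new_rows}      # {'2024-02', '2024-03'}
parts = incremental_build(parts, source, changed)  # January is left untouched

# A query filtered on the partition key reads a single partition.
feb_total = sum(amount for _, amount in parts["2024-02"])
print(feb_total)  # 75.0
```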
Best Practices for Performance Optimization
1. Build Scheduling: Schedule semantic model builds during off-peak hours or in a staggered manner to reduce resource contention.
2. Incremental Builds: Use incremental builds to update only new or changed data, which saves time and resources.
3. Caching Strategy: Tune caching based on usage patterns. Use warm-up queries post-build to populate the cache for faster user response.
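The warm-up idea in item 3 is sketched below using Python's `functools.lru_cache` as a stand-in for a result cache. The `run_query` function and the list of frequent queries are hypothetical; in practice the frequent queries would come from usage patterns, and the cache is managed by Kyvos rather than by client code.

```python
from functools import lru_cache

executions = 0  # counts how often the (hypothetical) backend is actually hit


@lru_cache(maxsize=256)
def run_query(query: str) -> str:
    """Stand-in for an expensive semantic-model query (hypothetical)."""
    global executions
    executions += 1
    return f"result of {query!r}"


def warm_up_after_build(queries):
    """Drop results from the previous build, then pre-run frequent queries."""
    run_query.cache_clear()
    for q in queries:
        run_query(q)


frequent = ["sales by region", "sales by month"]  # e.g. taken from usage logs
warm_up_after_build(frequent)

run_query("sales by region")  # a user query: answered from the cache
print(executions)             # 2
```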
Deployment Best Practices
1. Scaling: Use Kyvos’ elastic architecture to dynamically scale based on workload demand in cloud environments.
2. Security Compliance: Leverage Kyvos’ support for cloud IAM (like AWS IAM, Azure AD) for secure access management.
Security and Governance Best Practices
1. Role-Based Access Control (RBAC): Assign data access and visibility rules based on user roles.
2. Data Masking and Row-Level Security: Use these features to ensure sensitive information is protected.
3. Audit Logs: Enable auditing to track user activity and system performance.
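The combined effect of role-based rules, row-level security, and masking can be illustrated as follows. In Kyvos these rules are configured through its own security settings; this sketch only shows the filtering and masking logic, and the role names, policy structure, and columns are all hypothetical.

```python
# Hypothetical role policies: visible regions (None = all) and columns to mask.
ROLES = {
    "analyst_east": {"regions": {"East"}, "mask": {"email"}},
    "admin": {"regions": None, "mask": set()},
}

rows = [
    {"region": "East", "customer": "A", "email": "a@example.com", "sales": 120},
    {"region": "West", "customer": "B", "email": "b@example.com", "sales": 80},
]


def secure_view(rows, role):
    """Apply row-level filtering for the role, then mask sensitive columns."""
    policy = ROLES[role]
    visible = [r for r in rows
               if policy["regions"] is None or r["region"] in policy["regions"]]
    return [{k: ("****" if k in policy["mask"] else v) for k, v in r.items()}
            for r in visible]


print(secure_view(rows, "analyst_east"))  # one East row, email masked
print(secure_view(rows, "admin"))         # both rows, unmasked
```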
Integration with BI Tools Best Practices
1. Optimize Queries: Kyvos translates BI tool queries into optimized semantic model queries for efficient execution. Understand how your tool (e.g., Tableau, Power BI, Excel) generates its queries (SQL or MDX).
2. Live Connections: For real-time insights, live connections to the Kyvos SQL interface are preferred over data extraction from BI tools.
3. Tool-Specific Tuning: Tune the dashboard and report design (e.g., filters and visuals) to minimize query complexity.
Monitoring and Maintenance Best Practices
1. Use Kyvos Monitoring Tools: Monitor job status, semantic model usage, and query performance through the Kyvos Manager and web portal.
2. Log Rotation and Housekeeping: Regularly clean up old logs and monitor storage usage to prevent bloat.
3. Version Upgrades: Stay current with Kyvos releases to leverage performance improvements and new features.
Collaboration and Documentation
1. Document Semantic Model Definitions: Maintain thorough documentation for all dimensions, hierarchies, and measures.
2. Team Collaboration: Use Git or other version control tools to collaborate on semantic model design and configuration in development environments.

Best Practices for Handling Calculations

Calculated Field Type	Calculation Type	Pros	Cons
Attribute	SQL	Pre-calculated, so it performs better.	The semantic model must be reprocessed each time the calculation changes.
	Tableau	Can be changed at runtime without reprocessing the semantic model.	Calculated at runtime, which can degrade query performance.
Measure	SQL	Pre-calculated, so it performs better.	The semantic model must be reprocessed each time the calculation changes.
	MDX	Can be changed at runtime without reprocessing the semantic model. Supports complex calculations, such as time-series calculations.	Calculated at runtime, which can degrade query performance.
	Tableau	Can be changed at runtime without reprocessing the semantic model. Suitable for users who are more familiar with Tableau calculations than with MDX.	Calculated at runtime, which can degrade query performance.

Best Practices for Multidimensional (MDM), Relational Multidimensional (RMDM), and Hybrid Multidimensional (HMDM) Data Models

This section summarizes the key considerations, recommendations, and best practices for selecting and implementing semantic models within the Kyvos environment. The focus is on performance optimization, cost efficiency, and suitability for various schemas and business requirements.

Aspect	Multidimensional Models	Relational Multidimensional Models	Hybrid Multidimensional Models
Query Performance (Aggregates)	Very fast: precomputed aggregates and optimized semantic model operations	Slower: aggregates computed on the fly via SQL	Fast for summary queries served by the semantic model; slower for non-materialized columns
Scalability / Capacity	Highly scalable	Highly scalable	Highly scalable
Flexibility / Ad-hoc Queries	Flexible	Highly flexible	Moderate: fast if the data exists in the semantic model; otherwise, queries access the underlying data source
Storage Overhead	High: stores many aggregates	Lower: primarily base tables, indexes, and materialized views	Medium: aggregates stored in the semantic model; detailed data remains in the source
ETL / Preprocessing Cost	High: semantic model building and aggregation computation; supports full or incremental processing	Low: minimal precomputation, more on-demand computation	Medium: requires maintaining the semantic model and synchronizing it with relational data
Maintenance & Complexity	High: semantic model schema design, rebuilds, and tuning required	High: semantic model schema design, relational DB tuning, indexing, and query optimization	High: requires integrating the semantic model with relational systems, plus synchronization and orchestration
Cost (Setup, Hardware, Licensing, Human Resources)	High: semantic model creation and initial infrastructure	Low: leverages existing RDBMS and staff familiar with SQL/DB	Medium to high: requires both the semantic model system and relational DB setup

Multidimensional Model

Kyvos aggregates data into a multidimensional semantic model, which is stored within its datastore. These semantic models contain pre-computed summaries of the data across various dimensions and hierarchies. Queries are served directly from these semantic models, which allows for sub-second response times even on very large datasets.
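A toy version of this idea is sketched below. It pre-computes the sum and count of a measure for every combination of dimensions, then answers queries by lookup instead of scanning the fact rows. Kyvos’ actual storage format and aggregation selection are far more sophisticated; the dimensions and data here are hypothetical. Storing both sum and count is one way a model can switch a measure from a total or count to an average at query time without rebuilding.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "year")

# Hypothetical fact rows.
facts = [
    {"region": "East", "year": 2023, "sales": 100.0},
    {"region": "East", "year": 2024, "sales": 150.0},
    {"region": "West", "year": 2024, "sales": 50.0},
]


def build_cube(facts, dimensions):
    """Pre-compute sum and count of sales for every combination of dimensions."""
    cube = defaultdict(lambda: [0.0, 0])  # (dims, values) -> [sum, count]
    for row in facts:
        for n in range(len(dimensions) + 1):
            for dims in combinations(dimensions, n):
                cell = cube[(dims, tuple(row[d] for d in dims))]
                cell[0] += row["sales"]
                cell[1] += 1
    return dict(cube)


def query(cube, measure, **filters):
    """Answer from the pre-computed cells; no pass over the fact rows."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    total, count = cube[(dims, tuple(filters[d] for d in dims))]
    return {"sum": total, "count": count, "avg": total / count}[measure]


cube = build_cube(facts, DIMENSIONS)
print(query(cube, "sum", region="East"))  # 250.0
print(query(cube, "avg", region="East"))  # 125.0
print(query(cube, "count"))               # 3
```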

Key Features

Eliminates dependency on the source system by pre-aggregating and storing data in semantic models.
Delivers high performance through result caching and efficient query execution.
Flexible semantic model allows runtime modifications (e.g., switching from count to average).
Well-suited for complex schemas and environments requiring high query performance.
Minimal processing cost compared to roll-up methods.
Ensures data security and reduces query complexity.

Recommendations

Multidimensional Model is highly recommended and appropriate when:

Schema complexity is high.
Query flexibility and runtime aggregation are required.
Cost efficiency and predictable performance are priorities.

Relational Multidimensional Model

In the Relational Multidimensional Model, analytical queries are executed directly against the underlying relational data sources, such as Snowflake, BigQuery, Redshift, or other cloud data warehouses. Unlike the Multidimensional Model, data is not pre-aggregated or stored in semantic models. Instead, the system dynamically generates queries that fetch the required results in real time.
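To make "dynamically generates queries" concrete, the sketch below translates a simple semantic request (dimensions, measures, filters) into a `GROUP BY` query for the source. Kyvos’ real query generation is internal and dialect-aware; the function, table, and column names here are hypothetical, and filter values are inlined only for readability.

```python
def to_sql(table, dimensions, measures, filters=None):
    """Translate a semantic request into a GROUP BY query for the source.

    measures maps an output name to an aggregate expression, for example
    {"total_sales": "SUM(amount)"}. Filter values are inlined for readability
    only; production code should use bind parameters instead.
    """
    select = list(dimensions) + [f"{expr} AS {name}" for name, expr in measures.items()]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


print(to_sql("sales", ["region"], {"total_sales": "SUM(amount)"}, {"year": "2024"}))
# SELECT region, SUM(amount) AS total_sales FROM sales WHERE year = '2024' GROUP BY region
```

Because every request becomes a fresh query against the source, response time and cost depend on the source system, which is the trade-off discussed in this section.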

Key Features

Relies directly on the external data source for query execution.
Provides flexibility but lacks the performance and efficiency of Multidimensional Model or Hybrid Multidimensional Model.

Recommendations

Should be used:

When cardinality is very low.
When the computation cost in Multidimensional Model/Hybrid Multidimensional Model becomes prohibitively high.
Note: Generally not recommended as the default option, because query performance depends on the source system.

Performance and Cost Considerations

Query execution costs grow with the amount of data scanned and compute consumed (e.g., high costs have been observed in BigQuery and Snowflake environments).
Quota limits are applied to manage query costs and prevent runaway scans.
Multidimensional Model offers the optimal balance of cost efficiency and performance, featuring one-time model processing and minimal recurring computational costs.
Kyvos’ scaling engine enhances Multidimensional Model’s ability to deliver fast performance even at scale.
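The first two points can be illustrated with a pricing scheme in which cost is proportional to bytes scanned, plus a quota check that rejects a query before it runs. This is a hedged sketch: the price constant is a placeholder, not any provider's actual rate, and real platforms meter usage in their own ways.

```python
TIB = 1024 ** 4


def estimate_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Cost grows linearly with the bytes a query scans."""
    return bytes_scanned / TIB * price_per_tib


class QuotaExceeded(Exception):
    pass


def guard(bytes_scanned: int, used_bytes: int, quota_bytes: int) -> int:
    """Reject a query that would push usage past the quota; return new usage."""
    if used_bytes + bytes_scanned > quota_bytes:
        raise QuotaExceeded(f"query would scan {bytes_scanned} bytes; "
                            f"{quota_bytes - used_bytes} bytes remain")
    return used_bytes + bytes_scanned


PRICE = 5.0  # hypothetical price per TiB scanned; check your provider's pricing
print(estimate_cost(2 * TIB, PRICE))  # 10.0

used = guard(3 * TIB, used_bytes=0, quota_bytes=4 * TIB)
try:
    guard(2 * TIB, used_bytes=used, quota_bytes=4 * TIB)
except QuotaExceeded as err:
    print("blocked:", err)
```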

Hybrid Multidimensional Model

The Hybrid Multidimensional Model combines the strengths of the Multidimensional Model and the Relational Multidimensional Model. Frequently used aggregations are pre-calculated and stored in semantic models (as in the Multidimensional Model), while less common or ad-hoc queries are executed directly against the underlying data source (as in the Relational Multidimensional Model). This approach balances performance with flexibility and storage efficiency.
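The routing decision at the heart of this approach can be sketched as follows. The sketch assumes a hypothetical set of materialized aggregations keyed by their grouping attributes; a query is served from the semantic model when some materialized aggregate groups by at least the requested attributes (rolling up works for additive measures such as SUM and COUNT), and is otherwise sent to the source.

```python
# Hypothetical aggregations materialized in the semantic model,
# identified by the attributes they are grouped by.
MATERIALIZED = [frozenset({"region"}), frozenset({"region", "year"})]


def route(group_by):
    """Pick where a query runs: a covering aggregate, or the source."""
    needed = frozenset(group_by)
    # A finer aggregate can be rolled up to answer a coarser query.
    if any(needed <= agg for agg in MATERIALIZED):
        return "semantic model"
    return "source"  # e.g. a high-cardinality attribute like customer_id


print(route(["region"]))                 # semantic model
print(route(["year"]))                   # semantic model (rolled up from region, year)
print(route(["region", "customer_id"]))  # source
```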

Key Features

Combines pre-aggregation with direct queries to the source.
Allows inclusion of high-cardinality attributes in the semantic model.
Offers immediate refresh capabilities where data latency is critical.

Recommendations

Should be used when:

Dimensions have high cardinality and therefore drive up computation cost.
Business requires their inclusion in the semantic model for completeness.
Immediate refresh and lower semantic model storage are desired.
Note: Large queries may still be sent directly to the source, which can impact performance for very large datasets.

Decision Framework

Cases	Recommended
Complex schema with high query demands	Multidimensional Model
High-cardinality dimensions and compute costs	Hybrid Multidimensional Model
Need for immediate refresh / near real-time updates with a smaller source size	Hybrid Multidimensional Model
Low-cardinality attributes	Relational Multidimensional Model
Extremely high computation costs	Relational Multidimensional Model
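The decision table can be encoded as a small function. This is a sketch under stated assumptions: the workload is described by hypothetical yes/no flags, the rows are checked top to bottom in table order (the table itself does not define a precedence), and anything that matches no row falls back to the Multidimensional Model, the default recommendation of this document.

```python
from dataclasses import dataclass

MDM = "Multidimensional Model"
HMDM = "Hybrid Multidimensional Model"
RMDM = "Relational Multidimensional Model"


@dataclass
class Workload:
    """Hypothetical yes/no answers to the questions in the decision table."""
    complex_schema: bool = False
    high_query_demand: bool = False
    high_cardinality: bool = False
    high_compute_cost: bool = False
    needs_near_real_time: bool = False
    small_source: bool = False
    low_cardinality: bool = False
    extreme_compute_cost: bool = False


def recommend(w: Workload) -> str:
    """Check the table's rows top to bottom; fall back to the default model."""
    if w.complex_schema and w.high_query_demand:
        return MDM
    if w.high_cardinality and w.high_compute_cost:
        return HMDM
    if w.needs_near_real_time and w.small_source:
        return HMDM
    if w.low_cardinality or w.extreme_compute_cost:
        return RMDM
    return MDM  # default recommendation


print(recommend(Workload(high_cardinality=True, high_compute_cost=True)))
```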

Summary of Recommendations

Multidimensional Model: Highly recommended for complex schemas, high performance, and cost efficiency.
Relational Multidimensional Model: Recommended only for specific edge cases, such as low cardinality or excessive computation costs.
Hybrid Multidimensional Model: Recommended selectively for high-cardinality attributes not directly driving reporting but needed in the semantic model.

Why Multidimensional Model is Highly Recommended

Multidimensional Model is the preferred and default recommendation in nearly all deployment scenarios within Kyvos.

Performance Advantages

Pre-aggregated semantic models ensure queries are resolved quickly without repeatedly hitting the source system.
Built-in result caching dramatically improves response times, even under heavy query loads.
Kyvos’ scaling engine enhances Multidimensional Model performance, enabling sub-second query responses on large datasets.

Cost Efficiency

Since queries are executed against pre-computed semantic models, data scan costs are minimized.
A one-time model processing cost is incurred, but recurring query execution costs remain negligible compared to Relational Multidimensional Model or direct source queries.
Prevents runaway query costs often seen in cloud platforms, such as BigQuery and Snowflake.
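The one-time versus recurring cost argument can be made concrete with a break-even calculation. Pre-aggregation costs `build_cost + n * cost_per_query_model` for `n` queries, while querying the source directly costs `n * cost_per_query_source`; pre-aggregation is cheaper once `n` exceeds `build_cost / (source cost - model cost)`. All figures below are hypothetical placeholders, and the model ignores storage and incremental rebuild costs.

```python
import math


def break_even_queries(build_cost: float, cost_per_query_source: float,
                       cost_per_query_model: float) -> int:
    """Smallest query count at which pre-aggregation is strictly cheaper overall."""
    saving = cost_per_query_source - cost_per_query_model
    if saving <= 0:
        raise ValueError("pre-aggregation never pays off at these per-query costs")
    # Need n * saving > build_cost, so n = floor(build_cost / saving) + 1.
    return math.floor(build_cost / saving) + 1


# Hypothetical costs: 50 units to process the model, 0.25 per direct source query,
# and a negligible per-query cost once the model exists.
print(break_even_queries(50.0, 0.25, 0.0))  # 201
```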

Flexibility

Supports runtime query modifications (e.g., changing aggregations from count to average without reprocessing the semantic model).
Allows schema adjustments and complex modeling without degrading performance.
Handles both simple and highly complex schema requirements seamlessly.

Simplicity and Reliability

Reduces reliance on source systems, lowering risks of performance bottlenecks.
Avoids the complexity of managing direct queries against multiple sources.
Provides built-in security and governance mechanisms, making it enterprise-ready.