Configuring load-based scaling

Applies to: Kyvos Enterprise, Kyvos Cloud (SaaS on AWS), Kyvos AWS Marketplace, Kyvos Azure Marketplace, Kyvos GCP Marketplace, Kyvos Single Node Installation (Kyvos SNI)

When the usage pattern of a cluster is not fixed, load-based scaling is recommended. This type of scaling allows you to automatically scale up or scale down the Query Engine instances based on the resource utilization of the Query Engine instances. By configuring cluster scaling in this way, you can optimize the utilization of your cloud cluster and reduce compute costs.

Setting load-based cluster scaling rules
Transitioning period

Setting load-based cluster scaling rules

From Kyvos 2024.2 onwards, load-based scaling is driven by the CPU and Memory usage of the Query Engine instances. System resources are monitored for all BI Servers and Query Engines every 30 seconds, and the Query Engines are scaled based on this data.

Note

From Kyvos 2024.9 onwards, if you use Query Engines as a compute server:

- Query Engines are started automatically when the semantic model is processed.
- Scaling and shutdown of the Query Engines are skipped while the semantic model is being processed.

The cluster will be scaled up step-by-step. For instance, it will scale from Low to Moderate and then from Moderate to High. Similarly, when scaling down the cluster, it will scale from High to Moderate and then from Moderate to Low.
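The step-by-step behavior described above can be sketched as follows. This is only an illustration, not Kyvos code: the `Capacity` enum and the `next_capacity` function are hypothetical names, and the three tiers are taken from the example in the text.

```python
from enum import IntEnum


class Capacity(IntEnum):
    """Capacity tiers named in the example above (illustrative only)."""
    LOW = 0
    MODERATE = 1
    HIGH = 2


def next_capacity(current: Capacity, scale_up: bool) -> Capacity:
    """Move exactly one tier up or down, never skipping a tier."""
    step = 1 if scale_up else -1
    # Clamp so that HIGH cannot scale up further and LOW cannot scale down.
    target = min(max(current + step, Capacity.LOW), Capacity.HIGH)
    return Capacity(target)


# Scaling from Low to High takes two steps: Low -> Moderate -> High.
first = next_capacity(Capacity.LOW, scale_up=True)
second = next_capacity(first, scale_up=True)
print(first.name, second.name)
```

Under this model, reaching High from Low always passes through Moderate, which matches the transition order described in the text.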

Load-based cluster scaling is enabled by default. You can further configure it on the Cluster Scaling page using the Load option.

To set load-based scaling, from the Toolbox, click Setup > Cluster Scaling. The Cluster Scaling page is displayed.

Kyvos provides the following scaling modes for Load-based scaling:

Managed: In Managed scaling, Kyvos intelligently manages the cluster capacity to scale up and scale down Query Engine instances.
Custom: Custom scaling allows you to set rules based on your cluster usage patterns. This feature presently supports scaling based on CPU and Memory utilization. When the CPU or Memory load condition meets the configured parameters, the Query Engine instances can be scaled up or down accordingly.

To set the scaling mode for load-based scaling, perform the following steps.

On the Cluster Scaling page, the Load option is selected by default.
To set the scaling mode, select one of the following:
1. Managed: Select the required capacity from the list to start the Query Engines when a query is fired.
2. Custom: Select this option to configure custom rules as per your cluster usage pattern.
  To set scale-up rules:
  - Select the required capacity from the list to start the Query Engines when a query is fired.
  - Enter a percentage to scale up the cluster if CPU or Memory utilization goes above the specified percentage. Also, specify the number of data points and the total number of data points.
  To set scale-down rules:
  - Specify the BI Server and/or Query Engine from the list to shut down when no queries are fired for the specified period of time.
  - Enter a percentage to scale down the cluster if CPU and Memory utilization remains below the specified percentage. Also, specify the number of data points and the total number of data points.
Click Save. The load-based scaling mode is set.

NOTE

A data point is information on resource utilization captured every 30 seconds.
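A custom rule of this kind could be evaluated roughly as in the sketch below. It is not the Kyvos implementation. It assumes that "the number of data points and the total number of data points" means "at least M of the last N data points must meet the condition", which the text does not state explicitly. The `LoadRule` class and its method names are hypothetical.

```python
from collections import deque

SAMPLE_INTERVAL_SECONDS = 30  # one data point every 30 seconds, per the note above


class LoadRule:
    """Assumed rule: fire when M of the last N data points meet the condition."""

    def __init__(self, threshold_pct: float, required_points: int, total_points: int):
        if not 0 < required_points <= total_points:
            raise ValueError("required_points must be between 1 and total_points")
        self.threshold_pct = threshold_pct
        self.required_points = required_points
        # Keep only the most recent total_points samples.
        self.samples = deque(maxlen=total_points)

    def add_sample(self, cpu_pct: float, memory_pct: float) -> None:
        self.samples.append((cpu_pct, memory_pct))

    def _window_full(self) -> bool:
        return len(self.samples) == self.samples.maxlen

    def scale_up_due(self) -> bool:
        # Scale up: CPU OR Memory above the threshold.
        hits = sum(1 for cpu, mem in self.samples
                   if cpu > self.threshold_pct or mem > self.threshold_pct)
        return self._window_full() and hits >= self.required_points

    def scale_down_due(self) -> bool:
        # Scale down: CPU AND Memory below the threshold.
        hits = sum(1 for cpu, mem in self.samples
                   if cpu < self.threshold_pct and mem < self.threshold_pct)
        return self._window_full() and hits >= self.required_points


# Example values are illustrative: 80% threshold, 3 of 5 data points.
up_rule = LoadRule(threshold_pct=80, required_points=3, total_points=5)
for cpu, mem in [(85, 40), (60, 50), (90, 45), (70, 82), (65, 30)]:
    up_rule.add_sample(cpu, mem)
print(up_rule.scale_up_due())
```

With five data points in the window, such a rule looks back over 5 × 30 seconds, that is, 2.5 minutes of utilization history.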

Transitioning period

In earlier versions of Kyvos, the process of scaling Query Engines (QEs) involved a technique called chunking. Chunking refers to stopping and starting QEs in batches rather than handling all QEs simultaneously.

For example, if a cluster had 8 QEs, the scaling operation would first stop or start 5 QEs and then handle the remaining 3 in a subsequent batch. This approach aimed to reduce the load on system resources during scaling and ensure service stability, but it often resulted in longer overall scaling times and a less predictable transition period.

From Kyvos 2025.6 onwards, the scaling process has been improved to stop and start all QEs simultaneously, eliminating the need for chunking. This enhancement ensures faster, more consistent, and efficient scaling operations, reducing downtime and improving performance during scaling transitions.
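The difference between the two approaches can be illustrated with a short sketch. The `chunk` helper and the engine names are hypothetical, and the batch size of 5 is taken only from the 8-QE example above, not from a documented setting.

```python
def chunk(engines: list, chunk_size: int) -> list:
    """Split the engine list into consecutive batches of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [engines[i:i + chunk_size] for i in range(0, len(engines), chunk_size)]


engines = [f"qe-{n}" for n in range(1, 9)]  # 8 hypothetical Query Engines

# Earlier behavior (chunking): a batch of 5, then the remaining 3.
chunked_batches = chunk(engines, 5)

# Behavior from 2025.6 onwards: a single batch containing every engine.
simultaneous_batches = chunk(engines, len(engines))

print([len(batch) for batch in chunked_batches])
print([len(batch) for batch in simultaneous_batches])
```

Each batch adds its own stop and start cycle, so fewer batches generally mean a shorter transition period.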

The following tables specify the approximate time required to complete the process during the transitioning period.

Important

From Kyvos 2025.6 onwards, the default behavior is as follows:
- The value of the QE_STARTUP_QUERY_HOLD_TIME property is set to 5.
  - All Query Engines are started and scaled simultaneously.
  - Queries are held during Query Engine startup, stop, and scaling.
    - Queries are held until the configured percentage of Query Engines has started, as governed by the CLUSTER_SCALING_ACTIVE_QE_PERCENTAGE property.
    - Queries fail if the Query Engines do not start within the configured value of the QE_STARTUP_QUERY_HOLD_TIME property.
If the value of the QE_STARTUP_QUERY_HOLD_TIME property is set to 0:
- Queries fail during startup.
- Query Engines are scaled in chunks.
- Queries are served during scaling.
If the cluster is down and a query is executed, the first query triggers the cluster startup process. If the value of the QE_STARTUP_QUERY_HOLD_TIME property is 0, all queries fail. If it is any positive integer, queries are held until all Query Engines are started. In case of query failure, the following messages are displayed:
- Message 1: "Could not serve the query as Query Engine Cluster is not available. Query Engine is launched. Please try after some time."
- Message 2: "Could not serve the query as Query Engine Cluster is starting. Please try after some time."
The capacity of the BI Servers cannot be changed.
All BI Servers can be shut down except the Coordination Master. If there is only one BI Server, this BI Server is treated as the Coordination Master.
The Settings option and Add Schedule option are disabled on the Load screen.
You can view on-screen notifications that provide you with timely information about the state of the cluster.
When you scale down the Query Engines, you reduce the capacity of the node, including the number of cores and memory. Conversely, when you scale up the Query Engines, you increase the capacity of the node by adding more cores and memory.
The Query Engines do not start for any relational multidimensional model-based queries.
During the transitioning period of the Query Engines, such as scale-up, scale-down, or shutdown, you can design and refine the semantic model because the Coordination Master is always up and running.
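The query hold behavior during Query Engine startup, as described in the Important section above, can be modeled with the rough sketch below. It is a simplified illustration, not Kyvos code. The `HoldConfig` class and the `query_outcome` function are hypothetical. The unit of QE_STARTUP_QUERY_HOLD_TIME is not stated in this section, so `elapsed` is assumed to be in the same unit as that property. The value used for CLUSTER_SCALING_ACTIVE_QE_PERCENTAGE is an example, not a documented default.

```python
from dataclasses import dataclass


@dataclass
class HoldConfig:
    qe_startup_query_hold_time: int  # QE_STARTUP_QUERY_HOLD_TIME (unit assumed)
    active_qe_percentage: float      # CLUSTER_SCALING_ACTIVE_QE_PERCENTAGE


def query_outcome(cfg: HoldConfig, started_qes: int, total_qes: int, elapsed: float) -> str:
    """Return 'serve', 'hold', or 'fail' for a query arriving during QE startup."""
    if total_qes <= 0:
        raise ValueError("total_qes must be positive")
    started_pct = 100.0 * started_qes / total_qes
    if started_pct >= cfg.active_qe_percentage:
        return "serve"  # enough Query Engines are up
    if cfg.qe_startup_query_hold_time == 0:
        return "fail"   # holding is disabled, so queries fail during startup
    if elapsed < cfg.qe_startup_query_hold_time:
        return "hold"   # keep waiting for Query Engines to start
    return "fail"       # hold time exceeded


default_cfg = HoldConfig(qe_startup_query_hold_time=5, active_qe_percentage=80.0)
print(query_outcome(default_cfg, started_qes=2, total_qes=10, elapsed=1))
print(query_outcome(default_cfg, started_qes=8, total_qes=10, elapsed=1))
```

The sketch covers startup only. With the property set to 0, queries are still served during scaling, a case this model does not represent.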