General Best Practices for QoS
- Use the Rekey I/O Rate threshold to limit any LDT impact on your production workloads. The Rekey I/O Rate approach is a simpler way for an Administrator or system administrator to enforce a limit on the volume of data that LDT rekeys per second. Choose a threshold, in MB per second, that is a small percentage of the peak I/O throughput of your production workload.
Note
When choosing a threshold on CTE-protected hosts with GuardPoints over NFS/CIFS shares, you must consider the network bandwidth between your host and the NFS/CIFS servers. QoS does not monitor the impact of LDT operations on network connections between your hosts and NFS/CIFS servers. However, the rate selected as the LDT Rekey I/O Rate directly correlates to the network bandwidth to the target NFS/CIFS servers. When selecting an optimal rekey I/O rate for such hosts, monitor and collect network traffic instead of disk I/O transfers.
-
You will see the effects of QoS settings only if the number and/or types of files in the GuardPoints stress the rekey or scan processes. On hosts with a relatively small number of files, the rekey or scan process may complete quickly without hitting a threshold and causing throttling to occur.
-
Use QoS CPU parameters as an alternate method for controlling the effect LDT has on application performance.
Set limits on LDT CPU usage whenever runtime monitoring shows that user applications are affected by LDT. Start by setting the CPU parameter to 10%, then increase or decrease in 5% intervals, as needed, to tune the CPU allocation. When an acceptable level is reached, and LDT is not noticeably affecting user applications, leave the QoS CPU parameters at a constant setting.
-
Use monitoring tools.
Monitor host CPU utilization with tools like vmstat, top, and iotop on Linux and perfmon on Windows. You can also monitor and obtain statistics with the voradmin ldt stats command. For more information about voradmin ldt stats, see Obtaining LDT Statistics at the Command Line.
-
Select CPU resource allocation for LDT from 1% to the available limit minus 5%.
If the monitoring tools indicate that system CPU usage without LDT is N%, then the available CPU resource is M%, where M = 100 - N. Select a percentage between 1% and (M - 5)% to allocate to LDT CPU usage. However, remember that QoS tolerates 2% to 4% leeway in the actual CPU usage, so adjust your selection by 2% to 4%.
-
Do not set CPU resources to 0% or 99% in an attempt to minimize or stop LDT.
A CPU % value of 0 or 99 is reserved for disabling the QoS CPU monitoring function. This does not stop LDT or minimize its resource usage. On the contrary, it enables LDT to run at its maximum rekey rate. Note that when CPU % is not set, LDT clients enforce a 5% CPU threshold by default.
-
Cap the CPU allocation.
QoS provides a Cap CPU Allocation parameter. Set this parameter to True. This ensures that LDT resource usage never exceeds the allocated percentage.
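The allocation rule in the best practices above (M = 100 - N, then a value between 1% and (M - 5)%) can be worked through with a small shell sketch. The function names ldt_cpu_max and ldt_cpu_valid and the sample baseline of 70% are illustrative assumptions, not part of CTE or voradmin. The sketch only does the arithmetic; it does not read or change any QoS setting.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not CTE or voradmin commands.

# Print the highest CPU percentage to allocate to LDT, given N,
# the CPU percentage in use without LDT (whole numbers only).
ldt_cpu_max() {
    n=$1
    m=$((100 - n))      # M = 100 - N: CPU not used by applications
    max=$((m - 5))      # stay 5% below the available limit
    if [ "$max" -lt 1 ]; then
        return 1        # no room left for an LDT allocation
    fi
    echo "$max"
}

# Succeed when a proposed setting lies between 1 and the maximum.
# The maximum is at most 95 for N >= 0, so 0 and 99 (which disable
# QoS CPU monitoring) are always rejected.
ldt_cpu_valid() {
    pct=$1
    limit=$(ldt_cpu_max "$2") || return 1
    [ "$pct" -ge 1 ] && [ "$pct" -le "$limit" ]
}

# Example with an assumed baseline of N = 70:
ldt_cpu_max 70                                   # prints 25
ldt_cpu_valid 10 70 && echo "10% is within range"
```

Because QoS tolerates 2% to 4% leeway in actual usage, a more cautious choice would sit a few points below the printed maximum.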
Example: Setting QoS before starting LDT
You can be proactive and set up QoS parameters before enabling GuardPoints that are protected with LDT policies. This ensures QoS starts monitoring and controlling LDT resource usage from the start. The following graph shows an example where 10% of the CPU is assigned to LDT. QoS makes sure that LDT is restricted to use only 10% of the CPU. There is a tolerance level of +/- 4%, so actual LDT usage can range between 6% and 14% of CPU. In the following example, applications use 75% of the CPU resources. As the graph shows, when LDT starts, application CPU utilization drops for a moment, because LDT exceeds the CPU threshold. QoS immediately reduces LDT’s CPU usage to 12%, which is within tolerance levels for a 10% setting, and the application CPU share returns to normal.
QoS makes visible improvement immediately when LDT starts
The graph above was obtained on a Linux system running sysbench.
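As a rough illustration of the tolerance arithmetic in this example, the following shell sketch checks whether an observed LDT CPU figure falls within the tolerance band around the configured value. The function name within_tolerance is hypothetical, and the inputs (10% configured, 12% observed, 4% tolerance) are the figures quoted above. It handles whole-number percentages only.

```shell
#!/bin/sh
# Hypothetical helper, not part of CTE: succeed when the observed LDT CPU
# percentage lies within +/- tolerance of the configured QoS CPU value.
within_tolerance() {
    setting=$1
    observed=$2
    tolerance=$3
    low=$((setting - tolerance))
    high=$((setting + tolerance))
    [ "$observed" -ge "$low" ] && [ "$observed" -le "$high" ]
}

# Figures from the example: 10% configured, 12% observed, 4% tolerance.
if within_tolerance 10 12 4; then
    echo "within tolerance"
else
    echo "outside tolerance"
fi
```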
-
To find the amount of CPU resources currently in use by applications, type:
top -n 1 -b | grep sysbench | awk 'BEGIN {cpu=0} {cpu += $9} END {print cpu}'
-
To find the amount of CPU currently in use by the LDT-protected host, type:
top -b -n 1 | grep Cpu
-
To find the amount of CPU currently in use by LDT, type:
voradmin ldt stats | grep CPU
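The application CPU command above can be generalized to any process name. This is a minimal sketch, assuming the default procps top batch layout, in which %CPU is the ninth column and the command name is the twelfth. The function name sum_cpu_for is hypothetical. Unlike the grep in the original command, it matches the command name exactly rather than as a substring, and it reads top output on standard input so it can also be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper: sum the %CPU column for every process whose
# command name equals $1. Expects `top -b -n 1` output on stdin and
# assumes the default column order (field 9 = %CPU, field 12 = COMMAND).
sum_cpu_for() {
    awk -v name="$1" '
        $12 == name { cpu += $9 }   # accumulate %CPU for matching rows
        END { print cpu + 0 }       # prints 0 when nothing matched
    '
}

# Usage on a live Linux host (requires top):
#   top -b -n 1 | sum_cpu_for sysbench
```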
Example: Monitoring and controlling resource usage during LDT
Suppose that LDT has started with CPU set to 25%, and users realize their applications are affected. For example, there might be a higher than expected level of LDT I/O operations. To return application performance to normal, reduce the CPU allocation for LDT.
-
Set the CPU parameter to a lower value, such as 10%.
-
Select the Cap CPU Allocation option.
QoS restricts LDT CPU usage to 10%. The application user should monitor their application. If the application’s performance is still affected, reduce the CPU parameter further, such as to 5%. Repeat this procedure until application performance returns to a satisfactory level.
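The reduction procedure above amounts to stepping the CPU parameter down until the application recovers. The following is a minimal sketch, assuming a step of 5% (the interval suggested in the best practices) and a floor of 1%, since a value of 0 disables QoS CPU monitoring. The function name next_lower_cpu is hypothetical. Applying each value to QoS and checking the application are left to the administrator, because no QoS command is shown here.

```shell
#!/bin/sh
# Hypothetical helper, not a CTE command: print the next lower QoS CPU
# percentage to try, stepping down by 5 and never going below 1.
next_lower_cpu() {
    current=$1
    next=$((current - 5))
    if [ "$next" -lt 1 ]; then
        next=1          # 0 would disable QoS CPU monitoring
    fi
    echo "$next"
}

# List the settings to try in turn, starting from 10%.
cpu=10
while [ "$cpu" -gt 1 ]; do
    cpu=$(next_lower_cpu "$cpu")
    echo "next setting to try: ${cpu}%"
done
```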
The following graph shows an example where QoS is not enabled to monitor and control LDT CPU usage from the start. When LDT starts, application CPU usage drops from 65% to 20%. After the QoS CPU parameter is set to 10%, application CPU usage improves greatly.
CPU usage allocation before and after QoS CPU parameter is set
Example: How QoS CPU settings affect I/O bandwidth
Controlling CPU utilization indirectly controls I/O bandwidth. When LDT is consuming less CPU resources, it is usually performing fewer operations of all kinds, including input and output.
The following graph shows how an application's I/O throughput is affected by LDT, and how QoS can reduce this effect through monitoring and controlling the LDT CPU resource usage.
This graph is from the same system described in Example: Monitoring and controlling resource usage during LDT. No QoS parameters are set at first. When LDT starts, application I/O drops from more than 600 MB per second to below 30 MB per second. After the QoS CPU parameter is set to 10%, LDT I/O drops enough that it no longer crowds out the application, and application I/O throughput improves greatly.
I/O operations before and after QoS CPU limit is set
To obtain the data for this graph:
-
Use iotop and benchmarking tools on an RHEL system to obtain application I/O throughput.
-
Use voradmin ldt stats to obtain the LDT I/O throughput and rekey rate.