General Best Practices for QoS
-
Use the Rekey I/O Rate threshold to limit any impact of CTE-LDT on your production workloads. The Rekey I/O Rate approach is a simpler way for an administrator to enforce a limit on the volume of data that CTE-LDT rekeys per second. Choose a threshold, in MB per second, that is a small percentage of the peak I/O throughput of your production workload.
When choosing a threshold on CTE-protected hosts with GuardPoints over NFS/CIFS shares, you must consider the network bandwidth between your host and the NFS/CIFS servers. QoS does not monitor the impact of LDT operations on network connections between your hosts and NFS/CIFS servers. However, the rate you select as the Rekey I/O Rate directly correlates to the network bandwidth used to reach the target NFS/CIFS servers. When selecting an optimal Rekey I/O Rate for such hosts, monitor and collect network traffic rather than disk I/O transfers.
-
You will see the effects of QoS settings only if the number and/or types of files in the GuardPoints stress the rekey or scan processes. On hosts with a relatively small number of files, the rekey or scan process may complete quickly without hitting a threshold and causing throttling to occur.
-
Use QoS CPU parameters as an alternate method for controlling the effect CTE-LDT has on application performance.
Set limits on CTE-LDT CPU usage whenever runtime monitoring shows that user applications are affected by CTE-LDT. Start by setting the CPU parameter to 10%, then increase or decrease in 5% intervals, as needed, to tune the CPU allocation. When an acceptable level is reached, and CTE-LDT is not noticeably affecting user applications, leave the QoS CPU parameters at a constant setting.
-
Use monitoring tools.
Monitor host CPU utilization with tools like vmstat, top, and iotop on Linux and perfmon on Windows. You can also monitor and obtain statistics with the voradmin ldt stats command. For more information about voradmin ldt stats, see Obtaining CTE-LDT Statistics at the Command Line.
-
Select CPU resource allocation for CTE-LDT from 1% to the available limit minus 5%.
If the monitoring tools show that system CPU usage without CTE-LDT is N%, then the available CPU resource is M%, where M = 100 - N. Select a percentage between 1% and (M - 5)% to allocate to CTE-LDT CPU usage. However, remember that QoS tolerates 2% - 4% leeway in the actual CPU usage, so adjust your selection by 2% - 4% to account for it.
-
Do not set CPU resources to 0% or 99% in an attempt to minimize or stop CTE-LDT.
A CPU% value of 0 or 99 is reserved for disabling the QoS CPU monitoring function. This does not stop CTE-LDT or minimize its resource usage; rather the opposite. It enables CTE-LDT to run with its maximum rekey rate. Note that when CPU % is not set, CTE-LDT clients enforce a 5% CPU threshold by default.
-
Cap the CPU allocation.
QoS provides a Cap CPU Allocation parameter. Set this parameter to True. This ensures that CTE-LDT resource usage never exceeds the allocated percentage.
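The allocation rule above (available CPU M = 100 - N, then a setting between 1% and (M - 5)%) can be worked through with a small shell helper. This is only a sketch: the function name ldt_cpu_range is hypothetical and not part of CTE, and N is a value you supply from your own monitoring.

```shell
# Hypothetical helper (not a CTE command): given N, the system CPU usage in
# percent measured WITHOUT CTE-LDT running, print the lowest and highest
# CPU percentage that the guideline above allows for CTE-LDT.
ldt_cpu_range() {
    n=$1
    m=$((100 - n))      # M: CPU that is still available
    max=$((m - 5))      # leave 5% headroom below the available limit
    if [ "$max" -lt 1 ]; then
        echo "no headroom: system usage is already ${n}%" >&2
        return 1
    fi
    echo "1 $max"
}
```

For a host whose applications use 75% of the CPU, ldt_cpu_range 75 prints "1 20". The 2% - 4% QoS leeway still has to be taken into account when picking a value inside that range.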
Example: Setting QoS before starting CTE-LDT
You can be proactive and set up QoS parameters before enabling GuardPoints that are protected with CTE-LDT policies. This ensures QoS starts monitoring and controlling CTE-LDT resource usage from the start. The following graph shows an example where 10% of the CPU is assigned to CTE-LDT. QoS makes sure that CTE-LDT is restricted to use only 10% of the CPU. There is a tolerance level of +/- 4%, so actual CTE-LDT usage can range between 5% and 15% of CPU. In the following example, applications use 75% of the CPU resources. As the graph shows, when CTE-LDT starts, application CPU utilization drops for a moment, because CTE-LDT exceeds the CPU threshold. QoS immediately reduces CTE-LDT’s CPU usage to 12%, which is within tolerance levels for a 10% setting, and the application CPU share returns to normal.
QoS makes visible improvement immediately when CTE-LDT starts
The graph above was obtained on a Linux system running sysbench.
-
To find the amount of CPU resources currently in use by applications, type:
top -n 1 -b | grep sysbench | awk 'BEGIN {cpu=0} {cpu += $9} END {print cpu}'
-
To find the amount of CPU currently in use by the CTE-LDT-protected host, type:
top -b -n 1 | grep Cpu
-
To find the amount of CPU currently in use by CTE-LDT, type:
voradmin ldt stats | grep CPU
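The first command above sums the %CPU column of top for one process name. A reusable form of that pipeline might look like the following sketch. The function name sum_cpu is hypothetical, and the sketch assumes, as the original command does, that %CPU is the ninth column of the top output.

```shell
# Hypothetical helper: read `top -b` output on stdin and print the summed
# %CPU of every line that contains the given process name.
# Assumes %CPU is field 9, as in the command above.
sum_cpu() {
    # index() performs a literal substring match, like a fixed-string grep;
    # "cpu + 0" makes awk print 0 when no line matches.
    awk -v name="$1" 'index($0, name) { cpu += $9 } END { print cpu + 0 }'
}

# Example use on a live system:
#   top -b -n 1 | sum_cpu sysbench
```

Reading from standard input keeps the helper independent of how the top snapshot is captured, so the same function can be applied to saved output.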
Example: Monitoring and controlling resource usage during CTE-LDT
Suppose that CTE-LDT has started with CPU set to 25%, and users realize their applications are affected. For example, there might be a higher than expected level of CTE-LDT I/O operations. To return application performance to normal, reduce the CPU allocation for CTE-LDT.
-
Set the CPU parameter to a lower value, such as 10%.
-
Select the Cap CPU Allocation option.
QoS restricts CTE-LDT CPU usage to 10%. The application user should monitor their application. If the application’s performance is still affected, reduce the CPU parameter further, such as to 5%. Repeat this procedure until application performance returns to a satisfactory level.
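The repeat-until-satisfactory procedure can be sketched as a helper that proposes the next lower CPU value to try. The 5% step comes from the tuning guidance earlier in this section. The function name lower_cpu_setting and the floor of 1% are illustrative choices; the floor is there because a value of 0 disables QoS CPU monitoring rather than limiting CTE-LDT.

```shell
# Hypothetical helper (not a CTE command): lower a QoS CPU setting by one
# step (default 5 percentage points), never returning less than 1, because
# 0 is reserved for disabling QoS CPU monitoring.
lower_cpu_setting() {
    current=$1
    step=${2:-5}
    next=$((current - step))
    if [ "$next" -lt 1 ]; then
        next=1
    fi
    echo "$next"
}
```

Starting from 10%, lower_cpu_setting 10 prints 5, which matches the next value suggested above. The helper only computes the value; applying it is still done through the QoS settings.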
The following graph shows an example where QoS is not enabled to monitor and control CTE-LDT CPU usage from the start. When CTE-LDT starts, application CPU usage drops from 65% to 20%. After the QoS CPU parameter is set to 10%, application CPU usage improves greatly.
CPU usage allocation before and after QoS CPU parameter is set
Example: How QoS CPU settings affect I/O bandwidth
Controlling CPU utilization indirectly controls I/O bandwidth. When CTE-LDT consumes fewer CPU resources, it usually performs fewer operations of all kinds, including input and output.
The following graph shows how an application's I/O throughput is affected by CTE-LDT, and how QoS can reduce this effect through monitoring and controlling the CTE-LDT CPU resource usage.
This graph is from the same system described in Example: Monitoring and controlling resource usage during CTE-LDT. No QoS parameters are set at first. When CTE-LDT starts, application I/O drops from more than 600 MB per second to below 30 MB per second. After the QoS CPU parameter is set to 10%, CTE-LDT I/O drops low enough that it no longer crowds out the application, and application I/O throughput improves greatly.
I/O operations before and after QoS CPU limit is set
To obtain the data for this graph:
-
Use iotop and benchmarking tools on an RHEL system to obtain application I/O throughput.
-
Use voradmin ldt stats to obtain the CTE-LDT I/O throughput and rekey rate.
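Building a time series like the one in the graph means sampling these commands repeatedly. The following is a minimal sketch of such a sampler. The function name sample_cmd, the interval and count values, and the log file names are illustrative assumptions, not part of CTE.

```shell
# Hypothetical sampler: run a command COUNT times, INTERVAL seconds apart,
# and prefix every output line with the local time of that sample.
# Usage: sample_cmd INTERVAL COUNT COMMAND [ARG...]
sample_cmd() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        ts=$(date '+%Y-%m-%dT%H:%M:%S')
        "$@" 2>&1 | while IFS= read -r line; do
            printf '%s %s\n' "$ts" "$line"
        done
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example use (file names are placeholders; iotop usually needs root):
#   sample_cmd 5 60 voradmin ldt stats > ldt_stats.log &
#   sample_cmd 5 60 iotop -b -o -n 1 > app_io.log
```

Timestamping both logs with the same clock makes it possible to line up application I/O against CTE-LDT I/O afterwards.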