Prometheus Metrics Endpoint
You can use the Prometheus metrics endpoint to connect the Prometheus monitoring system to CipherTrust Manager. You can configure Prometheus to scrape CipherTrust Manager continuously, collecting metrics over time that help you monitor overall system health, performance, and cryptographic activity.
A sample configuration that uses the Prometheus and Grafana Docker images is available on GitHub. Grafana, a data visualization application, graphs the metrics that Prometheus collects.
Prerequisites for Sample Configuration
CipherTrust Manager 2.7.0 or later
Docker
Docker Compose (docker-compose)
Sample Configuration Setup
On your CipherTrust Manager, enable Prometheus metrics, either through a POST to the /v1/system/metrics/prometheus/enable endpoint or with the ksctl metrics prometheus enable CLI command. A token is returned, which Prometheus needs to scrape CipherTrust Manager.
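The enable, renew-token, and status operations described in this section are plain REST calls, so they can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes that these endpoints are served under the /api prefix (as the metrics_path in the sample prometheus.yml suggests) and that you already hold an administrator bearer token for the REST API; build_prometheus_admin_request and admin_jwt are hypothetical names, and the response body is not parsed because its field names are not covered here.

```python
import urllib.request

# Assumption: the REST API is served under /api, mirroring the metrics_path
# (/api/v1/system/metrics/prometheus) used in the sample prometheus.yml.
PROMETHEUS_ADMIN_BASE = "/api/v1/system/metrics/prometheus"

# HTTP method for each Prometheus admin operation described in this section.
ADMIN_METHODS = {"enable": "POST", "renew-token": "POST", "status": "GET"}


def build_prometheus_admin_request(
    host: str, action: str, admin_jwt: str
) -> urllib.request.Request:
    """Build (but do not send) a request for one Prometheus admin operation.

    admin_jwt is a hypothetical placeholder for an administrator bearer token
    obtained separately; this sketch does not cover how to get it.
    """
    if action not in ADMIN_METHODS:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        url=f"https://{host}{PROMETHEUS_ADMIN_BASE}/{action}",
        method=ADMIN_METHODS[action],
        headers={"Authorization": f"Bearer {admin_jwt}"},
    )
```

To send the request, pass it to urllib.request.urlopen together with an ssl.SSLContext that trusts the CipherTrust Manager web certificate, for example urllib.request.urlopen(build_prometheus_admin_request("1.1.1.1", "enable", jwt), context=ctx).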
Note
This token does not expire, but it can be renewed manually with ksctl metrics prometheus renew-token or a POST to /v1/system/metrics/prometheus/renew-token.
If needed, retrieve the token that Prometheus needs to scrape CipherTrust Manager, either with a GET to the /v1/system/metrics/prometheus/status endpoint or with ksctl metrics prometheus status.
In the Prometheus Metrics directory, edit the prometheus.yml file. At minimum, you must provide the CipherTrust Manager hostname or IP address in targets and the Prometheus API token in bearer_token. Prometheus can scrape multiple CipherTrust Managers, which may or may not share the same API token. This example configuration defines three CipherTrust Manager nodes, two of which share the same Prometheus API token:
scrape_configs:
  - job_name: "CipherTrust Manager"
    scheme: "https"
    tls_config:
      #ca_file: "/trusted_cas/web-keysecure-local.pem"
      #server_name: "web.keysecure.local"
      insecure_skip_verify: true
    bearer_token: "1zplR4njZsRN5dNeWAFXhkL1x7MU9q4H"
    metrics_path: "/api/v1/system/metrics/prometheus"
    static_configs:
      - targets:
          - "1.1.1.1"
          - "1.1.1.2"
  - job_name: "CipherTrust Manager Staging"
    scheme: "https"
    tls_config:
      #ca_file: "/trusted_cas/web-keysecure-local.pem"
      #server_name: "web.keysecure.local"
      insecure_skip_verify: true
    bearer_token: "TnRHpdL9v8MnWv8DhN9xuAaKgPevMEZs"
    metrics_path: "/api/v1/system/metrics/prometheus"
    static_configs:
      - targets:
          - "1.1.1.3"
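Because bearer_token is set per scrape job, nodes that share a token can be listed together under one job's targets, while each distinct token needs its own job. The following Python sketch renders a scrape_configs section in the same layout as the sample configuration from a mapping of job names to tokens and targets. render_scrape_configs is a hypothetical helper, not part of the sample repository, and it only emits the insecure_skip_verify starting point used at this step.

```python
# Hypothetical helper: render a scrape_configs section like the sample prometheus.yml.
# One job is emitted per entry; group nodes that share an API token into one entry,
# because bearer_token is configured per job.
def render_scrape_configs(jobs: dict) -> str:
    """jobs maps job_name -> (bearer_token, [target hostnames or IPs])."""
    lines = ["scrape_configs:"]
    for job_name, (token, targets) in jobs.items():
        lines += [
            f'  - job_name: "{job_name}"',
            '    scheme: "https"',
            "    tls_config:",
            # Matches the sample's starting point; replace with ca_file and
            # server_name once TLS verification is set up.
            "      insecure_skip_verify: true",
            f'    bearer_token: "{token}"',
            '    metrics_path: "/api/v1/system/metrics/prometheus"',
            "    static_configs:",
            "      - targets:",
        ]
        lines += [f'          - "{target}"' for target in targets]
    return "\n".join(lines) + "\n"
```

For example, render_scrape_configs({"CipherTrust Manager": (token_a, ["1.1.1.1", "1.1.1.2"]), "CipherTrust Manager Staging": (token_b, ["1.1.1.3"])}) produces the two-job layout shown above, and the result can be written to prometheus.yml.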
Set up TLS authentication. By default, the sample Prometheus configuration sets insecure_skip_verify: true, which skips SSL/TLS certificate validation of the CipherTrust Manager server and is therefore not recommended for production deployments.
On CipherTrust Manager, download the certificate associated with the web interface and export it in PEM format:
ksctl interfaces certificate get --name web --icertfile <desired-filename>.pem
Use openssl to retrieve the Common Name (CN) of the certificate, which becomes the server_name value in Prometheus:
openssl x509 -noout -subject -in <your-file>.pem
Example response:
subject=C = US, ST = MD, L = Belcamp, O = Gemalto, CN = web.keysecure.local
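If you script this step, the CN can be pulled out of the openssl output with a few lines of Python. This sketch assumes the output format shown in the example response (a subject= prefix followed by comma-separated KEY = VALUE pairs); it does not handle escaped commas inside values or the older /C=US/.../CN=... format. cn_from_openssl_subject is a hypothetical helper name.

```python
from typing import Optional


def cn_from_openssl_subject(line: str) -> Optional[str]:
    """Return the CN from `openssl x509 -noout -subject` output, or None.

    Assumes the "subject=K = V, K = V, ..." format from the example response.
    """
    _, _, rdns = line.strip().partition("subject=")
    for part in rdns.split(","):
        key, sep, value = part.partition("=")
        if sep and key.strip() == "CN":
            return value.strip()
    return None
```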
The CN value, web.keysecure.local, is the value needed for Prometheus.
Copy the certificate file to the trusted_cas folder in the Prometheus Metrics directory.
Edit the prometheus.yml file to include the ca_file path and server_name of the certificate, and disable the insecure_skip_verify parameter by commenting it out. For example:
scrape_configs:
  - job_name: "CipherTrust Manager"
    scheme: "https"
    tls_config:
      ca_file: "/trusted_cas/web-keysecure-local.pem"
      server_name: "web.keysecure.local"
      #insecure_skip_verify: true
    bearer_token: "TnRHpdL9v8MnWv8DhN9xuAaKgPevMEZs"
    metrics_path: "/api/v1/system/metrics/prometheus"
    static_configs:
      - targets:
          - "1.1.1.1"
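To see what the ca_file, server_name, and insecure_skip_verify settings control, the following Python sketch makes the same choices with the standard ssl module: either trust only the exported PEM file and check the certificate against server_name while connecting to the IP from targets, or skip verification entirely. It is an analogy for testing connectivity from a script, not how Prometheus itself is implemented; build_tls_context and open_tls are hypothetical names, and port 443 is assumed for the CipherTrust Manager web interface.

```python
import socket
import ssl
from typing import Optional


def build_tls_context(
    ca_file: Optional[str], insecure_skip_verify: bool = False
) -> ssl.SSLContext:
    """Build an SSLContext that mirrors the chosen tls_config settings."""
    if insecure_skip_verify:
        # Like insecure_skip_verify: true; no certificate or hostname checks.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be turned off before CERT_NONE is allowed
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    # Like ca_file: trust only the CAs in this PEM file. If ca_file is None,
    # Python loads the system trust store instead.
    return ssl.create_default_context(cafile=ca_file)


def open_tls(
    ip: str, server_name: str, ctx: ssl.SSLContext, port: int = 443
) -> ssl.SSLSocket:
    """Connect to the target IP, but check the certificate against server_name."""
    raw = socket.create_connection((ip, port), timeout=10)
    try:
        return ctx.wrap_socket(raw, server_hostname=server_name)
    except Exception:
        raw.close()
        raise
```

For example, open_tls("1.1.1.1", "web.keysecure.local", build_tls_context("trusted_cas/web-keysecure-local.pem")) should complete the handshake; an ssl.SSLCertVerificationError usually points to a mismatch between the PEM file, the server_name value, and the certificate the server presents.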
In the Prometheus directory, run make up to start the stack.
Note
You can run make down to stop the stack, or make clear to stop the stack and delete all persisted data.
Visit the Prometheus Dashboard in a browser at http://localhost:9090.
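Prometheus also exposes the target state that its UI displays through its HTTP API at /api/v1/targets on the same address. The following Python sketch lists every active target whose health is not up, together with its job, instance, and last scrape error, which gives a scripted version of the target check described next. fetch_targets and unhealthy_targets are hypothetical helper names.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # address of the sample stack


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Return the decoded JSON from the Prometheus /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(targets_json: dict) -> list:
    """List (job, instance, last_error) for every active target not reporting "up"."""
    problems = []
    for target in targets_json["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            problems.append(
                (labels.get("job", ""), labels.get("instance", ""), target.get("lastError", ""))
            )
    return problems
```

An empty result from unhealthy_targets(fetch_targets()) corresponds to every CipherTrust Manager node showing as UP.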
Navigate to Status > Targets to confirm that Prometheus is scraping CipherTrust Manager. The state should display as UP for each node, with no errors.
If you detect a problem, verify the metrics endpoint on CipherTrust Manager with ksctl metrics prometheus get --api-token <api-token> or with curl -k 'https://<hostname>/api/v1/system/metrics/prometheus' -H 'Authorization: Bearer <api-token>' --compressed. You can also use the Docker Compose logs to debug problems, with docker-compose logs -f.
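The curl check can also be reproduced in Python, for example from a monitoring script. This sketch sends the Prometheus API token as a bearer token to /api/v1/system/metrics/prometheus, asks for a gzip-compressed response (curl --compressed also negotiates other encodings), and decompresses it. Unlike curl -k, it takes an ssl.SSLContext so that you can verify the server certificate. The helper names are hypothetical.

```python
import gzip
import ssl
import urllib.request
from typing import Optional

METRICS_PATH = "/api/v1/system/metrics/prometheus"  # from the sample prometheus.yml


def build_metrics_request(host: str, api_token: str) -> urllib.request.Request:
    """Same request as the curl command: bearer token plus a gzip-encoding hint."""
    return urllib.request.Request(
        f"https://{host}{METRICS_PATH}",
        headers={"Authorization": f"Bearer {api_token}", "Accept-Encoding": "gzip"},
    )


def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """Inflate a gzip body if the server compressed it, then decode as UTF-8."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def fetch_metrics(host: str, api_token: str, ctx: ssl.SSLContext) -> str:
    """Return the Prometheus text exposition served by CipherTrust Manager."""
    req = build_metrics_request(host, api_token)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

An HTTP 401 or 403 from fetch_metrics suggests the token is wrong or has been renewed since prometheus.yml was written.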
Visit the Grafana Dashboard in a browser at http://localhost:3000.
Log in with the user admin and the password admin. Set a new password when prompted.
Go to Dashboards > Home to view the included dashboards.
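Grafana can also list the installed dashboards through its HTTP API, which is handy for confirming the stack came up without opening a browser. This sketch assumes Grafana's /api/search endpoint filtered with type=dash-db and HTTP basic authentication as the admin user with the new password you set; the helper names are hypothetical.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # address of the sample stack


def build_search_request(
    password: str, user: str = "admin", base_url: str = GRAFANA_URL
) -> urllib.request.Request:
    """Request for Grafana's dashboard search, using HTTP basic authentication."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/api/search?type=dash-db",
        headers={"Authorization": f"Basic {creds}"},
    )


def dashboard_titles(search_results: list) -> list:
    """Sorted titles of the dashboards (not folders) in a search response."""
    return sorted(item["title"] for item in search_results if item.get("type") == "dash-db")


def list_dashboards(password: str) -> list:
    """Fetch and return the dashboard titles from the running Grafana."""
    with urllib.request.urlopen(build_search_request(password), timeout=10) as resp:
        return dashboard_titles(json.load(resp))
```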
Metrics Prefixes
The following high-level categories of metrics are returned from the endpoint:
Prefixes | Metrics Type | Prometheus Exporter, Package, or Integration |
---|---|---|
ciphertrust_ | Metrics for specific CipherTrust Manager resources and applications, for example, the size of the server audit log or key cache hits. | N/A |
dummy_ | Custom internal metrics that can be disregarded. | N/A |
docker_ | Metrics for the Docker containers that underlie CipherTrust Manager microservices, such as state, lifecycle, and resource usage. | docker_exporter |
go_ | Metrics gathered by the Go runtime. Most useful for debugging purposes with Thales engineers. | go-metrics |
http_, httpclient_ | Metrics about HTTP traffic to and from the CipherTrust Manager REST API endpoints. For example, response time and number of requests to an endpoint. | N/A |
node_ | Metrics for the CipherTrust Manager host, such as CPU and disk details. | Node exporter |
process_ | Metrics for microservices written in Go, including CPU, memory, and file descriptor usage, as well as the process start time. | Process collector of the Go Prometheus package |
promhttp_ | Number of times individual microservices are called, broken down by HTTP status code. | promhttp Go package |
sql_ | CipherTrust-specific metrics collected to analyze performance issues. Most useful for debugging purposes with Thales engineers. | N/A |
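To see how the series returned by the endpoint break down across these categories, you can group the metric names in the response by prefix. This Python sketch parses the Prometheus text exposition format line by line, skips the # HELP and # TYPE comment lines, and counts samples per prefix from the table above, with anything unmatched counted as other. count_series_by_prefix is a hypothetical helper; you could feed it, for example, the text returned by the curl command in the setup steps.

```python
import re
from collections import Counter

# Prefixes from the table above.
PREFIXES = (
    "ciphertrust_", "dummy_", "docker_", "go_", "http_", "httpclient_",
    "node_", "process_", "promhttp_", "sql_",
)
# A sample line starts with the metric name, followed by "{labels}" or a space.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def count_series_by_prefix(exposition: str) -> Counter:
    """Count sample lines per metric-name prefix in Prometheus text format."""
    counts = Counter()
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _METRIC_NAME.match(line)
        if not match:
            continue
        name = match.group(1)
        prefix = next((p for p in PREFIXES if name.startswith(p)), "other")
        counts[prefix] += 1
    return counts
```

Note that http_ and httpclient_ do not overlap (the fifth character differs), so each name matches at most one of these prefixes.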
Available Metrics Dashboards
The following dashboards are displayed in Grafana for CipherTrust Manager:
CipherTrust Manager Developer - Metrics that help internal CipherTrust Manager developers debug problems. This includes:
Average JWT processing time
Applications and Accounts Totals
Key Encryption Keys (KEKs) Count
Authorization Policies Cache Hits per Minute
Average Prometheus Metrics Scraping Response Time
CipherTrust Manager Host - Metrics about the health of the CipherTrust Manager host, including CPU details, memory details, network details, network connections, and disk details.
CipherTrust Manager HTTP Traffic - Metrics about HTTP traffic to the CipherTrust Manager. This includes:
Average HTTP Response Time Per Minute
HTTP Requests in the Last Minute
Average Network Latency Per Minute
Average CM HTTP Client Response Time Per Minute
HTTP 500 Errors in the Last Minute
CipherTrust Manager NAE - Basic metrics about the performance of the NAE-XML cryptographic interface. This includes XML response time and XML processing time.
CipherTrust Manager NAE Developer - More detailed metrics about operations and performance on the NAE-XML interface, intended for debugging. This includes:
Key Info Cache Misses Time Per Minute
Key Info Cache Hits Time Per Minute
XML Total Processing time
XML Parsing Time
XML Transmit Time
XML Receive Time
XML Execution Time
CipherTrust Manager Resources - Metrics about creation and use of objects on CipherTrust Manager, such as audit records, keys, licenses, backups, and users. This includes:
Audit Records Created Per Second Over The Last Minute
Audit Records Created In Last Five Minutes
Total Number of Audit Records
Total Number of Keys By Algorithm
Crypto Operations Per Second Over The Last Minute
Total Number Of Connector Licenses Deployed
Number of License Units Consumed
License Unit Consumption by Percentage
Total Number Of Group Users in the System
Total Number Of Key Rotations
Key Rotations In Last Five Minutes
Time Taken To Create Backup
Number of Backups taken
CipherTrust Manager Services - Metrics about the performance of individual microservices within CipherTrust Manager, intended for debug purposes. This includes:
CPU percentage
Memory usage
Network I/O (transmitting and receiving)
Disk I/O (reading and writing)
CipherTrust Manager Node Metrics - Metrics for the nodes in a clustered system, showing node connection and replication information. This includes:
Write, flush, and replay lags
Sent, write, flush, and replay lag sizes
Whether replication is blocked
Whether node is connected
Connect time for a node
Apply rate
Catchup interval
Uptime for a connection