Services
Ambari User Interface
The Ambari UI provides operations for managing Thales Data Platform services, such as configuring a service or restarting it.
Accessing Ambari
The Ambari server is configured to use TLS (HTTPS) on port 443 with a self-signed certificate.
Access the Ambari GUI by using the hostname configured in the Set the Hostname step.
Log in as admin with the password that you set in the step Configure Ambari.
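As a quick check that the server is reachable over HTTPS, you can query the Ambari REST API from the command line. This is a sketch: ambari.example.com is a placeholder for the hostname you configured, and the password is the one you set for admin.

```shell
# -k skips certificate verification, which is needed because Ambari
# uses a self-signed certificate.
# ambari.example.com is a placeholder; use your configured hostname.
curl -k -u admin:'<admin-password>' https://ambari.example.com:443/api/v1/clusters
```

A successful response is a JSON document listing the cluster; an authentication error usually means the admin password is wrong.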
Browsing HDFS via User Interface
Open the Namenode UI:
Expand the Utilities menu, then click the Browse the file system link.
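If you prefer the command line, the same listing is available through the HDFS client, assuming it is installed and configured on the node you are logged in to:

```shell
# List the root of the HDFS file system, equivalent to browsing it in the UI.
hdfs dfs -ls /
```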
YARN Resource Manager
To get the details of a job execution (performance, executors, memory consumed, and so on), go to the YARN Resource Manager UI. To get there, click YARN and then the ResourceManager UI link in the Quick Links menu.
Note
The link in the Ambari UI refers the user to http://<yarn-node>:8088/ui2/, but this link is wrong. To reach the correct Resource Manager UI, remove /ui2/ from the URL.
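Deriving the corrected URL can be scripted, for example when building bookmarks or health checks. The sketch below uses yarn-node.example.com as a placeholder hostname:

```shell
# The link Ambari hands out (placeholder hostname):
wrong_url="http://yarn-node.example.com:8088/ui2/"

# Strip the trailing /ui2/ path segment, keeping a trailing slash.
correct_url="${wrong_url%/ui2/}/"

echo "$correct_url"
```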
Once there, you can find a list of all the jobs. Each job can be in one of these states:
Accepted: The job has been submitted to the cluster, and YARN is waiting for enough resources to become available to execute it.
Running: The job has resources assigned and is executing.
Succeeded: The job has finished successfully.
Failed: The job has failed for an unexpected reason.
Canceled: The job has been manually canceled by the user.
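The same information is available from the YARN CLI on a cluster node. Note that the CLI uses YARN's internal state names, which differ slightly from the labels above (for example, FINISHED and KILLED rather than Succeeded and Canceled):

```shell
# List applications that are queued or executing.
# -appStates accepts a comma-separated list of YARN states.
yarn application -list -appStates ACCEPTED,RUNNING
```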
DDC launches four different jobs on the cluster:
ScanProcessorDDC: Processes the scan data to create the data lake.
ScanReporterDDC: Generates the reports.
DDCPqsTagger: Henry connector for remediation.
DataObjectReporterDDC: Creates the data object report based on a dynamic query.
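One way to narrow the Resource Manager's application list down to these four jobs is to filter the YARN CLI output by name:

```shell
# Show only the DDC jobs, across all application states.
yarn application -list -appStates ALL | \
  grep -E 'ScanProcessorDDC|ScanReporterDDC|DDCPqsTagger|DataObjectReporterDDC'
```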
You can click any application ID to go to the job execution details. There you can check the executors used to run the job, the memory assigned, and even more advanced information such as the SQL execution plan. Let's look at a ScanProcessorDDC example to understand what to watch for when deciding whether you need to increase the minimum recommended configuration.
Resource Manager UI
In the image below, you can see that the org.thales.ScanProcessorDDC job has finished successfully. You can also see how much time it took from when it was submitted to when it was completed.
Job Details Screen
If you click the application ID link, you will see the job details page. On this page you can:
see how long the job took to complete and the tracking URL, which is detailed in the image below,
check the job logs.
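For a finished job, the aggregated logs can also be fetched from the command line. The application ID below is a placeholder; use the ID shown in the Resource Manager UI:

```shell
# Fetch the aggregated logs for a completed application.
# application_1700000000000_0001 is a placeholder application ID.
yarn logs -applicationId application_1700000000000_0001
```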
Job Event Timeline
After clicking the tracking URL link, you will see this screen.
It allows you to view the following information:
How many executors have been added.
How long the job took to complete.
The list of jobs that the application consists of.
A link to the environment screen (described below).
The executors' details, which you can reach from here.
Environment Screen
On this screen you can confirm that the properties you configured for Spark were applied correctly.
Executors Screen
This screen is the most useful for understanding whether enough nodes are assigned to each job, or whether you need to add more resources or nodes.
If you check this screen on a finished job, you can get the following information:
The RAM assigned to the application and whether there are any dead nodes.
The workers assigned to this particular job.
How much memory was allocated for this job inside each node.
How much time each node spent on the execution.
However, if you want to know how much memory is being consumed during execution, you need to open the Executors view while the job is running. This is easier with a scan or report that takes some time to complete.
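The same executor metrics can be polled programmatically through the Spark UI's monitoring REST API while the job runs. This is a sketch: driver-host.example.com is a placeholder, 4040 is the default Spark UI port (it may be proxied through the Resource Manager for YARN-managed jobs), and the application ID placeholder must be filled in from the first call:

```shell
# List the running Spark applications exposed by this UI.
curl -s http://driver-host.example.com:4040/api/v1/applications

# Fetch per-executor details (memory used, tasks, etc.) for one application.
# <app-id> is a placeholder taken from the previous response.
curl -s http://driver-host.example.com:4040/api/v1/applications/<app-id>/executors
```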