Services
Ambari User Interface
The Ambari UI provides operations for managing Thales Data Platform services, such as configuring a service or restarting it.
Accessing Ambari
The Ambari server is configured to use TLS (HTTPS) on port 443 with a self-signed certificate.
Access the Ambari GUI by using the hostname configured in the Set the Hostname step.
Log in as admin with the password that you set in the Configure Ambari step.
Tip
Ambari UI not responding: If you cannot access the Ambari UI, try restarting the Ambari service. Open an SSH session on the machine and run the command ambari-server restart.
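As a quick check from the command line, you can confirm that the server process is running and that the API answers over TLS. This is a minimal sketch; the hostname is a placeholder, and -k is needed because the certificate is self-signed:

    # Check whether the Ambari server process is running
    ambari-server status

    # Restart it if the UI is not responding
    ambari-server restart

    # Confirm the API answers over HTTPS (-k accepts the self-signed certificate)
    curl -k -u admin https://ambari.example.com/api/v1/clusters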
Service Auto Start
The Auto-Start Settings option is enabled by default, but initially only the Ambari Metrics Collector component is set to auto-start.
If the cluster experiences an unexpected shutdown, you need to manually start all the services via the Ambari Web UI. To do that, click ... at the top of the Services menu in the toolbar on the left, then select Start All.
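If you prefer to script this step, the same Start All operation can be issued through the Ambari REST API. This is a sketch only; the hostname, cluster name, and credentials are placeholders:

    # Start all services of the cluster via the Ambari REST API
    curl -k -u admin:PASSWORD -H 'X-Requested-By: ambari' -X PUT \
      -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
      https://ambari.example.com/api/v1/clusters/CLUSTER_NAME/services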
If you require additional components to auto-start, enable them by selecting their corresponding checkboxes in the Admin / Service Auto Start screen.
For details on enabling auto-start for additional components, refer to the official Ambari documentation.
HDFS
Browsing HDFS via User Interface
Open the Namenode UI:
Click to expand the Utilities menu, then click the Browse the file system link.
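You can also browse the same tree from the command line with the HDFS client. This sketch assumes the DDC structure described below is located at /ciphertrust_ddc; adjust the path to your deployment:

    # List the DDC root folder (assumed path)
    hdfs dfs -ls /ciphertrust_ddc

    # Recursively list the data lake
    hdfs dfs -ls -R /ciphertrust_ddc/dataLake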
HDFS ACLs
HDFS can use an ACL (Access Control List) to check whether a user has permission to read or write a particular folder. The HDFS administrator users can be configured as an ACL on the dfs.cluster.administrators property via Ambari. For more information, refer to the ACLs section of the Apache Hadoop HDFS Permissions Guide.
Check HDFS Structure and Permissions for DDC below for more details on HDFS usage. For additional information about HDFS groups and users, refer to InformIT's HDFS Commands, HDFS Permissions and HDFS Storage, and the Apache Hadoop HDFS Commands Guide.
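As an illustration, ACLs can be inspected and modified with the standard HDFS client. The path is the assumed DDC location from above, and the user name ddcadmin is hypothetical:

    # Inspect the ACL on a folder
    hdfs dfs -getfacl /ciphertrust_ddc/dataLake

    # Grant a user (hypothetical name) read and execute access
    hdfs dfs -setfacl -m user:ddcadmin:r-x /ciphertrust_ddc/dataLake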
HDFS Structure and Permissions for DDC
Note
The HDFS structure information applies to DDC version 2.4.
Regarding the HDFS permissions, this information is color-coded inside the directory structure below, in the following manner:
- Read and write
- Read only
- No access
ciphertrust_ddc/ (root folder)
    dataLake/ (Kylo, Knox)
        dataObject/ : PartitionedBy=ScanID, ScanExecutionID, DatastoreID. DataObjects information in parquet format, partitioned for queries. Generated by the Clover processor.
        global/ : PartitionedBy=ScanID, ScanExecutionID, DatastoreID. Scan configuration information in parquet format, partitioned for queries. Generated by the Clover processor.
        remediation/ : PartitionedBy=ScanExecutionID. Generated by Clover. Persists the remediation PQS tables for reporting.
        reports/
            {report-ID}/ : Both sub-folders contain JSON files for the info cards of the Scan and Datastore interfaces.
                dataStoreSummary/
                finalScanSummary/
        scanExecutions/ : PartitionedBy=ScanID, year. JSON file with ScanExecutionIDs. Used for reporting.
    dataObjects/ (Kylo, Knox)
        {scanExecutionID}/ : Generated by Oleander. Contains the DataObject information of a particular Scan execution.
    er2-reports/ (Kylo, Knox)
        {scanExecutionID}/ : Generated by Oleander. Contains the raw Er2 data, compressed and ciphered.
    global/ (Kylo, Knox)
        {scanExecutionID}/ : Generated by Oleander. Contains the scan configuration information of a particular Scan execution.
    installation/ (Kylo, Knox)
    reports/ (Kylo, Knox)
        scan/
            aggregated/
                {report-template-ID}/
                    {report-ID}/ : Generated by Clover. All these folders are used for building the Scan Report.
                        dataObjectDetail/
                        dataObjectSummary/
                        dataStoreDetails/
                        infoTypesSummary/
                        scanDetails/
                        summaryReport/
            dynamic/
                {report-ID}/
                    {dynamic-query-id}/
                        dataObjectDetails/ : Generated by Clover. Contains a JSON file with the result of the dynamic query. This information is used for building the DataObject table after sorting by a particular column, searching, or any other query.
YARN Resource Manager
To get the details of a job execution (performance, executors, memory consumed, and so on), you need to go to the YARN Resource Manager UI. To get there, click YARN and then the ResourceManager UI link in the Quick Links menu.
Note
The link in the Ambari UI points the user to http://<yarn-node>:8088/ui2/, but this link is wrong. To get the correct Resource Manager UI, just remove /ui2/ from the URL.
Once there, you can find a list of all the jobs. Each job can be in one of these states:
Accepted: The job has been submitted to the cluster, and YARN is waiting to have enough resources to execute the job.
Running: The job has resources assigned and it's executing.
Succeeded: The job has finished successfully.
Failed: The job has failed for an unexpected reason.
Canceled: The job has been manually canceled by the user.
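The same list is also available from the command line with the standard YARN client; the application ID below is a placeholder:

    # List applications in every state
    yarn application -list -appStates ALL

    # Show the status of a single application (placeholder ID)
    yarn application -status application_1650000000000_0001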
DDC launches four different jobs on the cluster:
ScanProcessorDDC: Processes the scan data to create the data lake.
ScanReporterDDC: Generates the reports.
DDCPqsTagger: Henry connector for remediation.
DataObjectReporterDDC: Creates the data object report based on a dynamic query.
You can click on any application ID to go to the job execution details. There you can check the executors used to run the job, the memory assigned, and even more advanced information like the SQL execution plan. Let us look at a ScanProcessorDDC example to understand what to watch for when deciding whether you need to increase the minimum recommended configuration.
Resource Manager UI
In the image below, you can see that the org.thales.ScanProcessorDDC job has finished successfully. You can also see how much time it took from when it was submitted to when it was completed.
Job Details Screen
If you click on the application ID link, you will see the job details page. On this page you can:
- see how long the job took to complete and the tracking URL, which is detailed in the image below,
- check the job logs (see the command-line sketch below).
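For finished jobs, the aggregated logs can also be fetched from the command line with the YARN client; the application ID below is a placeholder:

    # Fetch the aggregated logs of an application
    yarn logs -applicationId application_1650000000000_0001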
Job Event Timeline
After clicking the tracking URL link, you will see this screen.
It allows you to view the following information:
- how many executors have been added,
- how much time the job took to complete,
- the list of jobs that the application consists of,
- a link that takes you to the environment screen (below).
From here, you can also go to the executors' details.
Environment Screen
On this screen you can confirm that the properties you configured for Spark have been applied by Spark correctly.
Executors Screen
This screen is the most useful one when you are trying to understand whether enough nodes are assigned to each job or whether you need to add more resources or nodes.
If you check this screen on a finished job, you can get the following information:
- The RAM assigned to the application and whether any node is dead.
- The workers assigned to this particular job.
- How much memory was allocated for this job inside each node.
- How much time each node spent on the execution.
However, if you want to know how much memory is being consumed during the execution, you need to open the Executors view while the job is running. This is easier with a scan or report that takes some time to complete.