DDC overview
This section provides an overview of the Thales CipherTrust Data Discovery and Classification (DDC) solution.
Workflow
The following diagram shows the high-level workflow of the DDC and DDC ML solutions.
Solution architecture
The DDC solution can be deployed with either the standard DDC architecture or the DDC ML architecture. The two architectures differ in the types of components they include and in the way these components are configured and interact with one another.
DDC architecture
The following image shows the standard DDC architecture.
Architecture components
The following table describes the components of the standard DDC architecture and their roles:
Component | Description |
---|---|
CipherTrust Manager, DDC Server | CipherTrust Manager, which runs the DDC Server, is at the heart of the Data Discovery and Classification solution. From here, users interact with the DDC GUI or use the DDC APIs to create classification profiles, add data stores, launch scans, and generate reports. |
TDP (On-prem): Hadoop, Spark, HDFS | TDP (On-prem) is configured to work with Hadoop data clusters. DDC uses Hadoop to generate reports from scans and to store their results (report data). DDC can query HDFS directly, but it requires Spark to interface with Hadoop's HBase. To install TDP (On-prem), see the Thales Data Platform Deployment Guide. |
TDPaaS | TDPaaS is a serverless, cloud-based service used for storing scan and report data. It is a SaaS component that offers an alternative to the Hadoop services of on-prem TDP. To configure DDC to use TDPaaS, see Configuring TDPaaS. |
DDC Agent | DDC Agents perform the actual scanning jobs and report the results back to the DDC Server for analysis and processing. DDC supports two types of Agent configurations: • Local Agents: Installed and configured directly on the machine that contains sensitive data. • Proxy Agents: Installed and configured on a proxy machine that is used to scan sensitive data on other machines. |
Data Stores | A data store is where the data actually resides. It can be a file server, a database, or a Hadoop cluster. For more information, see Discovering Sensitive Information. |
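
The difference between the two agent configurations can be illustrated with a small model. This is a minimal sketch, not DDC code: the `Agent` class, the `pick_agent` helper, the host names, and the "prefer a local agent, otherwise fall back to a proxy agent" policy are all hypothetical and only show how a local agent is tied to the machine that holds the data while a proxy agent scans other machines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AgentKind(Enum):
    LOCAL = "local"  # installed on the machine that contains the data
    PROXY = "proxy"  # installed on a separate machine that scans other hosts


@dataclass(frozen=True)
class Agent:
    name: str
    host: str  # machine the agent is installed on
    kind: AgentKind


def pick_agent(data_host: str, agents: List[Agent]) -> Optional[Agent]:
    """Hypothetical policy: use a local agent on the data host if one exists,
    otherwise fall back to the first proxy agent."""
    for agent in agents:
        if agent.kind is AgentKind.LOCAL and agent.host == data_host:
            return agent
    for agent in agents:
        if agent.kind is AgentKind.PROXY:
            return agent
    return None


# Illustrative inventory; the names and hosts are made up.
agents = [
    Agent("fs-agent", "fileserver01", AgentKind.LOCAL),
    Agent("proxy-agent", "scanner01", AgentKind.PROXY),
]
local_choice = pick_agent("fileserver01", agents)  # data host has a local agent
proxy_choice = pick_agent("db01", agents)          # no local agent on db01
```

In this sketch, `fileserver01` is scanned by its own local agent, while `db01`, which has no agent installed, is scanned through the proxy agent on `scanner01`.
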
DDC ML architecture
The following image shows the DDC ML architecture.
Architecture components
The following table describes the components of the DDC ML architecture and their roles:
Component | Description |
---|---|
CipherTrust Manager, DDC Server | CipherTrust Manager, which runs the DDC Server, is at the heart of the Data Discovery and Classification solution. From here, users interact with the DDC GUI or use the DDC APIs to create classification profiles, add data stores, launch scans, and generate reports. |
TDPaaS | TDPaaS is a serverless, cloud-based service used for storing scan and report data. It is a SaaS component that offers an alternative to the Hadoop services of on-prem TDP. To configure DDC to use TDPaaS, see Configuring TDPaaS. |
MLaaS | MLaaS (Machine Learning as a Service) is a fully managed, multi-tenant service on Google Cloud Platform. It offers pre-trained machine learning models for tasks such as text embedding, document classification, and named entity recognition. MLaaS works with CipherTrust Manager to process scan, index, and search requests. It takes the scan metadata produced by the DDC ML agent and uploads it to Google Cloud. It then creates an inverted index that connects embeddings with document metadata, which allows searching for similar items using the k-nearest neighbors algorithm. To configure DDC to use MLaaS, see Configuring MLaaS. |
DDC ML Agent | The DDC ML agent is a unified agent with machine learning capabilities that also includes the features of the standard DDC agent. DDC ML agents perform the actual scanning jobs and report the results back to the DDC Server for analysis and processing. To install the DDC ML agent, see DDC ML Agent. |
Data Stores | A data store is where the data actually resides. Currently, DDC ML supports only Local and Network data stores. For more information, see Discovering Sensitive Information. |
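
The similarity search described for MLaaS can be illustrated with a toy example. This is a minimal sketch of k-nearest neighbors search over embeddings, not the MLaaS implementation: it uses an exact brute-force scan with cosine similarity instead of an inverted index, and the function names, file names, and three-dimensional vectors are all made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(
    query: List[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k documents whose embeddings are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings keyed by document name (real embeddings have many
# more dimensions and are produced by a text-embedding model).
index = {
    "contract.docx": [0.9, 0.1, 0.0],
    "invoice.pdf": [0.8, 0.2, 0.1],
    "holiday.jpg": [0.0, 0.1, 0.9],
}

# Find the two documents most similar to a query embedding.
neighbors = k_nearest([1.0, 0.0, 0.0], index, k=2)
neighbor_ids = [doc_id for doc_id, _ in neighbors]
```

Here `neighbor_ids` contains `contract.docx` followed by `invoice.pdf`, because their embeddings point in nearly the same direction as the query. A production service would typically replace the brute-force scan with an index structure so that queries do not have to compare against every document.
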