Concepts
Location
A location specifies a site where the file servers, databases, and data centers that contain data to scan are located. Locations are used to indicate where different data stores are physically located. For more information see Locations.
Sensitivity levels
A sensitivity level defines how sensitive the data is. Sensitivity levels are required when creating classification profiles and data stores. The prebuilt sensitivity levels are:
None: The sensitivity level for such data has not yet been specified.
Public: Specifies the least sensitive data with no specific need for data security. Such data can be shared with anybody.
Internal: Specifies the data with low sensitivity. Exposure of such data may not affect an organization, but is not meant for public disclosure.
Private: Specifies that the data is personal. Such data should be protected from public viewing.
Restricted: Specifies highly sensitive data, for example, customers' personal data and trade secrets. This type of data requires the strongest possible data security. Disclosure of such data can lead to severe financial and legal consequences for an organization. Businesses must prioritize remediation efforts related to this type of data.
Information type
An information type (or infotype) categorizes data to look for during a scan. A large number of predefined information types are available to better categorize the data. For more information see Information Types.
Tag
A tag helps group data together. Tags are used to filter data for generating reports. They can be specified when creating data stores and classification profiles.
Data Discovery and Classification includes a number of predefined Tags, but also provides the ability to create custom Tags when creating data stores and classification profiles.
Predefined tags
The predefined Tags are APA, APPI, CCPA, FINANCIAL, GDPR, GDPR-FINANCIAL, GDPR-HEALTHCARE, GDPR-ID, GDPR-PII, HEALTH, HIPAA, KVKK, LEGAL, LGPD, NYDFS, PCI, PERSONAL, PHI, PII, SHIELD and UK-GDPR.
Classification profile
A classification profile defines what kind of sensitive information to search for during a scan. It includes information such as a sensitivity level, information types, and tags. Classification profiles can be created based on predefined templates or custom templates. For more information see Classification Profiles.
Data object
Files, database tables, and BLOBs in database tables, as stored in a data store, are called Data Objects.
Sensitive data object
A data object that contains any data match is called a Sensitive Data Object.
Data Match
A concrete instance of any of the infotypes is called a Data Match.
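The relationship between data objects, data matches, and sensitive data objects can be sketched as a small data model. This is only an illustration of the definitions above, not DDC's internal representation; the class names, field names, and the sample infotype name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataMatch:
    """A concrete instance of an infotype found in a data object."""
    infotype: str  # hypothetical field: name of the infotype that matched
    value: str     # hypothetical field: the matched text


@dataclass
class DataObject:
    """A file, database table, or BLOB stored in a data store."""
    name: str  # hypothetical identifier, e.g. a file path or table name
    matches: List[DataMatch] = field(default_factory=list)

    @property
    def is_sensitive(self) -> bool:
        # A data object is a sensitive data object if it has any data match.
        return len(self.matches) > 0


# Illustrative values only; the infotype name is made up for this sketch.
payroll = DataObject("hr/payroll.csv", [DataMatch("Example Infotype", "sample")])
readme = DataObject("docs/readme.txt")
```

Here `payroll.is_sensitive` is true because the object carries one match, while `readme.is_sensitive` is false because it carries none.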
Risk
A risk is the presence of a sensitive data object in a data store. The risk is calculated for each data object and each data store, and is directly related to the matches found in that data object or data store.
Scan
A scan defines how data stores are scanned. Each scan specifies the location to scan and what to look for during scanning. Findings of scans can be used to generate reports for different purposes. Scans can either be run manually at any time or scheduled to run and stop at a specified time.
Note
DDC supports partial database scanning. To enable this, you need to configure the number of rows to be scanned. When you run a database scan, the scan results include data from the specified number of rows only.
If you don't specify the number of rows to scan, the entire database will be scanned.
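The row-limit rule in this note can be expressed as a short function. This is a minimal sketch of the rule as described, not DDC code; the function name and parameters are hypothetical, and how DDC behaves when the limit exceeds the available rows is an assumption here (the sketch simply scans everything in that case).

```python
from typing import Optional


def rows_to_scan(total_rows: int, row_limit: Optional[int] = None) -> int:
    """Return how many rows a scan would cover under the rule in the note.

    row_limit=None models "number of rows not specified", which means
    the entire database is scanned.
    """
    if row_limit is None:
        return total_rows
    if row_limit < 0:
        raise ValueError("row_limit must not be negative")
    # Assumption: a limit larger than the data simply covers all rows.
    return min(row_limit, total_rows)
```

For example, `rows_to_scan(1000, 100)` covers 100 rows, while `rows_to_scan(1000)` covers all 1000.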
For more information see Scans.
Labels
Labels in Data Discovery and Classification (DDC) are used to categorize and control standard agents. Labels are tags that highlight an agent's specific features, such as its ability to scan certain types of data stores or its performance capabilities. An agent can have several labels. Agent labels can only be created in the Labels section of the agent. See Managing agent labels to learn how to add or edit agent labels.
Labels have the following characteristics:
Customizable: There are no pre-defined labels. You can create them based on your requirements. For example, you could label an agent "Paris Data Center" or "AWS" to ensure DDC selects an agent that is co-located with the data store, providing low latency and high bandwidth access. Alternatively, you might use labels like "Critical" for agents dedicated to scanning highly sensitive data.
Immutable after creation: Once a custom label is created, you can't modify it. However, you can always assign new or existing labels to an agent.
Agent labels vs data store labels:
It's important to understand the relationship between agent labels and data store labels:
Agent Labels: These represent an agent's capabilities.
Data Store Labels: These indicate the capabilities required for any agent to scan that specific data store.
For an agent to be selected to scan a particular data store, it must have all the labels defined for that data store. It can, however, have additional labels. See Automatic agent selection for more information.
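The matching rule above amounts to a subset check: the data store's labels must all be present among the agent's labels. The following is a minimal sketch of that rule only, not DDC's actual agent selection logic; the function name and the agent names are hypothetical, and the label values are borrowed from the examples earlier in this section.

```python
from typing import Dict, List, Set


def eligible_agents(agents: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Return the names of agents that carry every label the data store requires."""
    # `required <= labels` is a subset test: extra agent labels are allowed.
    return sorted(name for name, labels in agents.items() if required <= labels)


# Hypothetical agents and their labels.
agents = {
    "agent-1": {"Paris Data Center", "Critical"},
    "agent-2": {"AWS"},
    "agent-3": {"Paris Data Center"},
}

paris_only = eligible_agents(agents, {"Paris Data Center"})
paris_critical = eligible_agents(agents, {"Paris Data Center", "Critical"})
```

With these values, a data store labeled only "Paris Data Center" can be scanned by agent-1 and agent-3, while a data store labeled both "Paris Data Center" and "Critical" can be scanned only by agent-1.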
You can also use labels to ensure that only the right agents handle your sensitive data, which increases the security of your DDC deployment. For more information, see Mitigating security risks.
Encryption keys
DDC uses AES256 encryption to protect sensitive data. DDC creates a number of encryption keys that are stored in the CipherTrust Manager. You can find these DDC keys in the Keys & Access Management application in the CipherTrust Manager.
Four encryption keys to protect the Hadoop configuration before storing it inside the DDC database. Each key is used to protect one configuration parameter (HDFS Server, and HDFS credentials).
- Encryption key format: citrus-<UUID>. Example: citrus-6e0cb668-3a3d-4f2c-8687-17092b83b41b
One encryption key for each data store. Each key is used to encrypt the data store credentials before storing them in the DDC database, and to encrypt the scan results of that data store before storing them in HDFS. Data store encryption key formats:
- For CipherTrust Manager version 2.15 or earlier: d<UUID>. Example: d8b2d8404-c9ae-4a34-800a-01258dfaa383
- For CipherTrust Manager version 2.16 or later: DDC-Data-Store-<UUID>. Example: DDC-Data-Store-8b2d8404-c9ae-4a34-800a-01258dfaa383
One encryption key for each scan. Each key is used to encrypt the scan data before storing it in HDFS. Scan encryption key formats:
- For CipherTrust Manager version 2.15 or earlier: s<UUID>. Example: s14912791-bed5-4e73-b733-6a36ecfe338f
- For CipherTrust Manager version 2.16 or later: DDC-Scan-<UUID>. Example: DDC-Scan-14912791-bed5-4e73-b733-6a36ecfe338f
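When reviewing keys in Keys & Access Management, it can help to recognize which names belong to DDC. The following is a minimal sketch that classifies a key name using the naming formats listed in this section. It is not part of DDC or CipherTrust Manager; the function name and category strings are hypothetical, and it assumes UUIDs are written in lowercase hexadecimal as in the examples.

```python
import re
from typing import Optional

# Standard 8-4-4-4-12 UUID layout, lowercase hex (assumption based on the examples).
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Category names are hypothetical; the prefixes come from the formats listed above.
PATTERNS = [
    ("hadoop-config", re.compile("citrus-" + UUID)),
    ("data-store", re.compile("(?:d|DDC-Data-Store-)" + UUID)),
    ("scan", re.compile("(?:s|DDC-Scan-)" + UUID)),
]


def classify_key(name: str) -> Optional[str]:
    """Return the DDC key category for a key name, or None if it matches no format."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(name):
            return kind
    return None
```

A name that matches none of the formats returns `None`, which only means it does not follow the DDC naming formats shown here.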
Warning
Never delete any of these encryption keys. Otherwise, DDC will not be able to process the related data stores and scans properly.