Concepts
Location
A location specifies a site where the file servers, databases, and data centers that contain data to scan are located. Locations are used to indicate where different data stores are physically located. For more information see Locations.
Sensitivity Levels
A sensitivity level defines how sensitive the data is. Sensitivity levels are required when creating classification profiles and data stores. The prebuilt sensitivity levels are:
None: The sensitivity level for such data has not yet been specified.
Public: Specifies the least sensitive data with no specific need for data security. Such data can be shared with anybody.
Internal: Specifies data with low sensitivity. Exposure of such data may not affect an organization, but the data is not meant for public disclosure.
Private: Specifies that the data is personal. Such data should be protected from public viewing.
Restricted: Specifies highly sensitive data, for example, customers' personal data and trade secrets. This type of data requires the best possible data security. Disclosure of such data can lead to severe financial and legal consequences for an organization. Businesses must prioritize remediation efforts related to this type of data.
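The prebuilt levels can be modeled as an ordered enumeration, which makes it easy to compare levels or pick the most sensitive one in a set. This is only a minimal sketch: the names SensitivityLevel and most_sensitive are hypothetical, and the numeric ordering (including placing None lowest) is an assumption based on the order in which the levels are listed above, not a documented DDC API.

```python
from enum import IntEnum


class SensitivityLevel(IntEnum):
    """Prebuilt sensitivity levels; the numeric order is an assumption."""
    NONE = 0        # sensitivity not yet specified
    PUBLIC = 1      # least sensitive, can be shared with anybody
    INTERNAL = 2    # low sensitivity, not meant for public disclosure
    PRIVATE = 3     # personal data, protected from public viewing
    RESTRICTED = 4  # highly sensitive, requires the strongest protection


def most_sensitive(levels):
    """Return the highest level in an iterable, or NONE if it is empty."""
    return max(levels, default=SensitivityLevel.NONE)
```

For example, most_sensitive([SensitivityLevel.PUBLIC, SensitivityLevel.RESTRICTED]) returns SensitivityLevel.RESTRICTED under this assumed ordering.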
Information Type
An information type (or infotype) categorizes data to look for during a scan. A large number of predefined information types are available to better categorize the data. For more information see Information Types.
Tag
A tag helps group data together. Tags are used to filter data for generating reports. They can be specified when creating data stores and classification profiles.
Data Discovery and Classification includes a number of predefined Tags, but also provides the ability to create custom Tags when creating data stores and classification profiles.
Predefined Tags
The predefined Tags are APA, APPI, CCPA, FINANCIAL, GDPR, GDPR-FINANCIAL, GDPR-HEALTHCARE, GDPR-ID, GDPR-PII, HEALTH, HIPAA, KVKK, LEGAL, LGPD, NYDFS, PCI, PERSONAL, PHI, PII, SHIELD and UK-GDPR.
Classification Profile
A classification profile defines what kind of sensitive information to search for during a scan. It includes information such as a sensitivity level, information types, and tags. Classification profiles can be created based on predefined templates or custom templates. For more information see Classification Profiles.
Data Object
Files, database tables, and BLOBs in database tables that are stored in a data store are called Data Objects.
Sensitive Data Object
A data object that contains any data match is called a Sensitive Data Object.
Data Match
A concrete instance of any of the infotypes is called a Data Match.
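The relationship between data objects, data matches, and sensitive data objects can be sketched as a small data model. The class and field names below (DataMatch, DataObject, is_sensitive, sensitive_objects) are hypothetical and chosen for illustration only; they do not reflect how DDC stores this information internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataMatch:
    """A concrete instance of an infotype found in a data object."""
    infotype: str  # name of the information type that matched
    value: str     # the matched content


@dataclass
class DataObject:
    """A file, database table, or BLOB stored in a data store."""
    name: str
    matches: List[DataMatch] = field(default_factory=list)

    @property
    def is_sensitive(self) -> bool:
        # A data object is sensitive as soon as it contains any data match.
        return len(self.matches) > 0


def sensitive_objects(objects):
    """Return only the data objects that contain at least one data match."""
    return [obj for obj in objects if obj.is_sensitive]
```

In this model, a data object with an empty matches list is an ordinary data object, and adding a single DataMatch turns it into a sensitive data object.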
Risk
A risk is the presence of a sensitive data object in a data store. The risk is calculated per data object and per data store, and is directly related to the matches found in that data object or data store.
Scan
A scan is an entity that helps in scanning data stores. Each scan specifies the location to scan and what to look for during scanning. Findings of scans can be used to generate reports for different purposes. Scans can either be run manually at any time or be scheduled to run and stop at a specified time.
Note
DDC supports partial database scanning. To enable this, you need to configure the number of rows to be scanned. When you run a database scan, the scan results include data from the specified number of rows only.
If you don't specify the number of rows to scan, the entire database is scanned.
For more information see Scans.
Encryption Keys
DDC uses AES256 encryption to protect sensitive data. DDC creates a number of encryption keys that are stored in CM. You can find these DDC keys in the Keys & Access Management application in CM.
Four encryption keys protect the Hadoop configuration before it is stored in the DDC database. Each key is used to protect one configuration parameter (HDFS server and HDFS credentials).
- Encryption key format: citrus-<UUID>. Example: citrus-6e0cb668-3a3d-4f2c-8687-17092b83b41b.
One encryption key for each data store. Each key is used to encrypt the data store credentials before they are stored in the DDC database, and to encrypt the scan results of that data store before they are stored in HDFS. Data store encryption key formats:
- Prior to Kylo-2.15: d<UUID>. Example: d8b2d8404-c9ae-4a34-800a-01258dfaa383.
- Kylo-2.15 onwards: DDC-Data-Store-<UUID>. Example: DDC-Data-Store-8b2d8404-c9ae-4a34-800a-01258dfaa383.
One encryption key for each scan. Each key is used to encrypt the scan data before storing it in HDFS. Scan encryption key formats:
- Prior to Kylo-2.15: s<UUID>. Example: s14912791-bed5-4e73-b733-6a36ecfe338f.
- Kylo-2.15 onwards: DDC-Scan-<UUID>. Example: DDC-Scan-14912791-bed5-4e73-b733-6a36ecfe338f.
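When reviewing keys in CM, it can help to recognize which keys belong to DDC from their names alone. The following is a minimal sketch that classifies a key name using the formats listed above. The function name classify_ddc_key and the category labels are hypothetical, and the sketch assumes the UUID part always uses the 8-4-4-4-12 hexadecimal layout shown in the examples.

```python
import re

# UUID in the 8-4-4-4-12 hexadecimal layout used by the examples above.
UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"

# Category labels are hypothetical; the prefixes come from the listed formats.
KEY_PATTERNS = [
    ("hadoop-configuration", re.compile(rf"citrus-{UUID}")),
    ("data-store", re.compile(rf"DDC-Data-Store-{UUID}")),
    ("data-store (prior to Kylo-2.15)", re.compile(rf"d{UUID}")),
    ("scan", re.compile(rf"DDC-Scan-{UUID}")),
    ("scan (prior to Kylo-2.15)", re.compile(rf"s{UUID}")),
]


def classify_ddc_key(name):
    """Return the matching category label, or None if no format matches."""
    for label, pattern in KEY_PATTERNS:
        # fullmatch requires the whole name to fit the format.
        if pattern.fullmatch(name):
            return label
    return None
```

A name that matches none of these formats returns None, which only means it does not follow the naming formats listed here.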
Warning
Never delete any encryption key. Otherwise, DDC will not be able to process the related data stores and scans properly.