Managing Information Types
An information type (infotype) categorizes data to look for during a scan. A large number of prebuilt information types are available to better categorize the data.
Different regions and countries can have different regulatory requirements, so these information types are categorized based on geographical regions. These regions are Global, Africa, Americas, Asia, Europe, and Oceania. The information types can be further categorized into:
Financial: Financial data such as credit card numbers and bank account details.
Personal Data: Personal data such as age, gender, race, and religion.
Medical: Medical data such as history of medical problems and disabilities.
National ID: National identity documents such as Social Security Number (SSN).
For a list of all available predefined information types, refer to the appendix Information Types.
DDC also allows you to create custom information types. For more information, see Creating Custom Infotypes.
Creating Custom Infotypes
You can create a custom information type, if you require one. This can be achieved from the Infotypes screen. To access it, click Settings and then Infotypes in the sidebar on the left.
Click the +Add Infotype button in the top right corner of the Infotypes screen. The Add Infotype wizard is displayed.
In the General Info step of the wizard, provide the following information for your new infotype:
Name: Choose a name for your infotype.
Category: Select a category to which your infotype belongs (Financial, Personal Data, Medical, or National ID).
Family: Select a family for your infotype. A family is a subcategory inside the Category and the choice of options depends on what you selected in the Category menu. The following families are available inside their corresponding categories:
Financial: Credit/Debit Cards; Bank Account Info
Medical: Patient Health Data
National ID: Personal Identification
Personal Data: Email addresses; Login credentials; Card Number; Ethnicity; License Number; Roll Number; Passport Number; Date Of Birth; MAC Address; Mailing Address; Telephone Number; Gender; Religion; IP Address; Phone Number; Name
Weight: Set the risk weight for the custom infotype, which will be used to calculate the risk formula per Data Object. Minimum weight is 1 and maximum is 10. That weight can be modified later, if needed, by editing the custom infotype in the Infotypes screen.
Note
- You can set a weight only for a custom infotype. Built-in infotypes have a predefined weight that cannot be changed.
- If you update a custom weight, you need to run a new scan and generate a new report to see the effects of the update. Historical reports will not be updated.
Region: Select the region for your infotype (Global, Africa, Americas, Asia, Europe, and Oceania).
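The constraints on the General Info fields can be summarised in a short validation sketch. The Python below only illustrates the rules described above. The names `CustomInfotype`, `FAMILIES`, and `REGIONS` are hypothetical and are not part of the product or its API.

```python
from dataclasses import dataclass

# Families available per category, as listed above.
FAMILIES = {
    "Financial": {"Credit/Debit Cards", "Bank Account Info"},
    "Medical": {"Patient Health Data"},
    "National ID": {"Personal Identification"},
    "Personal Data": {
        "Email addresses", "Login credentials", "Card Number", "Ethnicity",
        "License Number", "Roll Number", "Passport Number", "Date Of Birth",
        "MAC Address", "Mailing Address", "Telephone Number", "Gender",
        "Religion", "IP Address", "Phone Number", "Name",
    },
}
REGIONS = {"Global", "Africa", "Americas", "Asia", "Europe", "Oceania"}


@dataclass
class CustomInfotype:
    """Hypothetical container for the General Info step (not a product API)."""
    name: str
    category: str
    family: str
    weight: int
    region: str

    def __post_init__(self):
        # The family must belong to the selected category.
        if self.category not in FAMILIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.family not in FAMILIES[self.category]:
            raise ValueError(f"{self.family!r} is not a family of {self.category}")
        # The risk weight must lie between 1 and 10 inclusive.
        if not 1 <= self.weight <= 10:
            raise ValueError("weight must be between 1 and 10")
        if self.region not in REGIONS:
            raise ValueError(f"unknown region: {self.region}")
```

For example, `CustomInfotype("Illinois DL", "Personal Data", "License Number", 5, "Americas")` passes these checks, while a weight of 11 or a family from a different category raises `ValueError`.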
Click Next to go to the next step of the wizard.
In the Infotype Definition step of the wizard, you configure the rules for your new information type. You can start by configuring the rules in the Simple View tab, and then view or edit these rules in the Expert View tab, where they are translated into an expression in the internal language of the Data Discovery and Classification engine. Each new expression is validated when you click the SAVE button, and any errors are displayed.
Tip
Alternatively, you can click the Expert View tab directly and skip the Simple View. If you enter an expression in the RULE text area that the Simple View does not support, the Simple View tab is disabled. For an extended guide to the expressions available in this internal language, refer to DDC GLASS Reference.
To configure the rules for your new information type, click to expand the Add Rules menu in the Simple View tab and select one of the following types:
Character: Search for one or more specific characters as specified in the Select Rule menu. If the character is found, the location will be returned as a match. For a list of available character type rules, refer to "Character Type Rules Explained".
Use the From and To controls to set the number of consecutive occurrences of the selected character.
Phrase: Search for a specific pattern as defined in the Phrase textbox (in layman's terms, it is used to look for specific words). Searching for phrases is case insensitive.
Built-In: Pre-defined infotypes can be used in combination with other types (Character or Phrase). The complete list of built-in information types is available in the appendix Information Types.
Note
Currently, you cannot use the Indian Passport Number built-in infotype for creating custom infotypes.
Use the Apply button to complete your selection. The selection is displayed in the list of defined infotype rules. You can remove it from the list of rules by clicking the Remove link on its right.
You can use each of these types on their own, or combine them to form a more complex rule, involving multiple types in various configurations. See Examples of Custom Infotypes for some examples.
Due to a known limitation, two built-in infotypes cannot currently be placed directly one after the other. They must be separated by something, such as a rule of another type, or even a plain space. For example:
"american express" " " "english name"
Due to a known limitation, when you add a range of characters, all the possible combinations inside the range appear as matches. For example, for a range of numbers from 1 to 4, if a document contains the sequence 1234, the search yields the following matches: "1", "12", "123", and "1234".
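The range limitation can be illustrated with a short Python sketch. It assumes the example refers to a Digit rule with From 1 and To 4, and it only mimics the documented outcome. It is not the engine's actual algorithm, and `range_matches` is a hypothetical helper.

```python
def range_matches(text, start, low, high, predicate=str.isdigit):
    """Return every match of length low..high that begins at `start`.

    Mirrors the documented behaviour: each admissible length is reported
    as a separate match instead of only the longest one.
    """
    matches = []
    length = 0
    # Extend the run one character at a time while the predicate holds.
    while (start + length < len(text)
           and length < high
           and predicate(text[start + length])):
        length += 1
        if length >= low:
            matches.append(text[start:start + length])
    return matches


print(range_matches("1234", 0, 1, 4))  # ['1', '12', '123', '1234']
```

When you post-process scan results, keeping only the longest match per starting position is one way to collapse these overlapping hits.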
Note
When you introduce spaces at the beginning or the end of the phrase, the spaces are removed. Also, when you introduce more than one space between words, only one space is considered.
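The space handling described in this note can be sketched in a few lines of Python. The function name `normalize_phrase` is hypothetical, and the sketch only reproduces the documented effect on the phrase text.

```python
def normalize_phrase(phrase: str) -> str:
    """Drop leading/trailing spaces and collapse runs of spaces to one."""
    # Splitting on a single space yields empty strings for extra spaces;
    # filtering them out leaves only the words.
    parts = [part for part in phrase.split(" ") if part]
    return " ".join(parts)


print(repr(normalize_phrase("  John    Gordon ")))  # 'John Gordon'
```

To match a variable number of spaces between words, use a Character rule of type Space between two Phrase rules instead of typing the spaces into the phrase.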
Click Save to save your new infotype. Your new information type has been added and is now listed in the Infotypes screen, and marked 'Custom' in the Type column.
Character Type Rules Explained
Specific predefined characters are used to create custom infotypes using character based rules. They are explained below:
Rule | Expert View Keyword | Match |
---|---|---|
Space | SPACE | Any white-space character. |
Horizontal space | HSPACE | Tab characters and all Unicode "space separator" characters. |
Vertical space | VSPACE | All Unicode "line break" characters. |
Any | BYTE | Wildcard character that will match any character. |
Alphanumeric | ALNUM | ASCII numerical characters and letters. |
Alphabet | LETTER | ASCII alphabet characters. |
Digit | DIGIT | ASCII numerical characters. |
Printable | PRINTABLE | Any printable character. |
Printable ASCII only | PRINTABLEASCII | Any printable ASCII character, including horizontal and vertical white-space characters. |
Printable non-alphabet | PRINTABLENONALPHA | Printable ASCII characters, excluding alphabet characters and including horizontal and vertical white-space characters. |
Printable non-alphanumeric | PRINTABLENONALNUM | Printable ASCII characters, excluding alphanumeric characters and including horizontal and vertical white-space characters. |
Graphic | GRAPHIC | Any ASCII character that is not white-space or control character. |
Same line | SAMELINE | Any printable ASCII character, including horizontal white-space characters but excluding vertical white-space characters. |
Non-alphanumeric | NONALNUM | Symbols that are neither a number nor a letter; e.g. apostrophes ‘, parentheses (), brackets [], hyphens -, periods ., and commas ,. |
Non-alphabet | NONALPHA | Any non-alphabet characters; e.g. ~ ` ! @ # $ % ^ & * ( ) _ - + = |
Non-digit | NONDIGIT | Any non-numerical character. |
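For readers more familiar with regular expressions, a few of the keywords above can be approximated as regex character classes. The mapping below is an assumption for illustration only: the engine's exact definitions (for example, which Unicode characters count as SPACE) may differ, and `APPROX_CLASSES` and `range_pattern` are hypothetical names.

```python
import re

# Rough regex approximations of a few Expert View keywords (assumptions).
APPROX_CLASSES = {
    "DIGIT": r"[0-9]",        # ASCII numerical characters
    "LETTER": r"[A-Za-z]",    # ASCII alphabet characters
    "ALNUM": r"[A-Za-z0-9]",  # ASCII numerical characters and letters
    "NONDIGIT": r"[^0-9]",    # any non-numerical character
    "SPACE": r"\s",           # any white-space character
}


def range_pattern(keyword: str, low: int, high: int) -> str:
    """Build a regex fragment for a rule with From `low` to `high` times."""
    return f"{APPROX_CLASSES[keyword]}{{{low},{high}}}"


print(range_pattern("DIGIT", 3, 3))  # [0-9]{3,3}
```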
Examples of Custom Infotypes
Example 1
You want to search for a "Driver License Number" from Illinois, whose format is M532-4218-1341. You would then create the following rule:
Character Alphabet From 1 to 1 Times
Character Digit From 3 to 3 Times
Phrase -
Character Digit From 4 to 4 Times
Phrase -
Character Digit From 4 to 4 Times
The above example will have the following syntax in the Expert View:
RANGE LETTER TIMES 1-1
THEN RANGE DIGIT TIMES 3-3
THEN WORD NOCASE '-'
THEN RANGE DIGIT TIMES 4-4
THEN WORD NOCASE '-'
THEN RANGE DIGIT TIMES 4-4
In this rule, the expression "Character Alphabet From 1 to 1 Times" means that you expect exactly one alphabetic character, the expression "Character Digit From 3 to 3 Times" means that you expect exactly three digits, and the expression "Phrase -" means that you expect to find a hyphen (-) in the sequence.
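To test sample data outside the product, the rule in Example 1 can be approximated by a regular expression. The Python sketch below is an assumed equivalent, not the engine's implementation, and `find_license_numbers` is a hypothetical helper.

```python
import re

# One ASCII letter, three digits, '-', four digits, '-', four digits.
ILLINOIS_DL = re.compile(r"[A-Za-z][0-9]{3}-[0-9]{4}-[0-9]{4}")


def find_license_numbers(text: str) -> list[str]:
    """Return all substrings that have the Example 1 format."""
    return ILLINOIS_DL.findall(text)


print(find_license_numbers("License: M532-4218-1341 (IL)"))  # ['M532-4218-1341']
```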
Example 2
You want to search for a name and last name separated by a number of spaces between 1 and 3. To that end you would create the following rule:
Phrase John
Character Space From 1 to 3
Phrase Gordon
The above example will have the following syntax in the Expert View:
WORD NOCASE 'John'
THEN RANGE SPACE TIMES 1-3
THEN WORD NOCASE 'Gordon'
This rule will allow you to search for a combination of John and Gordon with one through three spaces between them. By comparison, using the rule Phrase John Gordon will only allow you to search for the combination John Gordon, with only one space. Any additional spaces in the phrase will be truncated.
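The difference between the two rules in Example 2 can be shown with an approximate regular-expression equivalent. This Python sketch assumes that the Space rule behaves like the regex class `\s`. The helper name `mentions_john_gordon` is hypothetical.

```python
import re

# 'John', one to three white-space characters, 'Gordon', case insensitive.
JOHN_GORDON = re.compile(r"john\s{1,3}gordon", re.IGNORECASE)


def mentions_john_gordon(text: str) -> bool:
    """Return True if the text contains a match for the Example 2 rule."""
    return JOHN_GORDON.search(text) is not None


print(mentions_john_gordon("JOHN   gordon"))   # True (three spaces)
print(mentions_john_gordon("John    Gordon"))  # False (four spaces)
```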
Example 3
You want to search for the Spanish NIE (foreigner's identity number) preceded by the phrase NIE and a number of spaces between 0 and 5. For example, NIE X8691474Q. You would create the following rule:
Phrase NIE
Character Space From 0 to 5
Built-in Spanish NIE
The above example will have the following syntax in the Expert View:
INCLUDE 'DEFINE_NID'
WORD NOCASE 'NIE'
THEN RANGE SPACE TIMES 0-5
THEN REFER 'NID_SPAIN_NIE'
This rule will find both NIE X8691474Q and nie x8691474q, since searching (regardless of the type: Character, Phrase, or Built-in) is case insensitive.
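The structure of the Example 3 rule can be sketched in Python as well. The built-in Spanish NIE infotype is replaced here by a format-only stand-in (a leading X, Y, or Z, seven digits, and a trailing letter), which is an assumption. The real built-in may apply additional validation that this sketch does not model. `find_nie_mentions` is a hypothetical helper.

```python
import re

# Format-only stand-in for the built-in Spanish NIE infotype (assumption).
NIE_FORMAT = r"[XYZ][0-9]{7}[A-Z]"

# Phrase 'NIE', zero to five white-space characters, then the NIE itself.
NIE_RULE = re.compile(r"NIE\s{0,5}" + NIE_FORMAT, re.IGNORECASE)


def find_nie_mentions(text: str) -> list[str]:
    """Return every substring matched by the approximated Example 3 rule."""
    return [match.group(0) for match in NIE_RULE.finditer(text)]


print(find_nie_mentions("Applicant: nie x8691474q"))  # ['nie x8691474q']
```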
Search Precision
Search precision allows you to change the precision level (in other words, the confidence type) for certain infotypes. Search precision is supported for built-in infotypes. Infotypes that Support Search Precision lists all built-in infotypes that support search precision.
Modifying Search Precision for an Infotype
You can change the precision for the infotypes that you want to scan with before you create a scan. This can be achieved from the Infotypes screen.
To access it, on the DDC main page click Classification Profiles then Infotypes in the sidebar on the left.
To change the precision value for an infotype, click the toggle switch in the Search Precision column - select LOW or HIGH.
HIGH - higher search confidence (i.e. precision), in other words, more robust search.
LOW - lower search confidence, in other words, more relaxed search.
In case an infotype does not support search precision, it will show N/A in the Search Precision column.
Creating Scans with Modified Search Precision
To create a scan with an infotype with a modified search precision, you use the same steps as for creating any scan for any datastore. Just make sure that you select the classification profile that uses the infotype that you modified. For more information, see Adding Scans.
Reports with Modified Search Precision
Scan Trend Report: When a scan is run with a modified search precision, the scan trend report displays an asterisk "*" for the scans in which the infotype Search Precision value was changed.
Aggregated Report: Data objects in aggregated reports will display an asterisk "*" after the infotypes detected where search precision was modified (this only applies to low search precision, as high precision setting will not display asterisks).
For more information on these reports, see Report Details.
Infotypes that Support Search Precision
The following list contains all infotypes that support search precision. The brackets contain the infotype IDs as coded internally by DDC, for example, for use with the REST API. For more information about the REST API, see CLI.
Note
The API appears to allow you to change the search precision level for any infotype. However, the change has no effect if the infotype does not support search precision.
- Australian Business Number (BANK_AUSTRALIA_ABN)
- Australian Company Number (BANK_AUSTRALIA_ACN)
- United Kingdom VAT Number (BANK_UNITED_KINGDOM_VAT)
- Australian Bank Account Number (BANK_AUSTRALIA_BANK_ACCOUNT)
- Canadian Bank Account Number (BANK_CANADA_BANK_ACCOUNT)
- Japanese Bank Account Number (BANK_JAPAN_BANK_ACCOUNT)
- South Korean Corporation Registration Number (BANK_SOUTH_KOREA_CRN)
- South Korean Taxpayer Identification Number (BANK_SOUTH_KOREA_TIN)
- South Korean NH Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_NH)
- South Korean KB Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_KB)
- South Korean KEB Hana Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_KEB_HANA)
- South Korean Shinhan Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_SHINHAN)
- South Korean Gwangju Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_GWANGJU)
- South Korean Jeju Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_JEJU)
- South Korean Jeonbuk Bank Account Number (BANK_SOUTH_KOREA_BANK_ACCOUNT_JEONBUK)
- Australian Tax File Number (NID_AUSTRALIA_TFN)
- Austrian SSN (NID_AUSTRIA_SSN)
- Austrian Personalausweis (NID_AUSTRIA_PERSONALAUSWEIS)
- Belgian eID (NID_BELGIUM_EID)
- Belgian National Number (NID_BELGIUM_NN)
- Brazilian CPF (NID_BRAZIL_CPF)
- Brazilian Registro Geral (NID_BRAZIL_RG)
- Bulgarian EGN (NID_BULGARIA_EGN)
- Canadian Social Insurance Number (NID_CANADA_SIN)
- Chilean RUN (NID_CHILE_RUN)
- Croatian OIB (NID_CROATIA_OIB)
- Czech Republic RC (NID_CZECH_RC)
- Danish CPR (NID_DENMARK_CPR)
- French Carte Vitale (NID_FRANCE_CV)
- French INSEE (NID_FRANCE_INSEE)
- French CNI (NID_FRANCE_CNI)
- Greek AFM (NID_GREECE_AFM)
- German Personalausweis (NID_GERMAN_PERSONALAUSWEIS)
- Hong Kong ID (NID_HONGKONG_HKID)
- Hungarian Personal ID (NID_HUNGARY_PIN)
- Icelandish Kennitala (NID_ICELAND_ID)
- Indian Aadhaar Number (NID_INDIA_AADHAAR)
- Iranian National Identification Number (NID_IRAN_NID)
- Irish Personal Public Service Number (NID_IRELAND_PPS)
- Israeli Identity Number (NID_ISRAEL_ID)
- Italian Codice Fiscale (NID_ITALY_CF)
- Italian CARTA D'IDENTITÀ (NID_ITALY_CID)
- Latvian Personas Kods (NID_LATVIA_PK)
- Luxembourg ID (NID_LUXEMBOURG_ID)
- Malaysian NRIC (NID_MALAYSIA_NRIC)
- Maltese eID (NID_MALTA_EID)
- Dutch Burgerservicenummer (NID_NETHERLANDS_BSN)
- New Zealand Inland Revenue Number (NID_NEW_ZEALAND_IRD)
- Portuguese Citizen's Card (NID_PORTUGAL_CC)
- Portuguese Fiscal Number (NID_PORTUGAL_FN)
- Portuguese Identity Number (NID_PORTUGAL_IN)
- Romanian Identity Card (NID_ROMANIA_IDC)
- Slovakian RC (NID_SLOVAKIA_RC)
- South African Identity Number (NID_SOUTH_AFRICA_ID)
- South Korean RRN (NID_SOUTH_KOREA_RRN)
- South Korean Foreigner Number (NID_SOUTH_KOREA_FRN)
- Spanish DNI (NID_SPAIN_DNI)
- Sri Lankan National Identity Card (NID_SRI_LANKA_NIC)
- Swedish Personnummer (NID_SWEDEN_PIN)
- Swiss Social Security Number (NID_SWITZERLAND_AVS)
- United Arab Emirates ID (NID_UAE_IC)
- United Kingdom NI Number (NID_UNITED_KINGDOM_NINO)
- United Kingdom Self Assessment UTR Number (NID_UNITED_KINGDOM_SA_UTR)
- United States Social Security Number (NID_UNITED_STATES_SSN)
- Australian Medicare Card (PHD_AUSTRALIA_MEDICARE)
- United Kingdom Health and Care Number (PHD_UNITED_KINGDOM_HCN)
- United Kingdom National Health Service Number (PHD_UNITED_KINGDOM_NHS)
- United States Health Insurance Claim Number (PHD_UNITED_STATES_HICN)
- United States Health Plan Identifier (PHD_UNITED_STATES_HPID)
- United States Medicare Beneficiary Identifier (MBI) (PHD_UNITED_STATES_MBI)
- United States National Provider Identifier (PHD_UNITED_STATES_NPI)
- Australian Telephone Number (PII_AUSTRALIA_PHONE)
- Austrian Telephone Number (PII_AUSTRIA_PHONE)
- Canadian Telephone Number (PII_CANADA_PHONE)
- South Korean Phone Number (PII_SOUTH_KOREA_PHONE)
- Luxembourg Phone Number (PII_LUXEMBOURG_PHONE)
- Portuguese Phone Number (PII_PORTUGAL_PHONE)
- Credentials username (PII_MISC_CREDENTIALS_USERNAME_ASCII)
- Dutch Driver License Number (PII_NETHERLANDS_DLN)
- Dutch Telephone Number (PII_NETHERLANDS_PHONE)
- New Zealand Telephone Number (PII_NEW_ZEALAND_PHONE)
- United Kingdom Telephone Number (PII_UNITED_KINGDOM_PHONE)
- United States Telephone Number (PII_UNITED_STATES_PHONE)
- United States Mailing Address (PII_USA_ADDRESS)
- Turkish Telephone Number (PII_TURKISH_PHONE)
- Spanish Driver License Number (PII_SPAIN_DLN)
- Spanish Telephone Number (PII_SPAIN_PHONE)
- Belgian Telephone Number (PII_BELGIUM_PHONE)
- French Telephone Number (PII_FRANCE_PHONE)
- German Driver License Number (PII_GERMAN_DLN)
- German Passport Number (PII_GERMAN_PASSPORT)
- German Telephone Number (PII_GERMAN_PHONE)
- Irish Driver License Number (PII_IRELAND_DLN)
- Irish Passport Number (PII_IRELAND_PASSPORT)
- Irish Telephone Number (PII_IRELAND_PHONE)
- Italian Telephone Number (PII_ITALY_PHONE)
- Polish Passport Number (PII_POLAND_PASSPORT)
- Polish Telephone Number (PII_POLAND_PHONE)
- Polish Driver License Number (PII_POLAND_DLN)
- Swedish Driver License Number (PII_SWEDEN_DLN)
- South Korean Passport (PII_SOUTH_KOREA_PASSPORT)
- South Korean Driver License Number (PII_SOUTH_KOREA_DLN)
For more information on these infotypes, refer to the appendix Information Types.
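Because the API does not reject precision changes for unsupported infotypes, a script that automates such changes may want to check the ID first. The Python sketch below is a client-side guard only and makes no API calls. `SUPPORTS_PRECISION` holds a small subset of the IDs listed above, and `check_precision_change` is a hypothetical helper.

```python
# A small subset of the infotype IDs listed above; extend as needed.
SUPPORTS_PRECISION = {
    "BANK_AUSTRALIA_ABN",
    "NID_SPAIN_DNI",
    "NID_UNITED_STATES_SSN",
    "PII_UNITED_STATES_PHONE",
}
VALID_LEVELS = {"LOW", "HIGH"}


def check_precision_change(infotype_id: str, level: str) -> None:
    """Raise ValueError if the requested change would have no effect."""
    if level not in VALID_LEVELS:
        raise ValueError(f"precision must be LOW or HIGH, got {level!r}")
    if infotype_id not in SUPPORTS_PRECISION:
        raise ValueError(f"{infotype_id} does not support search precision")


check_precision_change("NID_SPAIN_DNI", "HIGH")  # passes silently
```

Calling `check_precision_change("NID_SPAIN_NIE", "HIGH")` raises `ValueError`, because that ID is not in the list of infotypes that support search precision.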