Managing Information Types
An information type (infotype) categorizes data to look for during a scan. A large number of prebuilt information types are available to better categorize the data.
Different regions and countries can have different regulatory requirements, so these information types are categorized based on geographical regions. These regions are Global, Africa, Americas, Asia, Europe, and Oceania. The information types can be further categorized into:
Financial: Financial data such as credit card numbers and bank account details.
Personal Data: Personal data such as age, gender, race, and religion.
Medical: Medical data such as history of medical problems and disabilities.
National ID: National identity documents such as Social Security Number (SSN).
For a list of all available predefined information types, refer to the appendix Information Types.
You can also create custom information types. For more information, see Creating Custom Infotypes.
Creating Custom Infotypes
If the predefined information types do not cover your needs, you can create a custom information type from the Infotypes screen. To access it, click Settings and then Infotypes in the sidebar on the left.
Click the +Add Infotype button in the top right corner of the Infotypes screen. The Add Infotype wizard is displayed.
In the General Info step of the wizard, provide the following information for your new infotype:
Name: Choose a name for your infotype.
Category: Select a category to which your infotype belongs (Financial, Personal Data, Medical, or National ID).
Family: Select a family for your infotype. A family is a subcategory inside the Category and the choice of options depends on what you selected in the Category menu. The following families are available inside their corresponding categories:
Financial: Credit/Debit Cards; Bank Account Info
Medical: Patient Health Data
National ID: Personal Identification
Personal Data: Email addresses; Login credentials; Card Number; Ethnicity; License Number; Roll Number; Passport Number; Date Of Birth; MAC Address; Mailing Address; Telephone Number; Gender; Religion; IP Address; Phone Number; Name
Region: Select the region for your infotype (Global, Africa, Americas, Asia, Europe, or Oceania).
Click Next to go to the next step of the wizard.
In the Infotype Definition step of the wizard, you configure the rules for your new information type. You can start by configuring the rules in the Simple View tab, and then view or edit them in the Expert View tab, where they are translated into an expression in the internal language of the Data Discovery and Classification engine. Each new expression is validated when you click the Save button, and any errors are displayed.
Tip
Alternatively, you can go straight to the Expert View tab and skip the Simple View. If you enter an expression in the RULE text area that the Simple View does not support, the Simple View tab is disabled. For an extended guide to the expressions available in this internal language, refer to DDC GLASS Reference.
To configure the rules for your new information type, click to expand the Add Rules menu in the Simple View tab and select one of the following types:
Character: Search for one or more specific characters as specified in the Select Rule menu. If the character is found, the location will be returned as a match. For a list of available character type rules, refer to "Character Type Rules Explained".
Use the From and To controls to set the number of consecutive occurrences of the selected character.
Phrase: Search for a specific pattern as defined in the Phrase textbox (in other words, it looks for specific words). Searching for phrases is case insensitive.
Built-In: Pre-defined infotypes can be used in combination with other types (Character or Phrase). The complete list of built-in information types is available in the appendix Information Types.
Note
Currently, you cannot use the Indian Passport Number built-in infotype for creating custom infotypes.
Use the Apply button to complete your selection. The selection is displayed in the list of defined infotype rules. You can remove it from the list of rules by clicking the Remove link on its right.
You can use each of these types on their own, or combine them to form a more complex rule, involving multiple types in various configurations. See Examples of Custom Infotypes for some examples.
Due to a known limitation, it is currently not possible to place two built-in infotypes directly one after the other. They must be separated by something, such as a rule of another type, or even a plain space. For example:
"american express" " " "english name"
Due to a known limitation, when you add a range of characters, all the possible combinations inside the range appear as matches. For example, for a range of numbers from 1 to 4, when a document contains the sequence 1234, the search yields the following matches: "1", "12", "123", and "1234".
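The reported result can be pictured with a short Python sketch. It only imitates the outcome described above for a digit range and is not the engine's actual matching logic; the function name `range_matches` is hypothetical.

```python
def range_matches(text: str, low: int, high: int) -> list[str]:
    """Return each leading run of ASCII digits whose length is between low and high."""
    # Count the consecutive ASCII digits at the start of the text.
    run = 0
    while run < len(text) and text[run] in "0123456789":
        run += 1
    # Every allowed length that fits inside the run is reported as its own match.
    return [text[:n] for n in range(low, min(high, run) + 1)]


print(range_matches("1234", 1, 4))  # ['1', '12', '123', '1234']
```

If a single match is expected, setting From and To to the same value (as in the examples later in this document) avoids the shorter matches in this sketch.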
Note
When you introduce spaces at the beginning or the end of the phrase, the spaces are removed. Also, when you introduce more than one space between words, only one space is considered.
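The space handling in this note can be approximated in Python as follows. This is a sketch of the described behavior only; `normalize_phrase` is a hypothetical name, and the engine may treat other white-space characters differently.

```python
def normalize_phrase(phrase: str) -> str:
    """Trim the phrase and collapse repeated spaces into a single space."""
    # str.split() with no arguments discards leading and trailing white space
    # and treats any run of white space as one separator.
    return " ".join(phrase.split())


print(repr(normalize_phrase("  John    Gordon ")))  # 'John Gordon'
```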
Click Save to save your new infotype. Your new information type has been added and is now listed in the Infotypes screen, and marked 'Custom' in the Type column.
Character Type Rules Explained
Character-based rules use a set of predefined character types to build custom infotypes. They are explained below:
Rule | Expert View Keyword | Match |
---|---|---|
Space | SPACE | Any white-space character. |
Horizontal space | HSPACE | Tab characters and all Unicode "space separator" characters. |
Vertical space | VSPACE | All Unicode "line break" characters. |
Any | BYTE | Wildcard character that will match any character. |
Alphanumeric | ALNUM | ASCII numerical characters and letters. |
Alphabet | LETTER | ASCII alphabet characters. |
Digit | DIGIT | ASCII numerical characters. |
Printable | PRINTABLE | Any printable character. |
Printable ASCII only | PRINTABLEASCII | Any printable ASCII character, including horizontal and vertical white-space characters. |
Printable non-alphabet | PRINTABLENONALPHA | Printable ASCII characters, excluding alphabet characters and including horizontal and vertical white-space characters. |
Printable non-alphanumeric | PRINTABLENONALNUM | Printable ASCII characters, excluding alphanumeric characters and including horizontal and vertical white-space characters. |
Graphic | GRAPHIC | Any ASCII character that is not white-space or control character. |
Same line | SAMELINE | Any printable ASCII character, including horizontal white-space characters but excluding vertical white-space characters. |
Non-alphanumeric | NONALNUM | Symbols that are neither a number nor a letter; e.g. apostrophes ', parentheses (), brackets [], hyphens -, periods ., and commas ,. |
Non-alphabet | NONALPHA | Any non-alphabet characters; e.g. ~ ` ! @ # $ % ^ & * ( ) _ - + = |
Non-digit | NONDIGIT | Any non-numerical character. |
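The table can be read as a lookup from the Simple View rule name to the Expert View keyword. The Python sketch below encodes that lookup and renders a character rule in the `RANGE <keyword> TIMES <from>-<to>` form that appears in the Expert View examples in this document. The names `EXPERT_KEYWORDS` and `character_rule` are hypothetical, and the sketch is not part of the product.

```python
# Simple View rule name -> Expert View keyword, copied from the table above.
EXPERT_KEYWORDS = {
    "Space": "SPACE",
    "Horizontal space": "HSPACE",
    "Vertical space": "VSPACE",
    "Any": "BYTE",
    "Alphanumeric": "ALNUM",
    "Alphabet": "LETTER",
    "Digit": "DIGIT",
    "Printable": "PRINTABLE",
    "Printable ASCII only": "PRINTABLEASCII",
    "Printable non-alphabet": "PRINTABLENONALPHA",
    "Printable non-alphanumeric": "PRINTABLENONALNUM",
    "Graphic": "GRAPHIC",
    "Same line": "SAMELINE",
    "Non-alphanumeric": "NONALNUM",
    "Non-alphabet": "NONALPHA",
    "Non-digit": "NONDIGIT",
}


def character_rule(rule_name: str, low: int, high: int) -> str:
    """Render one character rule as an Expert View style line."""
    return f"RANGE {EXPERT_KEYWORDS[rule_name]} TIMES {low}-{high}"


print(character_rule("Alphabet", 1, 1))  # RANGE LETTER TIMES 1-1
```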
Examples of Custom Infotypes
Example 1
You want to search for a "Driver License Number" from Illinois, whose format is M532-4218-1341. You would then create the following rule:
Character Alphabet From 1 to 1 Times
Character Digit From 3 to 3 Times
Phrase -
Character Digit From 4 to 4 Times
Phrase -
Character Digit From 4 to 4 Times
The above example will have the following syntax in the Expert View:
RANGE LETTER TIMES 1-1
THEN RANGE DIGIT TIMES 3-3
THEN WORD NOCASE '-'
THEN RANGE DIGIT TIMES 4-4
THEN WORD NOCASE '-'
THEN RANGE DIGIT TIMES 4-4
In this rule, the expression "Character Alphabet From 1 to 1 Times" means that you expect exactly one alphabetic character, the expression "Character Digit From 3 to 3 Times" means that you expect exactly three digits, and the expression "Phrase -" means that you expect to find a hyphen (-) in the sequence.
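For readers more familiar with regular expressions, the same pattern can be approximated in Python. This is a rough equivalent for trying out sample values, assuming each THEN step must follow the previous one immediately; it is not how the engine evaluates the rule, and `ILLINOIS_DL` is a hypothetical name.

```python
import re

# One ASCII letter, three digits, a hyphen, four digits, a hyphen, four digits.
ILLINOIS_DL = re.compile(r"[A-Za-z][0-9]{3}-[0-9]{4}-[0-9]{4}")

sample = "License: M532-4218-1341 (renewed)"
match = ILLINOIS_DL.search(sample)
print(match.group(0) if match else None)  # M532-4218-1341
```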
Example 2
You want to search for a first name and a last name separated by one to three spaces. To do so, create the following rule:
Phrase John
Character Space From 1 to 3 Times
Phrase Gordon
The above example will have the following syntax in the Expert View:
WORD NOCASE 'John'
THEN RANGE SPACE TIMES 1-3
THEN WORD NOCASE 'Gordon'
This rule matches John and Gordon with one to three spaces between them. By comparison, the rule Phrase John Gordon matches only John Gordon with a single space, because any additional spaces in the phrase are collapsed into one.
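The difference between the two rules can be tried out with a regular expression approximation in Python. The sketch assumes that `\s` is a reasonable stand-in for the Space character type (any white-space character) and that phrases are matched case insensitively, as stated earlier; `NAME_RULE` is a hypothetical name.

```python
import re

# 'John', then one to three white-space characters, then 'Gordon'.
NAME_RULE = re.compile(r"john\s{1,3}gordon", re.IGNORECASE)

for text in ["John Gordon", "john   gordon", "John    Gordon"]:
    # The last sample has four spaces, so it falls outside the 1-3 range.
    print(repr(text), bool(NAME_RULE.search(text)))
```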
Example 3
You want to search for the Spanish NIE (foreigner's identity number) preceded by the phrase NIE and between 0 and 5 spaces. For example, NIE X8691474Q.
Phrase NIE
Character Space From 0 to 5
Built-in Spanish NIE
The above example will have the following syntax in the Expert View:
INCLUDE 'DEFINE_NID'
WORD NOCASE 'NIE'
THEN RANGE SPACE TIMES 0-5
THEN REFER 'NID_SPAIN_NIE'
This rule will find both NIE X8691474Q and nie x8691474q, since searching is case insensitive regardless of the type (Character, Phrase, or Built-in).
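The overall shape of this rule can be approximated in Python for experimentation. The definition of the built-in NID_SPAIN_NIE is not shown in this document, so the pattern below is an assumption based on the general NIE layout (a leading X, Y, or Z, seven digits, and a final letter); the real built-in may apply additional validation, such as a check of the final letter. `NIE_RULE` is a hypothetical name.

```python
import re

# 'NIE', zero to five white-space characters, then an assumed NIE layout:
# X/Y/Z, seven digits, one letter. No check-letter validation is done here.
NIE_RULE = re.compile(r"nie\s{0,5}[xyz][0-9]{7}[a-z]", re.IGNORECASE)

for text in ["NIE X8691474Q", "nie x8691474q", "NIEX8691474Q"]:
    print(repr(text), bool(NIE_RULE.search(text)))
```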