Managing Information Types

An information type (infotype) categorizes data to look for during a scan. A large number of prebuilt information types are available to better categorize the data.

Different regions and countries can have different regulatory requirements, so these information types are categorized based on geographical regions. These regions are Global, Africa, Americas, Asia, Europe, and Oceania. The information types can be further categorized into:

  • Financial: Financial data such as credit card numbers and bank account details.

  • Personal Data: Personal data such as age, gender, race, and religion.

  • Medical: Medical data such as history of medical problems and disabilities.

  • National ID: National identity documents such as Social Security Number (SSN).

For a list of all available predefined information types, refer to the appendix Information Types.

You can also create custom information types. For more information, see Creating Custom Infotypes.

Creating Custom Infotypes

You can create a custom information type from the Infotypes screen. To access it, click Settings and then Infotypes in the sidebar on the left.

  1. Click the +Add Infotype button in the top right corner of the Infotypes screen. The Add Infotype wizard is displayed.

    1. In the General Info step of the wizard, provide the following information for your new infotype:

      • Name: Choose a name for your infotype.

      • Category: Select a category to which your infotype belongs (Financial, Personal Data, Medical, or National ID).

      • Family: Select a family for your infotype. A family is a subcategory inside the Category and the choice of options depends on what you selected in the Category menu. The following families are available inside their corresponding categories:

        Financial: Credit/Debit Cards; Bank Account Info

        Medical: Patient Health Data

        National ID: Personal Identification

        Personal Data: Email addresses; Login credentials; Card Number; Ethnicity; License Number; Roll Number; Passport Number; Date Of Birth; MAC Address; Mailing Address; Telephone Number; Gender; Religion; IP Address; Phone Number; Name

      • Region: Select the region for your infotype (Global, Africa, Americas, Asia, Europe, or Oceania).

      Click Next to go to the next step of the wizard.

    2. In the Infotype Definition step of the wizard, you configure the rules for your new information type. You can start by configuring the rules in the Simple View tab, and then view or edit them in the Expert View tab, where they are translated into an expression in the internal language of the Data Discovery and Classification engine. Each new expression is validated when you click the SAVE button, and any errors are displayed.

      Alternatively, you can go directly to the Expert View tab and skip the Simple View. If you enter an expression in the RULE text area that the Simple View does not support, the Simple View tab is disabled. For an extended guide to the expressions available in this internal language, refer to DDC GLASS Reference.

      To configure the rules for your new information type, click to expand the Add Rules menu in the Simple View tab and select one of the following types:

      • Character: Search for one or more specific characters as specified in the Select Rule menu. If the character is found, the location will be returned as a match. For a list of available character type rules, refer to "Character Type Rules Explained".

        Use the From and To controls to set the number of consecutive occurrences of the selected character.

      • Phrase: Search for a specific pattern as defined in the Phrase textbox (in layman's terms, it is used to look for specific words). Searching for phrases is case insensitive.

      • Built-In: Pre-defined infotypes can be used in combination with other types (Character or Phrase). The complete list of built-in information types is available in the appendix "Information Types".

      Use the Apply button to complete your selection. The selection is displayed in the list of defined infotype rules. You can remove it from the list of rules by clicking the Remove link on its right.

      You can use each of these types on their own, or combine them to form a more complex rule, involving multiple types in various configurations. See "Examples of Custom Infotypes" for some examples.

    Due to a known limitation, two built-in infotypes cannot currently be placed directly one after the other. They must be separated by something, such as a rule of another type or even a plain space. For example:
    "american express" " " "english name"

    Due to a known limitation, when you add a range of characters, every possible combination inside the range is reported as a match. For example, with a Digit rule set from 1 to 4 occurrences, the sequence 1234 in a document yields the following matches:
    "1", "12", "123", and "1234"

    Spaces at the beginning or the end of a phrase are removed. Also, when there is more than one space between words, only one space is kept.

  2. Click Save to save your new infotype. Your new information type has been added and is now listed in the Infotypes screen, and marked 'Custom' in the Type column.
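
The last two notes in the Infotype Definition step (range matching and phrase spacing) can be illustrated with a small Python sketch. This is not the DDC engine: `normalize_phrase` and `range_matches` are hypothetical helper names, and the sketch models only the behavior described in the notes, namely that a phrase loses its outer spaces and repeated inner spaces, and that a digit range of 1 to 4 reports each prefix of the run 1234.

```python
import re


def normalize_phrase(phrase: str) -> str:
    """Trim outer spaces and collapse runs of spaces to one (assumed model of the Phrase note)."""
    return re.sub(r" +", " ", phrase.strip(" "))


def range_matches(text: str, low: int, high: int) -> list[str]:
    """Return each prefix of the leading digit run whose length is between low and high."""
    found = re.match(r"[0-9]+", text)
    run = found.group(0) if found else ""
    return [run[:n] for n in range(low, min(high, len(run)) + 1)]


print(normalize_phrase("  John   Gordon "))  # John Gordon
print(range_matches("1234", 1, 4))           # ['1', '12', '123', '1234']
```

If the overlapping matches are not wanted, setting From and To to the same value (as the examples in this document do) avoids the range altogether.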

Character Type Rules Explained

Specific predefined characters are used to create custom infotypes using character based rules. They are explained below:

| Rule | Expert View Keyword | Match |
|---|---|---|
| Space | SPACE | Any white-space character. |
| Horizontal space | HSPACE | Tab characters and all Unicode "space separator" characters. |
| Vertical space | VSPACE | All Unicode "line break" characters. |
| Any | BYTE | Wildcard character that will match any character. |
| Alphanumeric | ALNUM | ASCII numerical characters and letters. |
| Alphabet | LETTER | ASCII alphabet characters. |
| Digit | DIGIT | ASCII numerical characters. |
| Printable | PRINTABLE | Any printable character. |
| Printable ASCII only | PRINTABLEASCII | Any printable ASCII character, including horizontal and vertical white-space characters. |
| Printable non-alphabet | PRINTABLENONALPHA | Printable ASCII characters, excluding alphabet characters and including horizontal and vertical white-space characters. |
| Printable non-alphanumeric | PRINTABLENONALNUM | Printable ASCII characters, excluding alphanumeric characters and including horizontal and vertical white-space characters. |
| Graphic | GRAPHIC | Any ASCII character that is not a white-space or control character. |
| Same line | SAMELINE | Any printable ASCII character, including horizontal white-space characters but excluding vertical white-space characters. |
| Non-alphanumeric | NONALNUM | Symbols that are neither a number nor a letter; e.g. apostrophes ', parentheses (), brackets [], hyphens -, periods ., and commas ,. |
| Non-alphabet | NONALPHA | Any non-alphabet characters; e.g. ~ ` ! @ # $ % ^ & * ( ) _ - + = |
| Non-digit | NONDIGIT | Any non-numerical character. |
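
To get a feel for how these character rules overlap, the following Python sketch maps a few of the ASCII-based keywords to regular expression classes. The mapping is an assumption based only on the descriptions in the table, not the engine's actual definitions, and `APPROX_CLASSES` and `keywords_matching` are hypothetical names.

```python
import re

# Assumed regex stand-ins for some ASCII-based keywords (approximations only).
APPROX_CLASSES = {
    "LETTER": r"[A-Za-z]",
    "DIGIT": r"[0-9]",
    "ALNUM": r"[A-Za-z0-9]",
    "NONALPHA": r"[^A-Za-z]",
    "NONDIGIT": r"[^0-9]",
}


def keywords_matching(char: str) -> list[str]:
    """List the keywords whose approximate class accepts the single character."""
    return sorted(k for k, p in APPROX_CLASSES.items() if re.fullmatch(p, char))


print(keywords_matching("7"))  # ['ALNUM', 'DIGIT', 'NONALPHA']
print(keywords_matching("M"))  # ['ALNUM', 'LETTER', 'NONDIGIT']
print(keywords_matching("-"))  # ['NONALPHA', 'NONDIGIT']
```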

Examples of Custom Infotypes

Example 1

You want to search for a "Driver License Number" from Illinois, whose format is M532-4218-1341. You would then create the following rule:

Character Alphabet From 1 to 1 Times

Character Digit From 3 to 3 Times

Phrase -

Character Digit From 4 to 4 Times

Phrase -

Character Digit From 4 to 4 Times

The above example will have the following syntax in the Expert View:

RANGE LETTER TIMES 1-1
THEN RANGE DIGIT TIMES 3-3
THEN WORD NOCASE '-'
THEN RANGE DIGIT TIMES 4-4
THEN WORD NOCASE '-'
THEN RANGE DIGIT TIMES 4-4

In this rule, the expression "Character Alphabet From 1 to 1 Times" means that you expect exactly one alphabetic character, the expression "Character Digit From 3 to 3 Times" means that you expect exactly three digits, and the expression "Phrase -" means that you expect to find a hyphen (-) in the sequence. Likewise, "Character Digit From 4 to 4 Times" means that you expect exactly four digits.
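
The correspondence between the Simple View rules and the Expert View syntax in this example can be sketched in Python. This is an illustration inferred from the examples in this document only. `to_expert_view` and `KEYWORDS` are hypothetical names, the sketch covers Character and Phrase rules but not Built-in rules, and it is not a substitute for the DDC GLASS Reference.

```python
# Subset of Simple View rule names and their Expert View keywords (from the table above).
KEYWORDS = {"Alphabet": "LETTER", "Digit": "DIGIT", "Space": "SPACE"}


def to_expert_view(rules: list[tuple]) -> str:
    """Render Character and Phrase rules in the Expert View style shown in this document."""
    lines = []
    for rule in rules:
        if rule[0] == "Character":
            _, name, low, high = rule
            expr = f"RANGE {KEYWORDS[name]} TIMES {low}-{high}"
        elif rule[0] == "Phrase":
            expr = f"WORD NOCASE '{rule[1]}'"
        else:
            raise ValueError(f"unsupported rule type in this sketch: {rule[0]}")
        # Every rule after the first is chained with THEN.
        lines.append(expr if not lines else "THEN " + expr)
    return "\n".join(lines)


illinois_license = [
    ("Character", "Alphabet", 1, 1),
    ("Character", "Digit", 3, 3),
    ("Phrase", "-"),
    ("Character", "Digit", 4, 4),
    ("Phrase", "-"),
    ("Character", "Digit", 4, 4),
]
print(to_expert_view(illinois_license))
```

Running the sketch prints the same six lines as the Expert View listing for this example.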

Example 2

You want to search for a first name and a last name separated by between 1 and 3 spaces. To that end, you would create the following rule:

Phrase John

Character Space From 1 to 3 Times

Phrase Gordon

The above example will have the following syntax in the Expert View:

WORD NOCASE 'John'
THEN RANGE SPACE TIMES 1-3
THEN WORD NOCASE 'Gordon'

This rule matches John followed by Gordon with one to three spaces between them. By comparison, the rule "Phrase John Gordon" matches only John Gordon with exactly one space, because any additional spaces in the phrase are collapsed into one.
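
The difference between the two rules can be checked with an approximate Python equivalent. The regular expressions below are assumed stand-ins for the infotype rules (`\s` approximates the Space rule, and `re.IGNORECASE` approximates the case-insensitive Phrase rule); they are not generated by DDC.

```python
import re

# Approximation of: Phrase John, Character Space From 1 to 3, Phrase Gordon.
spaced_rule = re.compile(r"john\s{1,3}gordon", re.IGNORECASE)
# Approximation of the single rule: Phrase John Gordon.
single_phrase = re.compile(r"john gordon", re.IGNORECASE)

samples = ["John Gordon", "JOHN  GORDON", "John   Gordon", "John    Gordon"]
for text in samples:
    print(repr(text), bool(spaced_rule.search(text)), bool(single_phrase.search(text)))
```

In this sketch the combined rule accepts the first three samples and rejects the one with four spaces, while the single phrase accepts only the first sample.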

Example 3

You want to search for the Spanish NIE (foreigner's identity number) preceded by the phrase NIE and between 0 and 5 spaces. For example, NIE X8691474Q.

Phrase NIE

Character Space From 0 to 5 Times

Built-in Spanish NIE

The above example will have the following syntax in the Expert View:

INCLUDE 'DEFINE_NID'
WORD NOCASE 'NIE'
THEN RANGE SPACE TIMES 0-5
THEN REFER 'NID_SPAIN_NIE'

This rule will find both NIE X8691474Q and nie x8691474q since searching (regardless of the type - Character, Phrase, or Built-in) is case insensitive.
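
The case-insensitive behavior and the 0 to 5 space range can be illustrated with a Python approximation. The definition of the built-in Spanish NIE infotype is not given in this document, so `NIE_FORMAT` below is a simplified, assumed stand-in (a letter X, Y, or Z, seven digits, and a final letter); the built-in infotype may apply further validation.

```python
import re

# Simplified, assumed stand-in for the built-in Spanish NIE infotype.
NIE_FORMAT = r"[XYZ][0-9]{7}[A-Z]"

# Approximation of: Phrase NIE, Character Space From 0 to 5, Built-in Spanish NIE.
nie_rule = re.compile(r"NIE\s{0,5}" + NIE_FORMAT, re.IGNORECASE)

samples = ["NIE X8691474Q", "nie x8691474q", "NIEX8691474Q", "NIE       X8691474Q"]
for text in samples:
    print(repr(text), bool(nie_rule.search(text)))
```

The first three samples match in this sketch; the last one has seven spaces, which is outside the 0 to 5 range, so it does not.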