DDC GLASS Reference

This document is a guide to the syntax, operators and rules to observe when writing GLASS Technology™ (GLASS) expressions in the Expression Editor. It is not a comprehensive GLASS guide and covers only the basic GLASS operators. Please contact Thales Technical Support if you have more detailed questions about GLASS.

BYTE LEVEL OPERATION

The scanning engine evaluates scan data in octets. This means the engine has the ability to look for any byte within the data stream that passes through the scanning engine.

GLASS SYNTAX

You have to follow several basic rules when defining GLASS expressions:

  1. An expression is a combination of operators and values which is terminated by a new line.

    For readability, a single expression can be split across multiple lines by ending a line with a backslash \ character.

    The example below forms a single expression:

    WORD "Foo" THEN \
    RANGE "0-9" TIMES 4
    
  2. Operators and values are separated by one or more blank spaces.

    • Operators are keywords that describe what actions to perform.

    • Values are literals or integers.

  3. Blank lines in the Expression Editor are ignored by the compiler.

  4. A comment is anything that follows a hash # character and will be ignored by the compiler. Comments can start at the beginning or in the middle of a line.
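
The four rules above can be sketched as a small pre-processing step. The Python below is only an illustration of the rules, not the GLASS compiler: the function names are hypothetical, and it assumes that a # inside a quoted literal is part of the literal rather than the start of a comment, which this document does not state.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' that is outside a quoted literal."""
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":      # skip the escaped character inside a literal
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "#":
            return line[:i]
        i += 1
    return line


def logical_expressions(source: str) -> list[str]:
    """Join backslash-continued lines, strip comments, and skip blank lines."""
    expressions, pending = [], ""
    for raw in source.splitlines():
        line = strip_comment(raw).rstrip()
        if line.endswith("\\"):
            # Continuation: keep collecting until a line without a backslash.
            pending += line[:-1].rstrip() + " "
            continue
        full = (pending + line).strip()
        pending = ""
        if full:
            expressions.append(full)
    if pending.strip():
        expressions.append(pending.strip())
    return expressions


sample = '''# four digits after a keyword
WORD "Foo" THEN \\
RANGE "0-9" TIMES 4

WORD "a#b"  # assumed: the hash inside the literal is kept
'''
print(logical_expressions(sample))
```

With the sample above, the sketch yields two logical expressions: the joined WORD ... THEN ... RANGE expression and the WORD "a#b" expression.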

CHARACTER ENCODING

You must write GLASS expressions in ASCII or UTF-8 notation only. The engine operates at byte level, so any expression that is UTF-8 encoded can be matched to the corresponding octets if they are present within the input data stream to the scanning engine.

Writing custom GLASS expressions using anything other than ASCII or UTF-8 encoding will yield unexpected results.

For example, the word "world" is written as "Мир" (/mir/) in Russian. Example 1 and Example 2 are two ways to define the GLASS expression to search for the phrase "Hello, Мир!" using the WORD operator.

Example 1

UTF-8 encoded expression for "Hello, Мир!".

WORD 'Hello, Мир!'

Example 2

ASCII encoded expression for "Hello, Мир!" specifying a UTF-8 encoded octet sequence.

WORD 'Hello, \xd0\x9c\xd0\xb8\xd1\x80!'
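
To check that Example 1 and Example 2 describe the same octets, the phrase can be encoded outside of GLASS. The Python snippet below is a side illustration only; it uses the standard str.encode method and also shows, as a contrast, that a different encoding (Windows-1251 is picked here purely as an example) produces different bytes for the same word.

```python
text = "Hello, Мир!"
encoded = text.encode("utf-8")

# Render non-ASCII bytes as \xHH escapes and leave printable ASCII as-is.
escaped = "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02x}" for b in encoded)
print(escaped)  # Hello, \xd0\x9c\xd0\xb8\xd1\x80!

# The same word in another encoding is a different byte sequence,
# so an expression written for UTF-8 would not match it.
print("Мир".encode("cp1251"))  # b'\xcc\xe8\xf0'
```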

LITERALS AND INTEGERS

GLASS expressions are built from operators and values. Values can be in the form of literals or integers.

LITERALS

Description

Literals are defined as a string of characters that are surrounded by matching single quotes '' or double quotes " ". These characters must be in ASCII or UTF-8 encoding only.

"Search for this pattern"
'Look for this pattern too'

Certain literal characters have a special meaning when preceded by a backslash \ character:

| Escaped Literal | Meaning | ASCII Code |
| --- | --- | --- |
| \t | Horizontal tab | 0x09 |
| \n | New line | 0x0A |
| \v | Vertical tab | 0x0B |
| \f | Form feed | 0x0C |
| \r | Carriage return | 0x0D |
| \" | Literal double quote character | 0x22 |
| \' | Literal single quote character | 0x27 |
| \\ | Literal backslash character | 0x5C |
| \xHH | The two characters HH following \x are taken as the hexadecimal value of a character | - |

Example 3

WORD "First Phrase\nSecond Phrase\n"

For Example 3, the engine searches for a single pattern consisting of the strings "First Phrase" and "Second Phrase" separated by a new line, followed by a new line at the end of "Second Phrase".

First Phrase
Second Phrase
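
The escape rules can be made concrete with a short decoder that turns the text between the quotes into the bytes the engine would look for. This is a minimal Python sketch, not the engine's parser: literal_to_bytes is a hypothetical helper, and rejecting unknown escapes is an assumption made here for safety.

```python
import re

SIMPLE_ESCAPES = {
    "t": b"\t", "n": b"\n", "v": b"\x0b", "f": b"\x0c", "r": b"\r",
    '"': b'"', "'": b"'", "\\": b"\\",
}

# A backslash followed by either xHH or any single character.
ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def literal_to_bytes(body: str) -> bytes:
    """Convert the text between the quotes of a literal into raw bytes."""
    out = bytearray()
    pos = 0
    for m in ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")  # plain text before the escape
        code = m.group(1)
        if code.startswith("x") and len(code) == 3:
            out.append(int(code[1:], 16))           # \xHH becomes one byte
        elif code in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[code]
        else:
            raise ValueError(f"unsupported escape: \\{code}")
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)


print(literal_to_bytes(r"First Phrase\nSecond Phrase\n"))
```

Applied to the literal from Example 3, the helper returns the two phrases with a 0x0A byte after each.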

INTEGERS

Integers are defined as a string of ASCII digits in the inclusive range of 0-9. You can use the underscore _ character to separate the digits for readability. For example, if you use the underscore _ character as a thousands separator, the GLASS expressions are simpler to read.

Example 4

12345
12_345
1_2_3_4_5
1234_5_

In Example 4, the integers from line 1 to line 4 are all equivalent. The GLASS engine will process all 4 representations as 12345.

Certain operators in GLASS require positive or negative integers to be specified explicitly. By default, integers are always positive unless a sign is provided.

To explicitly express a positive or negative integer, prepend:

  1. The plus + sign (ASCII 0x2B) for positive integers.

  2. The minus - sign (ASCII 0x2D) for negative integers.

For more information, see Operators.
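
As an illustration of the integer rules, the Python sketch below normalises the four spellings from Example 4 to the same value and honours an optional sign. parse_glass_integer is a hypothetical name used only here. Note that Python's own int() rejects a trailing underscore such as 1234_5_, so the underscores are removed before conversion.

```python
def parse_glass_integer(token: str) -> int:
    """Parse an integer token, ignoring underscores and honouring an optional sign."""
    sign = 1
    if token and token[0] in "+-":
        sign = -1 if token[0] == "-" else 1
        token = token[1:]
    digits = token.replace("_", "")
    # Only the ASCII digits 0-9 are accepted.
    if not digits or not all(c in "0123456789" for c in digits):
        raise ValueError(f"not an integer: {token!r}")
    return sign * int(digits)


for token in ["12345", "12_345", "1_2_3_4_5", "1234_5_", "-1_000", "+7"]:
    print(token, "->", parse_glass_integer(token))
```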

OPERATORS

Operators are functions that can be used in GLASS expressions to instruct the engine to perform a specific action. All operators are left associative and case insensitive.

For readability, it is recommended to use uppercase letters to specify operators within GLASS expressions.

GLASS operators can be grouped by function:

  1. Primary pattern generators

    • Used to produce matching patterns.

    • Operators: WORD, RANGE

    • Advanced operators: MAP, GROUP

  2. Secondary pattern generators

    • Used to combine the results of primary pattern generators to be processed by any expression that follows.

    • Operators: ( ) , THEN, OR (evaluated in this order)

  3. Pattern modifiers

    • Used with pattern generators to provide flexibility in what patterns to match.

    • Operators: TIMES, BOUND

WORD

Description

Search for a specific pattern as defined by the <literal>. If the pattern is found, the location will be returned as a match.

Matches can happen anywhere in a stream of bytes and are not limited to the traditional word boundaries only.

Syntax

WORD [NOCASE] <literal>

Literals are case sensitive. You can use the NOCASE keyword to instruct the engine to be case insensitive when searching for matching patterns.

Example 5

The expression below searches for the string "Foo".

WORD "Foo"

Based on Example 5, all the following lines will be marked as match locations:

Foo
FooBar
BazFoo
BazFooBar

Example 6

The expression below searches for the string "HELLO world".

WORD NOCASE "HELLO world"

Based on Example 6, the following lines will be marked as match locations:

hello world
HELLO WORLD
HeLlO wOrLd
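
The behaviour described for WORD and WORD NOCASE resembles a plain substring search over bytes. The Python sketch below approximates it with the standard re module; word_matches is a hypothetical helper, and it reports non-overlapping occurrences only, which may differ from how the engine counts matches.

```python
import re


def word_matches(literal: bytes, data: bytes, nocase: bool = False) -> list[int]:
    """Return the start offset of each occurrence of literal in data."""
    flags = re.IGNORECASE if nocase else 0
    pattern = re.compile(re.escape(literal), flags)
    return [m.start() for m in pattern.finditer(data)]


# Example 5: the match can sit anywhere, not only on word boundaries.
for line in [b"Foo", b"FooBar", b"BazFoo", b"BazFooBar"]:
    print(line, word_matches(b"Foo", line))

# Example 6: NOCASE ignores the case of ASCII letters.
for line in [b"hello world", b"HELLO WORLD", b"HeLlO wOrLd"]:
    print(line, word_matches(b"HELLO world", line, nocase=True))
```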

RANGE

Description

Search for one or more specific characters defined by the <literal>. If the character is found, the location will be returned as a match.

There are several rules when defining literals that you can use with the RANGE operator:

  1. Using the hyphen - between two characters instructs the GLASS engine to include all values between both characters. For example,

    1
    2
    RANGE "0-9"
    RANGE "a-z"
    

    Line 1 matches the following characters: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. Line 2 matches all lowercase characters in the alphabet from a to z.

  2. Characters can be defined using hexadecimal values. For example,

    1
    RANGE "\x41-\x5A"
    

    0x41 and 0x5A are the hexadecimal representations of the uppercase ASCII characters A and Z respectively. Therefore, line 1 matches all uppercase characters in the alphabet from A to Z.

  3. The caret ^ symbol before a literal instructs the GLASS engine to match all characters that are not defined in the RANGE. For example,

    1
    RANGE "^0-9"
    

    Line 1 matches all characters except the ASCII digits from 0 to 9.

  4. Literals are case sensitive. The NOCASE keyword can be used to instruct the engine to be case insensitive when searching for matching characters.

    1
    2
    RANGE "aBc"
    RANGE NOCASE "abc"
    

    Line 1 matches only the lowercase characters a and c, as well as the uppercase B. Line 2 matches the characters a, A, b, B, c and C.

Syntax

RANGE [NOCASE] <literal>

Example 7

1
2
RANGE "a-zA-Z"
RANGE NOCASE "a-z"

Both line 1 and line 2 match all lowercase and uppercase characters in the ASCII alphabet set.
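
The RANGE rules (hyphen spans, caret negation and NOCASE) can be illustrated by expanding a literal into the set of byte values it would accept. The Python below is a sketch under assumptions: range_bytes is a hypothetical helper, it expects escapes such as \x41 to have been decoded already, and it treats a hyphen at the very start or end of the literal as an ordinary character, which this document does not specify.

```python
def range_bytes(spec: bytes, nocase: bool = False) -> frozenset[int]:
    """Expand an already-unescaped RANGE literal into the byte values it matches."""
    negate = spec.startswith(b"^")
    if negate:
        spec = spec[1:]
    members: set[int] = set()
    i = 0
    while i < len(spec):
        # "a-z" style span: a byte, a hyphen, and a closing byte.
        if i + 2 < len(spec) and spec[i + 1] == ord("-"):
            members.update(range(spec[i], spec[i + 2] + 1))
            i += 3
        else:
            members.add(spec[i])
            i += 1
    if nocase:
        # Add the other-case twin of every ASCII letter (bit 0x20 toggles case).
        members |= {b ^ 0x20 for b in members if bytes([b]).isalpha()}
    if negate:
        members = set(range(256)) - members
    return frozenset(members)


print(sorted(chr(b) for b in range_bytes(b"aBc")))               # ['B', 'a', 'c']
print(sorted(chr(b) for b in range_bytes(b"abc", nocase=True)))  # ['A', 'B', 'C', 'a', 'b', 'c']
print(range_bytes(b"a-zA-Z") == range_bytes(b"a-z", nocase=True))  # True
```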

Keywords

There are several predefined keywords representing common character sets that can be used to replace the <literal> value. Keywords are case insensitive.

| Keyword | Description | Literal Characters |
| --- | --- | --- |
| SPACE | Matches any ASCII whitespace characters like blank space, horizontal tab, new line, vertical tab, form feed and carriage return | "\t\n\v\f\r" |
| BYTE | Matches any byte within the ASCII 0x00 to 0xFF range | "\x00-\xFF" |
| ALNUM | Matches any ASCII alphanumeric character | "a-zA-Z0-9" |
| LETTER | Matches any ASCII alphabet character | "a-zA-Z" |
| DIGIT | Matches any ASCII numeral | "0-9" |
| PRINTABLE | Matches any printable ASCII character including horizontal and vertical whitespace | "a-zA-Z0-9\r\n\v\f\t!\"#$%&'()*+,-./:;<=>?@[\]^_`{ |
| PRINTABLENONALPHA | Matches any printable ASCII characters excluding alphabet characters and including horizontal and vertical whitespace | "0-9\r\n\v\f\t!\"#$%&'()*+,-./:;<=>?@[\]^_`{ |
| PRINTABLENONALNUM | Matches any printable ASCII characters excluding alphanumeric characters and including horizontal and vertical whitespace | "\r\n\v\f\t!\"#$%&'()*+,-./:;<=>?@[\]^_`{ |
| GRAPHIC | Matches any ASCII character that is not whitespace or a control character | "a-zA-Z0-9!\"#$%&'()*+,-./:;<=>?@[\]^_`{ |
| SAMELINE | Matches any printable ASCII character including horizontal whitespace but excluding vertical whitespace | "a-zA-Z0-9\r\t!\"#$%&'()*+,-./:;<=>?@[\]^_`{ |
| NONALNUM | Matches any character that is not an ASCII alphanumeric character | "^a-zA-Z0-9" |
| NONALPHA | Matches any character that is not an ASCII alphabet | "^a-zA-Z" |
| NONDIGIT | Matches any character that is not an ASCII numeral | "^0-9" |
| LINE | Matches any new line or carriage return character | "\r\n" |

Example 8

1
2
RANGE "a-zA-Z"
RANGE LETTER

Line 1 and line 2 are equivalent and match all lowercase and uppercase characters in the ASCII alphabet set.

Example 9

1
2
RANGE "^0-9"
RANGE NONDIGIT

Line 1 and line 2 are equivalent and match any character that is not an ASCII numeral.

Example 10

1
RANGE PRINTABLE

Line 1 matches any printable ASCII character.

TIMES

Description

Repeat the preceding expression N times. N can also be specified as a range.

Syntax

WORD <literal> TIMES <integer>[-<integer>]

RANGE <literal> TIMES <integer>[-<integer>]

If only one integer is defined, the preceding pattern must repeat exactly that number of times.

If two integers are defined, the first one is the lower limit and the second one the upper limit on the number of repetitions. Each matching substring found is reported as a separate match.

Example 11

WORD "abc" TIMES 2

Example 11 matches any string where the pattern "abc" is repeated exactly two (N=2) times, such as "123abcabc456". It is interpreted the same way as a search for the literal "abcabc".

Example 12

RANGE DIGIT TIMES 12

Example 12 matches any string consisting of twelve (N=12) consecutive ASCII numerals, such as "123456789012".

Example 13

RANGE DIGIT TIMES 16-18

Example 13 matches any string of 16, 17 or 18 consecutive digits, such as:

abc1234567812345678
012345678012345678xyz

Example 13 has 4 matches in these two lines, one in the first line and three in the second:

  • abc1234567812345678

  • 012345678012345678xyz

  • 012345678012345678xyz

  • 012345678012345678xyz

THEN

Description

Use THEN to combine two or more expressions that must be matched consecutively.

Syntax

<expression> THEN <expression>

Example 14

WORD "HELLO" THEN RANGE "!.?" THEN WORD " I'm here."

Example 14 matches any string that contains "HELLO" followed immediately by the "!", "." or "?" character, and then followed by the phrase " I'm here.", as below:

HELLO! I'm here.
HELLO. I'm here.
HELLO? I'm here.

Example 15

RANGE "0-9" TIMES 4 THEN WORD "abc"

Example 15 matches any string that contains four consecutive ASCII numerals followed immediately by the pattern "abc", such as:

1111abc
9876abc

OR

Description

Use OR to combine two expressions that can be matched on either side of the OR operator.

Syntax

<expression> OR <expression>

Example 16

WORD "personal details" OR WORD "personal information"

Example 16 matches any string that contains either "personal details" or "personal information".

For example, "personal details" in line 1 and "personal information" in line 2 will be marked as match locations:

This file contains my personal details.
Search for any folder containing personal information.

Example 17

RANGE "0-9" TIMES 4 OR WORD "abcd"

Example 17 matches any string that contains four consecutive ASCII numerals or the pattern "abcd".
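
For readers who know regular expressions, THEN behaves much like concatenation and OR much like alternation. The Python sketch below restates Example 14 and Example 17 as regex patterns over bytes. It is an analogy only: the patterns are hand-written approximations and say nothing about how the GLASS engine is implemented.

```python
import re

# WORD "HELLO" THEN RANGE "!.?" THEN WORD " I'm here."
then_pattern = re.compile(rb"HELLO[!.?] I'm here\.")

# RANGE "0-9" TIMES 4 OR WORD "abcd"
or_pattern = re.compile(rb"[0-9]{4}|abcd")

for line in (b"HELLO! I'm here.", b"HELLO, I'm here."):
    print(line, bool(then_pattern.search(line)))

for line in (b"id 2024", b"xxabcdxx", b"abc123"):
    print(line, bool(or_pattern.search(line)))
```

The comma variant fails because "," is not one of the three characters in the RANGE, and "abc123" fails because it contains neither four consecutive digits nor "abcd".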

BOUND

Description

Use the BOUND operator to set specific rules or delimiters on how a pattern must match to the left, right or both sides of the pattern to be marked as a valid match.

The boundary for search patterns can be a specific character or range of characters. You can also use the BOUND operator to check if a pattern occurs at the beginning or end of a file to be marked as a match. For example:

  1. The pattern "abc" must be preceded by a colon :.

    WORD "abc" BOUND LEFT ":"
    
  2. The pattern "abc" must be surrounded at both ends by only non-alphanumeric characters.

    WORD "abc" BOUND NONALNUM
    
  3. The pattern "abc" must occur at the beginning of a file (BOF) stream.

    WORD "abc" BOUND BOF
    

Syntax

<pattern / expression> BOUND [LEFT|RIGHT] <range of characters>

<pattern / expression> BOUND BOF|EOF

The boundary for search patterns can be set up using various keywords:

| Keyword | Description |
| --- | --- |
| <pattern/expression> BOUND <range of characters> | Match the same <range of characters> on both sides, surrounding the <pattern / expression>. |
| <pattern/expression> BOUND LEFT <range of characters> | Match a <range of characters> on the LEFT side, just before the <pattern / expression>. |
| <pattern/expression> BOUND RIGHT <range of characters> | Match a <range of characters> on the RIGHT side, just after the <pattern / expression>. |
| <pattern/expression> BOUND LEFT <range of characters> BOUND RIGHT <range of characters> | Match a <range of characters> on both sides, surrounding the <pattern / expression>. |
| <pattern/expression> BOUND BOF | Match a <pattern / expression> that is found at the start of a file. |
| <pattern/expression> BOUND EOF | Match a <pattern / expression> that is found at the end of a file. |

For the BOUND, BOUND LEFT, and BOUND RIGHT operators, you can also set a number of bytes that limits how far before and/or after the <pattern/expression> the engine searches for the <range of characters>. For example: WORD "abc" BOUND NONALNUM WITHIN 32 BYTES.

Example 18

WORD "End of internet." BOUND EOF

Example 18 instructs the engine to check that the pattern "End of internet." appears at the end of a stream to be considered a match.

Example 19

RANGE DIGIT TIMES 4 BOUND NONDIGIT

Example 19 instructs the engine to search for a sequence of four consecutive ASCII numerals that are bounded by non-digit characters on either side of the four-digit sequence.

Based on Example 19, the sections in line 1 and line 2 below will be marked as match locations as the four-digit sequences are bounded by whitespace, brackets and comma characters.

1234 5678 A1234
1111,2222{3333}[4444]
123456

Line 3 contains three sets of four-digit sequences: "1234", "2345" and "3456". However, these will not be marked as match locations as they do not fulfil the BOUND conditions.

Example 20

RANGE DIGIT TIMES 4

If the BOUND operator from Example 19 is removed as shown in Example 20, line 3, which contains the string "123456", would now be marked with three matches: "1234", "2345" and "3456".
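
The difference between Example 19 and Example 20 on the string "123456" can be reproduced with a regex approximation. In the Python sketch below, four_digit_runs is a hypothetical helper. It uses lookarounds so that overlapping runs are reported, and in bounded mode it requires an actual non-digit byte on each side. How the real engine treats the very start or end of a stream is not modelled here.

```python
import re


def four_digit_runs(data: bytes, bounded: bool = False) -> list[bytes]:
    """List every four-digit run in data, including overlapping ones."""
    if bounded:
        # Require one non-digit byte directly before and after the run.
        pattern = rb"(?<=[^0-9])(?=([0-9]{4})[^0-9])"
    else:
        pattern = rb"(?=([0-9]{4}))"
    return [m.group(1) for m in re.finditer(pattern, data)]


print(four_digit_runs(b"123456"))                        # [b'1234', b'2345', b'3456']
print(four_digit_runs(b"123456", bounded=True))          # []
print(four_digit_runs(b"x,2222{3333}y", bounded=True))   # [b'2222', b'3333']
```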

PARENTHESES

Description

Parentheses ( ) are operators that combine a number of expressions into a single logical statement, or alter the precedence of operations. You can also use parentheses to clearly show the precedence of operations in complicated expressions.

Expressions contained within parentheses are evaluated first.

Syntax

( <expression> )

Example 21

1
2
WORD "Folder" OR WORD "File" THEN RANGE DIGIT
WORD "Folder" OR (WORD "File" THEN RANGE DIGIT)

In Example 21, both expressions on line 1 and line 2 are equivalent. The expressions match any string containing the pattern "Folder", or any string containing "File" followed immediately by a single digit from 0 to 9.

The parentheses in line 2 do not change any operation precedence; they are only used to explicitly show how the expression is parsed by the GLASS engine.

Based on Example 21, "Folder" in line 1 and "File9" in line 2 will be marked as match locations:

Folder 1 contains sensitive data.
Personal details found in File9.

Example 22

(WORD "Folder" OR WORD "File") THEN RANGE "0-9"

Example 22 uses parentheses to change how the expression is parsed by the GLASS engine. The expression now matches the pattern "Folder" or "File", followed immediately by a single digit from 0 to 9.

Based on Example 22, "Folder1" in line 1 and "File9" in line 2 will be marked as match locations:

Folder1 contains sensitive data.
Personal details found in File9.
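
The effect of the parentheses in Example 21 and Example 22 mirrors a familiar regex rule: concatenation binds more tightly than alternation, just as THEN is evaluated before OR. The Python sketch below is an analogy using hand-written regex equivalents, not GLASS itself.

```python
import re

# WORD "Folder" OR WORD "File" THEN RANGE DIGIT  (Example 21)
default_grouping = re.compile(rb"Folder|File[0-9]")

# (WORD "Folder" OR WORD "File") THEN RANGE "0-9"  (Example 22)
explicit_grouping = re.compile(rb"(?:Folder|File)[0-9]")

samples = [
    b"Folder 1 contains sensitive data.",
    b"Folder1 contains sensitive data.",
    b"Personal details found in File9.",
]

for line in samples:
    first = default_grouping.search(line)
    second = explicit_grouping.search(line)
    print(line, first.group() if first else None, second.group() if second else None)
```

The first sample matches only under the default grouping, because "Folder" is followed by a space rather than a digit.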

MAP

Description

A MAP defines a list of words for future reference.

Syntax

MAP [NOCASE] 'MAP_NAME' 'ITEM_1' [, 'ITEM_2', ..., 'ITEM_N']

Example 23

MAP 'VISA_KEYWORDS' \
'Visa', 'Visa Card Number', 'Visa Number', 'Visa card', 'Visa CC'

Example 23 defines a list of keywords containing Visa, Visa Card Number, Visa Number, Visa card, and Visa CC.

GROUP

Description

A GROUP refers to a list previously created with MAP.

Syntax

GROUP 'MAP_NAME'

Example 24

MAP 'VISA_KEYWORDS' \
'Visa', 'Visa Card Number', 'Visa Number', 'Visa card', 'Visa CC'
GROUP NOCASE 'VISA_KEYWORDS' \
THEN RANGE DIGIT TIMES 16 BOUND NONALNUM

Example 24 reuses the list from Example 23. By combining the GROUP operator with the THEN operator, it requires a 16-digit number to follow any keyword from the MAP.

SAMPLE GLASS EXPRESSIONS

In this section, we will take a look at some real-world GLASS expressions to help you get started with writing your custom infotypes.

SEARCH FOR SEVEN DIGIT CLUB MEMBERSHIP ID

Requirements

  1. Search for club membership ID which consists of seven consecutive digits. No restrictions on the first and last digit of the membership ID.

  2. Seven consecutive digits must not be contained within a string of alphanumeric characters. No other restrictions on pattern boundaries.

Solution

Step 1

RANGE DIGIT TIMES 7

Step 1 starts with the most basic requirement of the club membership ID, which is a string of seven consecutive ASCII numerals.

Step 2

Based on the second requirement, the seven consecutive digits must not be contained within any string of alphanumeric characters. This means the boundaries on each side of the membership ID can be any character except ASCII alphabets and numerals.

RANGE DIGIT TIMES 7 BOUND NONALNUM

In Step 2 the BOUND operator is used to reduce false matches by specifying the boundaries on each side of the seven-digit club membership ID.

Example 25

Membership ID: 1012345
ID123456789
Name,Sherlock Holmes,ID,2023456,Email,sherlock@example.com

Based on the expression from Step 2, only "1012345" in line 1 and "2023456" in line 3 will be marked as match locations. Line 2 produces no match because its digits are contained within a longer alphanumeric string.
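
The Step 2 expression can be approximated with a regular expression and run against the Example 25 data. The Python sketch below is a hedged illustration: it uses negative lookarounds, so it assumes the start and end of the data also count as valid boundaries, which may not be how the GLASS engine treats them.

```python
import re

# RANGE DIGIT TIMES 7 BOUND NONALNUM, approximated with lookarounds.
member_id = re.compile(rb"(?<![A-Za-z0-9])[0-9]{7}(?![A-Za-z0-9])")

sample = (
    b"Membership ID: 1012345\n"
    b"ID123456789\n"
    b"Name,Sherlock Holmes,ID,2023456,Email,sherlock@example.com\n"
)

print(member_id.findall(sample))  # [b'1012345', b'2023456']
```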

SEARCH FOR COMPANY EMAIL ADDRESSES

Requirements

  1. Search for company email addresses with the format <mailbox>@example.com.

  2. The maximum length of a mailbox name is limited to 64 ASCII characters.

  3. Valid email addresses can only start with ASCII alphabets but may contain a combination of alphabets and numerals in the mailbox name.

  4. Email addresses should be bounded only by non-alphanumeric characters.

Solution

Step 1

WORD "@example.com"

Step 1 starts with the most straightforward expression to match the domain "example.com".

Step 2

RANGE ALNUM TIMES 1-64 THEN WORD "@example.com"

In Step 2, the ALNUM keyword and TIMES operator are used to limit the range of allowed characters along with the maximum length of the mailbox name.

Step 3

(RANGE LETTER) THEN (RANGE ALNUM TIMES 1-63) THEN \
(WORD "@example.com")

Based on the third requirement, mailbox names can only start with ASCII alphabets. RANGE LETTER limits the first character of the mailbox name to an ASCII alphabet, followed by 1 to 63 alphanumeric characters, and ending with "@example.com".

The parentheses ( ) are not compulsory but are added for readability.

Step 4

((RANGE LETTER) THEN (RANGE ALNUM TIMES 1-63) THEN \
(WORD "@example.com")) BOUND NONALNUM

Step 4 uses the BOUND operator to reduce false matches by specifying the boundaries on each side of the company email addresses.

The outermost parentheses ( ) are used to apply the BOUND operator to all the expressions within that set of parentheses. Without them, the BOUND operator would only apply to the WORD that preceded it.

Example 26

Employee1,employee1@example.com,Marketing
Email: employee1@example.com
employee1@example.comemployee2@example.com
123@employee.com

Based on the expression from Step 4, only the address "employee1@example.com" in line 1 and line 2 will be marked as a match location.