Appendix
Scan Filters
This section provides you with information on scan filters in various data stores, syntax details and examples of their usage. ("N/S" stands for "not supported". Numbers used in the parameters section are provided as examples.)
Files
LocalStorage - Linux
Exclude location by prefix
root directory: / (Supported but not recommended)
1 folder:
/folder1 - Exclude all the folders starting with /folder1, like /folder1, /folder12, /folder123, and their content
/folder1/ - Exclude everything whose path starts with /folder1/, like /folder1/file.txt or /folder1/myfile.txt.
more folders:
/folder1/folder2/ - Exclude everything whose path starts with /folder1/folder2/, like /folder1/folder2/file3.txt or /folder1/folder2/folder3/myfile.txt
files:
folder1/file.txt - Exclude all the files with name file and extension txt and located in a folder with name folder1
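The difference between /folder1 and /folder1/ above can be read as a plain string-prefix test. The following Python sketch is an illustration only: it assumes that a prefix filter matches any location whose path starts with the filter text, and `is_excluded_by_prefix` is a hypothetical helper, not part of the product.

```python
def is_excluded_by_prefix(path: str, prefix: str) -> bool:
    """Return True if the path starts with the prefix (assumed semantics)."""
    return path.startswith(prefix)


# Hypothetical sample locations.
paths = [
    "/folder1",
    "/folder12/a.txt",
    "/folder123",
    "/folder1/file.txt",
    "/other/file.txt",
]

# "/folder1" also catches sibling folders such as /folder12 and /folder123.
broad = [p for p in paths if is_excluded_by_prefix(p, "/folder1")]

# "/folder1/" only catches content located inside /folder1 itself.
narrow = [p for p in paths if is_excluded_by_prefix(p, "/folder1/")]

print(broad)
print(narrow)
```

Under this assumption, adding the trailing slash is what keeps /folder12 and /folder123 in the scan.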
Exclude location by suffix
root directory: N/S
1 folder:
myfolder - Exclude all the directories with that name and their content. It is equivalent to */myfolder
myfolder/*.txt - Exclude all the txt files in the folders with name myfolder
more folders:
/folder1/folder2/ - Exclude the folder with name folder2 located in parent folder named folder1
files:
myfolder/*.pdf - Exclude files with extension pdf located in folders named myfolder
myfolder/otherDir/myfile.txt - Exclude a specific file
myfolder/subfolder/*.txt - Exclude all txt files located in that folder and subfolder. Subfolders can be used to improve the filtering.
Exclude locations by expression
root directory: / (Supported but not recommended)
1 folder:
*/folder1/* - Exclude all folders named folder1 and their content
more folders:
/folder1/folder - Exclude all folders in folder1 whose name is folder (?), like /folder1/folder (1), /folder1/folder (2) or /folder1/folder (a)
files:
*file.txt - Exclude the elements matching with the expression like /myfolder/file.txt, /myfolder/myfile.txt, /myfolder/otherfolder/file.txt, /myfolder/otherfolder/mysensitivefile.txt
*/myfile.txt - Exclude the elements matching with the expression like /myfolder/myfile.txt or /myfolder/otherfolder/myfile.txt (but NOT /myfolder/file.txt or /myfolder/otherfolder/mysensitivefile.txt)
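The two file expressions above differ only in the slash after the asterisk. As a rough illustration, the sketch below uses Python's standard `fnmatch` module, assuming that `*` in a filter matches any run of characters including path separators. This models the behaviour described in the examples; it is not the product's own matcher, and `excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

locations = [
    "/myfolder/file.txt",
    "/myfolder/myfile.txt",
    "/myfolder/otherfolder/file.txt",
    "/myfolder/otherfolder/mysensitivefile.txt",
    "/myfolder/otherfolder/myfile.txt",
]


def excluded(pattern: str) -> list:
    """Return the locations that the expression would exclude (assumed semantics)."""
    return [loc for loc in locations if fnmatchcase(loc, pattern)]


# "*file.txt" matches every name that merely ends in file.txt.
any_suffix = excluded("*file.txt")

# "*/myfile.txt" requires the file name to be exactly myfile.txt.
exact_name = excluded("*/myfile.txt")

print(any_suffix)
print(exact_name)
```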
Include locations within modification date
parameters:
toDate: "2021-05-30"
fromDate: "2021-08-01"
Include files whose modification date is between those dates
Include locations modified recently
parameters:
- days: 5 - Include files modified in the previous 5 days, even if a different filter excludes them
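The interaction between this include filter and the exclude filters can be sketched as follows. This is a hypothetical model, assuming that a recent modification always wins over a prefix exclusion; the function name `should_scan`, the sample paths and the dates are invented for illustration.

```python
from datetime import datetime, timedelta


def should_scan(path, modified, now, days=5, excluded_prefixes=()):
    """Decide whether a file is scanned (assumed precedence: recent beats exclude)."""
    # Assumption: a file modified within the last `days` days is always included.
    if now - modified <= timedelta(days=days):
        return True
    # Otherwise the file is scanned only if no exclude prefix matches it.
    return not any(path.startswith(prefix) for prefix in excluded_prefixes)


now = datetime(2021, 8, 10, 12, 0)
excludes = ("/folder1/",)

# Modified 2.5 days ago: included despite the exclude filter.
recent = should_scan("/folder1/new.txt", datetime(2021, 8, 8), now, excluded_prefixes=excludes)

# Modified 40 days ago: the exclude filter applies.
old = should_scan("/folder1/old.txt", datetime(2021, 7, 1), now, excluded_prefixes=excludes)

print(recent, old)
```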
Exclude locations greater than file size
parameters:
- size: 20 (size in MB) - Exclude files whose size is 20 MB or greater
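Note that the threshold itself is excluded: a file of exactly 20 MB is filtered out. A minimal sketch of that comparison, assuming 1 MB means 1024 * 1024 bytes (the product may define it differently) and using a hypothetical helper name:

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def is_excluded_by_size(size_bytes: int, limit_mb: int = 20) -> bool:
    """Return True when the file is as large as the limit or larger."""
    return size_bytes >= limit_mb * BYTES_PER_MB


print(is_excluded_by_size(20 * BYTES_PER_MB))      # exactly at the limit
print(is_excluded_by_size(20 * BYTES_PER_MB - 1))  # one byte below the limit
```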
LocalStorage - Windows
Exclude location by prefix
root directory: C:\ - Exclude the volume C:\
1 folder:
*\my_folder - Exclude all the folders with name my_folder and their content
more folders:
*\my_folder\my_subfolder - Exclude all subfolders with name my_subfolder located in a folder with name my_folder and their content
files:
C:\mydir\ - Exclude all files located in C:\mydir\ that begin with the name file, like file1.txt, file2.pdf, etc.
Exclude location by suffix
root directory: C:\ (Supported but not recommended)
1 folder:
my_folder - Exclude all the folders with name my_folder and their content
more folders:
my_folder\my_subfolder - Exclude all the folders with name my_subfolder located in a folder with name my_folder
files:
.txt - Exclude all files with extension txt
Exclude locations by expression
root directory: C:\ - Exclude a whole volume C:\*
1 folder:
*\my_folder - Exclude all folders named my_folder and their content
more folders:
my_folder\my_subfolder - Exclude all the folders with name my_subfolder located in a folder with name my_folder
files:
*.txt - Exclude all files with extension .txt
Include locations within modification date
parameters:
toDate: "2021-05-30"
fromDate: "2021-08-01"
Include files whose modification date is between those dates
Include locations modified recently
parameters:
- days: 5 - Include files modified in the previous 5 days, even if a different filter excludes them
Exclude locations greater than file size
parameters:
- size: 20 (size in MB) - Exclude files whose size is 20 MB or greater
SMB
Exclude location by prefix
root directory: N/S
1 folder:
<sharename>\\sambafolder or *\sambafolder
more folders:
*folder\subfolder
files:
*\sambafolder\file.txt
Exclude location by suffix
root directory: N/S
1 folder:
myfolder - Exclude folder called:
\\samba
\\exclude_suffix
more folders:
myfolder\mysubfolder
files:
myfolder\mysubfolder\file.txt
Exclude locations by expression
root directory: N/S
1 folder:
*my?folder - Exclude folder called \samba\my_folder, also works for my-folder, etc
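The `?` in this expression stands for exactly one character. The sketch below illustrates that with Python's `fnmatch` module, which uses the same `*` and `?` conventions; it is an approximation of the behaviour described above, not the product's matcher, and the candidate paths other than \samba\my_folder are hypothetical.

```python
from fnmatch import fnmatchcase

pattern = "*my?folder"

candidates = [
    r"\samba\my_folder",   # one character between "my" and "folder"
    r"\samba\my-folder",   # one character between "my" and "folder"
    r"\samba\myfolder",    # no character in between
    r"\samba\my__folder",  # two characters in between
]

matched = [c for c in candidates if fnmatchcase(c, pattern)]
print(matched)
```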
more folders:
*my_folder\my_subfolder
files:
*my_folder\my_subfolder
Include locations within modification date
parameters:
toDate: "2021-05-30"
fromDate: "2021-08-01"
Include locations modified recently
parameters:
- days: 5
Exclude locations greater than file size
parameters:
- size: 20 (size in MB)
NFS
Exclude location by prefix
root directory: N/S
1 folder: /mnt/nfs/myfolder or *myfolder - Exclude the folder named myfolder and all the subfolders it contains
Note
The mount point for the NFS file system in this example is: /mnt/nfs
more folders: /mnt/nfs/myfolder/mysubfolder or *myfolder/mysubfolder
files:
/myfolder/myfile.txt - Exclude txt file with that name located in that folder
Exclude location by suffix
root directory: N/S
1 folder:
myfolder - Exclude folder named myfolder
more folders:
myfolder/mysubfolder - Exclude folder myfolder/mysubfolder
files:
myfolder/myfile.xlsx - Exclude that file in myfolder with that extension
Exclude locations by expression
root directory: *
1 folder:
*my?folder - Exclude folder called my_folder, my-folder, etc
more folders:
myfolder/mysubfolder - Exclude folder myfolder/mysubfolder
files:
myfolder/myfile.xlsx - Exclude that file in myfolder with that extension
Include locations within modification date
parameters:
toDate: "2021-05-30"
fromDate: "2021-08-01"
Include locations modified recently
parameters:
- days: 5
Exclude locations greater than file size
parameters:
- size: 20 (size in MB)
Hadoop
Exclude location by prefix
root directory: N/S
1 folder:
*/my_data - Exclude all the directories with name my_data and their content
more folders:
*/my_data/subdir - Exclude all directories with name subdir whose parent directory is named my_data
files:
/my_data/file.ppt - Exclude file with that name and extension located in the directory my_data
Exclude location by suffix
root directory: N/S
1 folder:
mydir - Exclude all directories with name mydir, my_directory, etc, and their content
mydir/ - Exclude all directories with name mydir and their content
more folders:
mydir/subdir - Exclude all directories with name subdir, subdirectory, etc, and their content which are located in a directory with name mydir
mydir/subdir/ - Exclude the directory with name subdir, and its content, which is located in a directory with name mydir
files:
*.pptx - Exclude all files with extension pptx
Exclude locations by expression
root directory: N/S
1 folder:
*/mydir - Exclude all directories with name mydir and their content
more folders:
*/mydir/subdir - Exclude the directory with name subdir, and its content, which is located in a directory with name mydir
files:
*/mydir/My_file.pptx - Exclude the file with that name and extension located in a directory with name mydir
Include locations within modification date
parameters:
toDate: "2021-05-30"
fromDate: "2021-08-01"
Include files whose modification date is between those dates
Include locations modified recently
parameters:
- days: 5 - Include files modified in the previous 5 days, even if a different filter excludes them
Exclude locations greater than file size
parameters:
- size: 20 (size in MB) - Exclude files whose size is 20 MB or greater
AWS S3
Exclude location by prefix
root directory: Bucket (Sample: asrm-ddcqa*)
1 folder: myfolder
more folders: myfolder/mysubfolder
files: -
Exclude location by suffix
root directory: Bucket
1 folder: Folder
more folders: -
files: -
Exclude locations by expression
root directory: Bucket
1 folder: Folder
more folders: -
files: -
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
Databases
Oracle
Exclude location by prefix
Schema: N/S
Table: HR(SERVICE_NAME=XE):1521/my (You have to include the database, SID and port in the format <database>(SERVICE_NAME=<sid>):<port>/<table-prefix>)
Column: N/S
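The Oracle table filter format above, <database>(SERVICE_NAME=<sid>):<port>/<table-prefix>, can be assembled programmatically. The helper below is a hypothetical convenience function written for illustration; it only formats the string and does not talk to the product or to Oracle.

```python
def oracle_table_filter(database: str, sid: str, port: int, table_prefix: str) -> str:
    """Build a filter string in the documented Oracle format."""
    return f"{database}(SERVICE_NAME={sid}):{port}/{table_prefix}"


# Reproduces the example from the table entry above.
print(oracle_table_filter("HR", "XE", 1521, "my"))
```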
Exclude location by suffix
Schema: N/S
Table: user (Just add the last characters of the name of the table. * is not needed.)
Column: N/S
Exclude locations by expression
Schema: N/S
Table: HR(SERVICE_NAME=XE):1521/sensitive_expression_data (Add the table at the end of the string: <database>(SERVICE_NAME=<sid>):<port>/<my-table>)
Column: N/S
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
IBM DB2
Note
The database, port, schema, table and column are required depending on the level you want to filter, in the format <database>:<port>/<schema>/<table>/<column-prefix>.
Exclude location by prefix
Database: testdb:50000 (Supported but not meaningful to use)
Schema: testdb:50000/DB2IN - Exclude schemas that begin with DB2IN located in that database
Table: testdb:50000/DB2INST1/SENSITIVE_DATA - Exclude tables that begin with SENSITIVE_DATA located in that database/schema
Column: testdb:50000/DB2INST1/SENSITIVE_DATA/first_ - Exclude columns that begin with first_ located in that database/schema/table
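The DB2 prefix filters above all follow the same pattern: each deeper level appends one more path segment. A small sketch of that construction, using a hypothetical helper `db2_prefix_filter` that only builds the string:

```python
from typing import Optional


def db2_prefix_filter(
    database: str,
    port: int,
    schema: Optional[str] = None,
    table: Optional[str] = None,
    column_prefix: Optional[str] = None,
) -> str:
    """Build <database>:<port>/<schema>/<table>/<column-prefix> down to the given level."""
    parts = [f"{database}:{port}"]
    for level in (schema, table, column_prefix):
        if level is None:
            break  # deeper levels are ignored once a level is missing
        parts.append(level)
    return "/".join(parts)


print(db2_prefix_filter("testdb", 50000, "DB2IN"))
print(db2_prefix_filter("testdb", 50000, "DB2INST1", "SENSITIVE_DATA"))
print(db2_prefix_filter("testdb", 50000, "DB2INST1", "SENSITIVE_DATA", "first_"))
```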
Exclude location by suffix
Database: testdb:50000 (Supported but not meaningful to use)
Schema: _MYSCHEMA - Exclude a schema. * not needed.
Table: MYTABLE - Exclude a table. * not needed.
Column: _name - Exclude a column. * not needed.
Exclude locations by expression
Database: testdb:50000* (Supported but not meaningful to use)
Schema: testdb:50000/OTHER_SCHEMA/* - Exclude a schema and its content
Table: testdb:50000/DB2INST1/*DATA_EX* - Exclude all tables that contain DATA_EX in their names
Column: testdb:50000/DB2INST1/DATA_EXPRESSION/*first_name* - Exclude all columns that contain first_name in their names
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
Microsoft SQL
Note
The asterisk (*) means that the filter matches anything. There is no limitation as to adding the asterisk before a schema or not. For example, the filter can be effective with these two expressions:
testddc:1433/testschema
*/testschema
The second expression omits the database and port. If there is only one schema named testschema, or if the scan has only that database added as a data store, the two expressions are equivalent. If you add more than one MSSQL database as data stores in the scan and more than one schema is named testschema, the filters are not equivalent: the second expression filters all testschema schemas, whereas the first filters only the testschema schema in the testddc database.
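The difference between the two expressions shows up once a second database is involved. The sketch below models it with Python's `fnmatch`, assuming the filter acts like a prefix (so a trailing `*` is implied); `otherdb` and the table names are hypothetical, and this is not the product's matcher.

```python
from fnmatch import fnmatchcase

locations = [
    "testddc:1433/testschema/mytable",
    "otherdb:1433/testschema/mytable",   # hypothetical second data store
    "testddc:1433/otherschema/mytable",
]


def matches(filter_text: str, location: str) -> bool:
    # Assumption: prefix behaviour, modelled by appending a trailing wildcard.
    return fnmatchcase(location, filter_text + "*")


explicit = [loc for loc in locations if matches("testddc:1433/testschema", loc)]
wildcard = [loc for loc in locations if matches("*/testschema", loc)]

print(explicit)  # only the schema in testddc
print(wildcard)  # testschema in every database
```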
Exclude location by prefix
Database: testddc:1433
Note
Supported but not useful (the database name is specified when the data store is instantiated, so there is no point filtering out the whole db).
Schema: testddc:1433/testschema
Note
The schema is always relative to what is called the catalog (i.e. the database name and the port).
The same filter can also be written as */testschema, but using testschema only is not effective as a prefix.
Table: testddc:1433/testschema/mytable
Note
The complete path should be specified, or part of it optionally omitted using a *. A filter like */mytable has the side effect of filtering all tables named mytable, no matter which schema they belong to.
Column: N/S
Exclude location by suffix
Database: testddc:1433
Note
Supported but not useful (the database name is specified when the data store is instantiated so no meaning then to filter out the whole db).
Schema: testddc:1433/testschema (As it is a suffix, testschema also works with the same effect)
Table: testddc:1433/testschema/mytable
Note
If the table name is unique in all schemas mytable is enough to filter.
If different schemas contain the same tablename then caution should be used.
Column: N/S
Exclude locations by expression
Database: testddc:1433
Note
Supported but not useful (the database name is specified when the data store is instantiated so there is no point filtering out the whole db).
Schema: testddc:1433/testschema
Note
The schema is always relative to what is called the catalog (that is, the database name and the port). The same filter can also be written as */testschema, but testschema alone is not effective.
Table: testddc:1433/testschema/mytable
Note
The entire complete path should be specified or optionally omitted using a *. A filter like */mytable has the side effect of having all the tables whose name is mytable filtered, no matter which schema they belong to.
Column: N/S
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
Azure tables
Note
Azure tables contain only tables. There are no databases or schemas.
Exclude location by prefix
- Database: N/S
- Schema: N/S
- Table: table* (Trailing '*' is mandatory to match row_count)
- Column: N/S
Exclude location by suffix
- Database: N/S
- Schema: N/S
- Table: table* (Trailing '*' is mandatory to match row_count)
- Column: N/S
Exclude locations by expression
- Database: N/S
- Schema: N/S
- Table: table* (Trailing '*' is mandatory to match row_count)
- Column: N/S
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
MySQL
Exclude location by prefix
Database: database
Note
It is not mandatory to include * at the end of the database.
It will ignore the database called database.
Schema: N/S
Note
MySQL only contains the database and tables.
In some cases, for MySQL, databases are also referred to as schemas.
MySQL scan path is database/table.
Table: database/table or *table
Note
It is mandatory to include * at the beginning of the table name.
Replace the leading * with `database/` if you only want to exclude a table from a specific database.
Column: *testddc/sensitive_data/email (It's mandatory to include * at the beginning.)
Exclude location by suffix
Database: database*
Note
It is mandatory to include * at the end of the database.
You can replace * with `/table` to only ignore a specific table from that database.
Schema: N/S
Table: database/table or table
Note
It will ignore all tables ending with the word table.
Include the full path if you wish to ignore only a table in a specific database.
Column: testddc/sensitive_data/email
Exclude locations by expression
Database: database* (It is mandatory to include * at the end of the database.)
Schema: N/S
Table: database/table or *table (It is mandatory to include * at the beginning of the table name.)
Column: *testddc/sensitive_data/email
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
PostgreSQL
Exclude location by prefix
Database: database (e.g. hr)
Schema: database:port/schema (e.g. hr:5432/prod for excluding the schemas that have the prod prefix or hr:5432/* for excluding all schemas from hr db scan locations)
Table: database:port/schema/table (e.g. hr:5432/hr/prod) or table (e.g. prod), but please take into account that the latter will apply to tables with that name in all schemas.
Column: *database:port/schema/table/column* (e.g. *hr:5432/hr/prod/EMAIL*)
Note
It is mandatory to include * at the beginning and end.
Exclude location by suffix
Database: database* (e.g. hr*)
Note
It is mandatory to include * at the end of the database.
Schema: *schema* (e.g. *prod*)
Note
It is mandatory to include * at the beginning and end.
Along with the schema, this also excludes tables that have the suffix prod.
Table: database:port/schema/table (e.g. hr:5432/hr/prod) or table (e.g. prod), but please take into account that the latter will apply to tables with that name in all schemas.
Column: database:port/schema/table/column (e.g. hr:5432/hr/prod/EMAIL)
Exclude locations by expression
Database: database* (e.g. hr*)
Note
It is mandatory to include * at the end.
Schema: database:port/schema/* (e.g. hr:5432/prod/*)
Note
It is mandatory to include * at the end.
Table: database:port/schema/table (e.g. hr:5432/hr/prod)
Column: *database:port/schema/table/column* (e.g. *hr:5432/prod/prod/EMAIL*)
Note
It is mandatory to include * at the beginning and end.
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
NoSQL
Azure blobs
Exclude location by prefix
Account: N/S
Container: ddctest* or ddctest
Blob: ddctest/reuters/reut2-012.sgm or *reut2-012.sgm
Exclude location by suffix
Account: N/S
Container: *my_data*
Note
- */my_data - N/S.
- *my_data - N/S
- */my_data*/ - N/S.
- *my_data* - Supported, but along with the Container it excludes the Blob also having suffix my_data and in report filter type is "Exclude location by expression" not by suffix.
Warning
Container is allowed for exclude_suffix, but you should be careful when using it.
- */my_data - N/S.
Blob:
- my_data
- my_data*
- ddctestblob/reut2*
Exclude locations by expression
Account: N/S
Container: ddctestdata* or *test* (It is mandatory to include * at the end of container name.)
Blob:
- *reut2*
- *.txt
- ddctestblob/reut2*
- *dctestblob/*.txt
- */reut2* (A leading * is mandatory if you want to exclude a specific blob; for example, reut2-012.sgm would be *reut2-012.sgm)
Include locations within modification date
parameters:
- toDate: "2021-08-04"
- fromDate: "2021-09-24"
Note
Date format is in "YYYY-MM-DD".
toDate and fromDate are inclusive for files that are being scanned.
The files matching include_date_range do not have to match include_recent.
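Because both bounds are inclusive, a file modified on either boundary day is still scanned. A minimal sketch of that check in Python, using the "YYYY-MM-DD" format described above; the helper name and the dates in the example are illustrative assumptions, not product values.

```python
from datetime import date


def in_date_range(modified: date, from_date: str, to_date: str) -> bool:
    """Return True if modified falls within [from_date, to_date], both inclusive."""
    start = date.fromisoformat(from_date)  # expects "YYYY-MM-DD"
    end = date.fromisoformat(to_date)
    return start <= modified <= end


# Illustrative range: 1 August 2021 to 31 August 2021.
print(in_date_range(date(2021, 8, 1), "2021-08-01", "2021-08-31"))   # first day
print(in_date_range(date(2021, 8, 31), "2021-08-01", "2021-08-31"))  # last day
print(in_date_range(date(2021, 9, 1), "2021-08-01", "2021-08-31"))   # day after
```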
Include locations modified recently
parameters:
- date: 2
Note
The files matching include_recent do not have to match include_date_range.
Exclude locations greater than file size
parameters:
- size: 1 (Size in MB)
MongoDB
Exclude location by prefix
Database: database (It is not mandatory to include * at the end of the database.)
Collection: database/collection* or *collection*
Note
It is mandatory to include * at the beginning and end of the collection.
The leading * matches the database, while the trailing * matches the _id.
_id is the primary key in MongoDB. ER2 creates an object for each document/record/row instead of each table.
Fields: N/S
Exclude location by suffix
Database: database* (It is mandatory to include * at the end of the database.)
Collection: database/collection* or collection*
Note
It is mandatory to include * at the end of the database.
The trailing * matches _id.
Fields: N/S
Exclude locations by expression
Database: database* (It is mandatory to include * at the end of the database.)
Collection: database/collection* or *collection* (It is mandatory to include * at the beginning and end of the collection.)
Fields: N/S
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
SAP Hana
Exclude location by prefix
Database: database
Schema:
database:port/schema_name
database:port/*
Note
It is mandatory to include port if it is different than the default one.
You can replace schema_name with * to exclude all schemas.
Table:
database:port/schema_name/table
database:port/*/table
database:port/*/*
Note
You can replace schema_name with * to exclude the table from all schemas. You can also replace the table name with * to exclude all tables.
Column: *database:port/schema/table/column* (It is mandatory to include * at the beginning and end.)
Exclude location by suffix
Database: database* (It is mandatory to include * at the end of the database.)
Schema: -
Table:
table
database:port/schema/*table
Note
It will ignore all tables ending with the word table.
Include the full path if you wish to ignore only a table in a specific schema and database.
Column:
column
table/column
database:port/schema/table/column
Note
"Pipe" (|) can be used to exclude multiple columns.
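As a rough illustration of the pipe syntax, a suffix filter such as EMAIL|MAC_ADDR can be thought of as a list of alternatives, any of which may match the end of the column location. The sketch below assumes exactly that behaviour; the helper name, the column names and the sample locations are hypothetical.

```python
def column_excluded(location: str, suffix_filter: str) -> bool:
    """Return True if the location ends with any pipe-separated alternative."""
    alternatives = [alt.strip() for alt in suffix_filter.split("|")]
    return any(location.endswith(alt) for alt in alternatives if alt)


# Hypothetical column locations in database:port/schema/table/column form.
print(column_excluded("HXE:39041/SYSTEM/CUSTOMERS/EMAIL", "EMAIL|MAC_ADDR"))
print(column_excluded("HXE:39041/SYSTEM/DEVICES/MAC_ADDR", "EMAIL | MAC_ADDR"))
print(column_excluded("HXE:39041/SYSTEM/CUSTOMERS/CITY", "EMAIL|MAC_ADDR"))
```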
Exclude locations by expression
Database: *database* (It is mandatory to add * at the beginning and end; only then is the filter type in the report shown as "exclude_expression".)
Schema: database:port/schema/* (It is mandatory to include * at the end.)
Table: database:port/schema/table_expression (for example, HXE:39041/*/*cop?)
Column:
*column*
*table/column*
*schema/*/column*
*database:port/schema/table/column*
Note
To exclude the column location for a specific table, you have to add the table name before the column.
To exclude from a specific schema you have to add the schema as shown in example 3.
It is mandatory to include * at the beginning and end.
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
G-Mail
Exclude location by prefix
User: datastorecicduser
Account: - (User and Account seem to be the same thing)
Folder/Label: datastorecicduser/inbox or *inbox
Note
Use lower case for folder names.
Labels are not supported as label id is required to filter.
The second option would scan the inbox of every user.
Question: I am filtering Gmail scan by labels - why am I not getting expected results?
Answer: For the default system labels, Gmail creates some folders that do not match the label name. Please look at Gmail documentation to learn the right labels to filter data objects in a Gmail scan.
File:
datastorecicduser/inbox/mydata filter test data/Mon, 23 Aug 2021 10:54:40 +0530/50-contacts.csv or
datastorecicduser/inbox/mydata filter test data/*/50-contacts.csv or
datastorecicduser/inbox/test filter test data or
datastorecicduser/Label_4800715244570918733/test filter test data/ or
datastorecicduser/*/mydata filter test data/
Note
The second example is recommended, because it avoids manually checking the email's date and time and converting them to the required format.
The third option is used if you want to scan a specific email and all its content.
As mentioned in the previous note, label filters need the label id, as shown in the fourth example.
You can replace the label id with '*', as shown in the fifth example, if you are scanning a specific label and want to filter a specific email or file.
The fourth and fifth example can be used similarly for exclude_suffix and exclude_expression.
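To show why the wildcard form is more convenient, the sketch below builds the timestamp segment by hand and then checks that the wildcard filter covers the same location. It assumes the timestamp in the path follows the RFC 2822 style date that Python's `email.utils.format_datetime` produces, which matches the example above; the matching itself is approximated with `fnmatch` and is not the product's matcher.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from fnmatch import fnmatchcase

# The date and time of the email from the first example (UTC+05:30).
sent = datetime(2021, 8, 23, 10, 54, 40, tzinfo=timezone(timedelta(hours=5, minutes=30)))
stamp = format_datetime(sent)

base = "datastorecicduser/inbox/mydata filter test data"
exact_filter = f"{base}/{stamp}/50-contacts.csv"
wildcard_filter = f"{base}/*/50-contacts.csv"

print(exact_filter)
# The wildcard filter matches the same location without knowing the timestamp.
print(fnmatchcase(exact_filter, wildcard_filter))
```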
Exclude location by suffix
User: datastorecicduser* (Trailing * matches remaining path.)
Account: -
Folder/Label: datastorecicduser/inbox* or *inbox*
Note
Question: I am filtering Gmail scan by labels - why am I not getting the expected results?
Answer: For the default system labels, Gmail creates some folders that do not match the label name. Please look at Gmail documentation to learn the right labels to filter data objects in a Gmail scan.
File:
datastorecicduser/inbox/mydata filter test data/Mon, 23 Aug 2021 10:54:40 +0530/50-contacts.csv or
datastorecicduser/inbox/mydata filter test data/*/50-contacts.csv or
datastorecicduser/inbox/mydata filter test data*
Exclude locations by expression
User: datastorecicduser*
Account: -
Folder/Label: datastorecicduser/inbox* or *inbox*
Note
Question: I am filtering Gmail scan by labels - why am I not getting the expected results?
Answer: For the default system labels, Gmail creates some folders that do not match the label name. Please look at Gmail documentation to learn the right labels to filter data objects in a Gmail scan.
File:
datastorecicduser/inbox/mydata filter test data/Mon, 23 Aug 2021 10:54:40 +0530/50-contacts.csv or
datastorecicduser/inbox/mydata filter test data/*/50-contacts.csv or
datastorecicduser/inbox/mydata filter test data*
Include locations within modification date
parameters:
toDate: "2021-08-04"
fromDate: "2021-08-24"
Note
Date format is in "YYYY-MM-DD".
toDate and fromDate are inclusive for files that are being scanned.
The files matching include_date_range do not have to match include_recent.
The date when the email was sent/received is considered.
Include locations modified recently
parameters:
- days: 2
Note
Days should be between 1 and 99 (both inclusive).
The files matching include_recent do not have to match include_date_range.
The date when the email was sent/received is considered.
Exclude locations greater than file size
N/A
Others
Teradata
Exclude location by prefix
Schema: teradata (You can replace the schema with * to exclude all schemas.)
Table: teradata/sensitive_data or */sensitive_data (You can include * to exclude table from all schemas.)
Column: *teradata/sensitive_data/EMAIL (It is mandatory to include * at the beginning.)
Exclude location by suffix
Schema: teradata*
Table: teradata/sensitive*
Column:
EMAIL
EMAIL | MAC_ADDR
teradata/EMAIL|MAC_ADDR
teradata/sensitive_data/EMAIL
Exclude locations by expression
Schema: *teradata* or teradata* or teradata
Table:
teradata/sensitive_data_replica
*/sensitive_data_replica
Column:
*EMAIL*
*teradata/*/EMAIL*
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
Exchange Online
Exclude location by prefix
Group: All Users
User/Account:
All Users/sample@sjcpl.onmicrosoft.com or
*sample@sjcpl.onmicrosoft.com
Note
The second option would filter out "sample@sjcpl.onmicrosoft.com" user data objects from every group.
Folder: All Users/sample@sjcpl.onmicrosoft.com/inbox or *inbox
Note
Folder name is case-sensitive.
The second option would filter out inbox data objects of every user and group.
Attachment:
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/2021-02-22T06:40:18Z/maildir-a.zip or
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/*/maildir-a.zip or
maildir-a.zip or
All Users/sample@sjcpl.onmicrosoft.com/folder_name/subject or
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/2021-02-22T06:40:18Z or
*subject
Note
The second example would be recommended to the user to avoid manually checking mail's date and time and converting it to required format.
The third option would filter out data objects with attachment maildir-a.zip.
The fourth option is used if you want to filter out a specific mail and all its content with a corresponding subject name.
The fifth and sixth option would filter out data objects with given timestamp and subject name.
Exclude location by suffix
Group: All Users* (You have to use trailing * to exclude given location.)
User/Account: All Users/sample@sjcpl.onmicrosoft.com* or *sample@sjcpl.onmicrosoft.com*
Folder: All Users/sample@sjcpl.onmicrosoft.com/inbox* or *inbox*
Attachment:
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/2021-02-22T06:40:18Z/maildir-a.zip* or
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/*/maildir-a.zip* or
*maildir-a.zip*
Exclude locations by expression
Group: All Users* (You have to use trailing * to exclude a given location.)
User/Account: All Users/sample@sjcpl.onmicrosoft.com* or *sample@sjcpl.onmicrosoft.com*
Folder: All Users/sample@sjcpl.onmicrosoft.com/inbox* or *inbox*
Attachment:
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/2021-02-22T06:40:18Z/maildir-a.zip* or
All Users/sample@sjcpl.onmicrosoft.com/Inbox/Mail a/*/maildir-a.zip* or
*maildir-a.zip*
Include locations within modification date
N/S
Include locations modified recently
N/S
Exclude locations greater than file size
N/S
Sharepoint
N/S
G-Drive
Exclude location by prefix
User: datastorecicduser (This is for a user with email address datastorecicduser@ddc-thalescpl.com)
Folder: datastorecicduser/my drive/some/folder (The path should be in lower case.)
File: datastorecicduser/my drive/some/folder/file.ext
Exclude location by suffix
User: datastorecicduser* (Trailing * matches the remaining path.)
Folder: datastorecicduser/my drive/some/folder* or *folder* (The second example will exclude folder wherever it is present inside any user's drive.)
File: datastorecicduser/my drive/some/folder/file.ext* or *file.ext*
Note
Trailing * is mandatory even for absolute path.
The second example will exclude file.ext wherever it is present inside any user's drive.
Exclude locations by expression
User: datastorecicduser*
Folder: datastorecicduser/my drive/some/folder* or *folder*
File: datastorecicduser/my drive/some/folder/file.ext* or *file.ext*
Include locations within modification date
parameters:
toDate: "2021-08-18"
fromDate: "2021-08-19"
Note
Date format is in "YYYY-MM-DD".
toDate and fromDate are inclusive for files that are being scanned.
The files matching include_date_range do not have to match include_recent.
This filter only works on modified date.
Include locations modified recently
parameters:
- days: 2
Note
Days should be between 1 and 99 (both inclusive).
The files matching include_recent do not have to match include_date_range.
This filter only works on a modified date.
Exclude locations greater than file size
parameters:
- size: 2 (Size is in MB)
Note
Fractions or decimal points are not allowed.
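The constraints in the notes above (days between 1 and 99 inclusive, size as a whole number of MB) can be checked before a filter is submitted. The following is a hypothetical client-side validation sketch; the function name and error messages are invented, and the product may enforce additional rules.

```python
def _is_whole_number(value) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_filter_parameters(days=None, size=None) -> list:
    """Return a list of problems found in the given parameters (empty if none)."""
    errors = []
    if days is not None and not (_is_whole_number(days) and 1 <= days <= 99):
        errors.append("days must be a whole number between 1 and 99")
    if size is not None and not _is_whole_number(size):
        errors.append("size must be a whole number of MB")
    return errors


print(validate_filter_parameters(days=2, size=2))
print(validate_filter_parameters(days=100))
print(validate_filter_parameters(size=1.5))
```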
Information Types
Infotype Name | Category | Region |
---|---|---|
AES Key | Personal Data | Global |
American Express | Financial | Global |
Artifactory Token | Personal Data | Global |
Australian Bank Account Number | Financial | Oceania |
Australian Business Number | Financial | Oceania |
Australian Company Number | Financial | Oceania |
Australian Driver License Number | Personal Data | Oceania |
Australian Healthcare Identifier - Organisation | Medical | Oceania |
Australian Individual Healthcare Identifier | Medical | Oceania |
Australian Mailing Address | Personal Data | Oceania |
Australian Medicare Card | Medical | Oceania |
Australian Medicare Provider | Medical | Oceania |
Australian Passport Number | Personal Data | Oceania |
Australian Tax File Number | National ID | Oceania |
Australian Telephone Number | Personal Data | Oceania |
Austrian Driver License Number | Personal Data | Europe |
Austrian Mailing Address | Personal Data | Europe |
Austrian Passport Number | Personal Data | Europe |
Austrian Personalausweis | National ID | Europe |
Austrian SSN | National ID | Europe |
Austrian Telephone Number | Personal Data | Europe |
AWS Key ID | Personal Data | Global |
Azure Storage Key | Personal Data | Global |
Basic Auth Secret | Personal Data | Global |
Belgian Driver License Number | Personal Data | Europe |
Belgian eID | National ID | Europe |
Belgian National Number | National ID | Europe |
Belgian Passport Number | Personal Data | Europe |
Belgian Telephone Number | Personal Data | Europe |
Brazilian CPF | National ID | Americas |
Brazilian Registro Geral | National ID | Americas |
Bulgarian EGN | National ID | Europe |
Canadian Bank Account Number | Financial | Americas |
Canadian Health Service Number | Medical | Americas |
Canadian Mailing Address | Personal Data | Americas |
Canadian Passport Number | Personal Data | Americas |
Canadian Personal Health Identification Number (PHIN) | Medical | Americas |
Canadian Social Insurance Number | National ID | Americas |
Canadian Telephone Number | Personal Data | Americas |
Chilean RUN | National ID | Americas |
China Union Pay | Financial | Global |
Cloudant Credentials | Personal Data | Global |
Credentials password | Personal Data | Global |
Credentials username | Personal Data | Global |
Croatian OIB | National ID | Europe |
Cypriot Passport Number | Personal Data | Europe |
Czech Republic RC | National ID | Europe |
Danish CPR | National ID | Europe |
Danish Driver License Number | Personal Data | Europe |
Danish Passport Number | Personal Data | Europe |
Date Of Birth (under 18) | Personal Data | Global |
Date Of Birth | Personal Data | Global |
DB2 Credentials | Personal Data | Global |
DH Key | Personal Data | Global |
Diners Club | Financial | Global |
Discord Token | Personal Data | Global |
Discover | Financial | Global |
Drug Enforcement Agency Number | Medical | Americas |
DSA Public Key | Personal Data | Global |
Dutch Burgerservicenummer | National ID | Europe |
Dutch Driver License Number | Personal Data | Europe |
Dutch NIK | National ID | Europe |
Dutch Passport Number | Personal Data | Europe |
Dutch Telephone Number | Personal Data | Europe |
ECC Public Key | Personal Data | Global |
Email addresses | Personal Data | Global |
Ethnicity (English) | Personal Data | Global |
European EHIC | Medical | Europe |
Finnish HETU | National ID | Europe |
French Carte Vitale | National ID | Europe |
French CNI | National ID | Europe |
French Driver License Number | Personal Data | Europe |
French INSEE | National ID | Europe |
French Mailing Address | Personal Data | Europe |
French Passport Number | Personal Data | Europe |
French Telephone Number | Personal Data | Europe |
Gambian National Identification Number | National ID | Africa |
Gender (English) | Personal Data | Global |
Generic Bank Account Number | Financial | Global |
German Driver License Number | Personal Data | Europe |
German Mailing Address | Personal Data | Europe |
German Passport Number | Personal Data | Europe |
German Personalausweis | National ID | Europe |
German Telephone Number | Personal Data | Europe |
Github Token | Personal Data | Global |
Greek AFM | National ID | Europe |
Greek AMKA | National ID | Europe |
Greek Passport Number | Personal Data | Europe |
Hong Kong ID | National ID | Asia |
Hungarian Personal ID | National ID | Europe |
IBM Cloud IAM Key | Personal Data | Global |
IBM COS HMAC Credentials | Personal Data | Global |
Icelandish Kennitala | National ID | Europe |
Indian Aadhaar Number | National ID | Asia |
Indian Address | Personal Data | Asia |
Indian Bank Account Number | Financial | Asia |
Indian Driving License Number | Personal Data | Asia |
Indian Marital Status | Personal Data | Asia |
Indian MGNREGA Job Card ID | National ID | Asia |
Indian Name | Personal Data | Asia |
Indian PAN (Juridical) Number | National ID | Asia |
Indian Passport Number | Personal Data | Asia |
Indian Phone Number | Personal Data | Asia |
Indian Ration Card Number | National ID | Asia |
Indian Voter ID | National ID | Asia |
International Bank Account Number (IBAN) | Financial | Global |
IP Address | Personal Data | Global |
Iranian National Identification Number | National ID | Asia |
Irish Driver License Number | Personal Data | Europe |
Irish Passport Card Number | Personal Data | Europe |
Irish Passport Number | Personal Data | Europe |
Irish Personal Public Service Number | National ID | Europe |
Irish Telephone Number | Personal Data | Europe |
ISO8583 message with PAN | Financial | Global |
Israeli Bank Account Number | Financial | Asia |
Israeli Identity Number | National ID | Asia |
Italian CARTA D'IDENTITÀ | National ID | Europe |
Italian Codice Fiscale | National ID | Europe |
Italian Driver License Number | Personal Data | Europe |
Italian Mailing Address | Personal Data | Europe |
Italian Passport | Personal Data | Europe |
Italian Telephone Number | Personal Data | Europe |
Japanese Bank Account Number | Financial | Asia |
Japanese Driver License Number | Personal Data | Asia |
Japanese Passport Number | Personal Data | Asia |
Japanese Resident Registration Number | National ID | Asia |
Japanese Social Insurance Number (SIN) | National ID | Asia |
JCB | Financial | Global |
JSON Web Token | Personal Data | Global |
Laser | Financial | Global |
Latvian Personas Kods | National ID | Europe |
License Number | Personal Data | Global |
Login credentials | Personal Data | Global |
Luxembourg Driver License Number | Personal Data | Europe |
Luxembourg ID | National ID | Europe |
Luxembourg Passport Number | Personal Data | Europe |
Luxembourg Phone Number | Personal Data | Europe |
MAC Address | Personal Data | Global |
Macedonian UMCN | National ID | Europe |
Maestro | Financial | Global |
Mailchimp Access Key | Personal Data | Global |
Malaysian NRIC | National ID | Asia |
Maltese eID | National ID | Europe |
Mastercard | Financial | Global |
Medicare Beneficiary Identifier (MBI) | Medical | Americas |
Mexican CURP | National ID | Americas |
Mongo DB Credentials | Personal Data | Global |
MSSQL Database Credentials | Personal Data | Global |
MySQL Database Credentials | Personal Data | Global |
New Zealand Inland Revenue Number | National ID | Oceania |
New Zealand Mailing Address | Personal Data | Oceania |
New Zealand Passport Number | Personal Data | Oceania |
New Zealand Telephone Number | Personal Data | Oceania |
Norwegian Birth Number | National ID | Europe |
Norwegian Driver License Number | Personal Data | Europe |
Norwegian Passport Number | Personal Data | Europe |
NPM token | Personal Data | Global |
Oracle Database Credentials | Personal Data | Global |
Passport Number | Personal Data | Global |
Peoples Republic of China ID | National ID | Asia |
Personal Names (Austrian) | Personal Data | Europe |
Personal Names (Belgian) | Personal Data | Europe |
Personal Names (English) | Personal Data | Global |
Personal Names (French) | Personal Data | Europe |
Personal Names (German) | Personal Data | Europe |
Personal Names (Italian) | Personal Data | Europe |
Personal Names (Netherlands) | Personal Data | Europe |
Personal Names (Polish) | Personal Data | Europe |
Personal Names (Portuguese) | Personal Data | Europe |
PGP Public Key | Personal Data | |
Polish Driver License Number | Personal Data | Europe |
Polish Identity Card | National ID | Europe |
Polish Mailing Address | Personal Data | Europe |
Polish Passport Number | Personal Data | Europe |
Polish PESEL | National ID | Europe |
Polish Telephone Number | Personal Data | Europe |
Portuguese Citizen's Card | National ID | Europe |
Portuguese Driver License Number | Personal Data | Europe |
Portuguese Fiscal Number | National ID | Europe |
Portuguese Identity Number | National ID | Europe |
Portuguese Mailing Address | Personal Data | Europe |
Portuguese Passport Number | Personal Data | Europe |
Portuguese Phone Number | Personal Data | Europe |
PostgreSQL Database Credentials | Personal Data | Global |
Private Key | Personal Data | Global |
Private Label Card | Financial | Global |
Profanity (English) | Personal Data | Global |
Redis Credentials | Personal Data | Global |
Religion (English) | Personal Data | Global |
Romanian Identity Card | National ID | Europe |
Romanian Numerical Personal Code | National ID | Europe |
RSA Public Key | Personal Data | Global |
Saudi Arabia National ID | National ID | Asia |
SendGrid API Key | Personal Data | Global |
Serbian UMCN | National ID | Europe |
Singaporean Mailing Address | Personal Data | Asia |
Singaporean NRIC | National ID | Asia |
Singaporean Passport Number | Personal Data | Asia |
Singaporean Telephone Number | Personal Data | Asia |
Slack Token | Personal Data | Global |
Slovakian RC | National ID | Europe |
Slovenian EMSO | National ID | Europe |
SoftLayer Credentials | Personal Data | Global |
South African Identity Number | National ID | Africa |
South Korean Corporation Registration Number (법인등록번호) | Financial | Asia |
South Korean Driver License Number | Personal Data | Asia |
South Korean Foreigner Number | National ID | Asia |
South Korean Gwangju Bank (광주은행) Account Number | Financial | Asia |
South Korean Jeju Bank (제주은행) Account Number | Financial | Asia |
South Korean Jeonbuk Bank (전북은행) Account Number | Financial | Asia |
South Korean KB Bank (국민은행) Account Number | Financial | Asia |
South Korean KEB Hana Bank (KEB하나은행) Account Number | Financial | Asia |
South Korean NH Bank (농협은행) Account Number | Financial | Asia |
South Korean Passport | Personal Data | Asia |
South Korean Phone Number | Personal Data | Asia |
South Korean RRN | National ID | Asia |
South Korean Shinhan Bank (신한은행) Account Number | Financial | Asia |
South Korean Taxpayer Identification Number (사업자등록번호) | Financial | Asia |
Spanish DNI | National ID | Europe |
Spanish Driver License Number | Personal Data | Europe |
Spanish NIE | National ID | Europe |
Spanish Passport Number | Personal Data | Europe |
Spanish Social Security Number | National ID | Europe |
Spanish Telephone Number | Personal Data | Europe |
Square Oauth Secret | Personal Data | Global |
Sri Lankan National Identity Card | National ID | Asia |
SSH Private Key | Personal Data | Global |
SSH Public Key | Personal Data | Global |
Stripe Access Key | Personal Data | Global |
Swedish Driver License Number | Personal Data | Europe |
Swedish Nationellt ID-kort | National ID | Europe |
Swedish Passport Number | Personal Data | Europe |
Swedish Personnummer | National ID | Europe |
SWIFT Code | Financial | Global |
Swiss Social Security Number | National ID | Europe |
Taiwanese ID | National ID | Asia |
TDES Key | Personal Data | Global |
Thai Population Identification Code | National ID | Asia |
Troy | Financial | Global |
Turkish Identification Number | National ID | Europe |
Turkish Telephone Number | Personal Data | Europe |
Twilio API Key | Personal Data | Global |
United Arab Emirates ID | National ID | Asia |
United Kingdom Community Health Index | Medical | Europe |
United Kingdom Driver License Number | Personal Data | Europe |
United Kingdom Electoral Roll Number | Personal Data | Europe |
United Kingdom Health and Care Number | Medical | Europe |
United Kingdom Mailing Address | Personal Data | Europe |
United Kingdom National Health Service Number | Medical | Europe |
United Kingdom NI Number | National ID | Europe |
United Kingdom Passport Number | Personal Data | Europe |
United Kingdom Self Assessment UTR Number | National ID | Europe |
United Kingdom Telephone Number | Personal Data | Europe |
United Kingdom VAT Number | Financial | Europe |
United States Bank Account Number | Financial | Americas |
United States Driver License Number | Personal Data | Americas |
United States Health Insurance Claim Number | Medical | Americas |
United States Health Plan Identifier | Medical | Americas |
United States Individual Taxpayer Identification Number (ITIN) | National ID | Americas |
United States Mailing Address | Personal Data | Americas |
United States National Provider Identifier | Medical | Americas |
United States Passport Card Number | Personal Data | Americas |
United States Passport Number | Personal Data | Americas |
United States Routing Transit Number | Financial | Americas |
United States Social Security Number | National ID | Americas |
United States Telephone Number | Personal Data | Americas |
Visa | Financial | Global |
Yugoslavia UMCN | National ID | Europe |
Supported Formats
Files
Type | Format |
---|---|
Compressed | bzip2, Gzip (all types), TAR, Zip (all types) |
Databases | Access, DBase, SQLite, MSSQL MDF & LDF |
Images | BMP, FAX, GIF, JPG, PDF (embedded), PNG, TIF |
Microsoft Backup Archive | Microsoft Binary / BKF |
Microsoft Office | v5, 6, 95, 97, 2000, XP, 2003 onwards |
Open Source | Star Office / Open Office / Libre Office |
Open Standards | PDF, RTF, HTML, XML, CSV, TXT |
Office files
WORD
Legacy: Legacy filename extensions denote binary Microsoft Word formatting that became outdated with the release of Microsoft Office 2007. Although the latest version of Microsoft Word can still open them, they are no longer developed. Legacy filename extensions include:
.doc – Legacy Word document; Microsoft Office refers to them as "Microsoft Word 97 – 2003 Document"
.dot – Legacy Word templates; officially designated "Microsoft Word 97 – 2003 Template"
.wbk – Legacy Word document backup; referred to as "Microsoft Word Backup Document"
OOXML: The Office Open XML (OOXML) format was introduced with Microsoft Office 2007 and has been the default format of Microsoft Word ever since. The related file extensions include:
.docx – Word document
.docm – Word macro-enabled document; same as docx, but may contain macros and scripts
.dotx – Word template
.dotm – Word macro-enabled template; same as dotx, but may contain macros and scripts
.docb – Word binary document introduced in Microsoft Office 2007
EXCEL
Legacy: Legacy filename extensions denote binary Microsoft Excel formats that became outdated with the release of Microsoft Office 2007. Although the latest version of Microsoft Excel can still open them, they are no longer developed. Legacy filename extensions include:
.xls – Legacy Excel worksheets; officially designated "Microsoft Excel 97-2003 Worksheet"
.xlt – Legacy Excel templates; officially designated "Microsoft Excel 97-2003 Template"
.xlm – Legacy Excel macro
OOXML: The Office Open XML (OOXML) format was introduced with Microsoft Office 2007 and has been the default format of Microsoft Excel ever since. Excel-related file extensions of this format include:
.xlsx – Excel workbook
.xlsm – Excel macro-enabled workbook; same as xlsx but may contain macros and scripts
.xltx – Excel template
.xltm – Excel macro-enabled template; same as xltx but may contain macros and scripts
POWERPOINT
Legacy:
.ppt – Legacy PowerPoint presentation
.pot – Legacy PowerPoint template
.pps – Legacy PowerPoint slideshow
OOXML:
.pptx – PowerPoint presentation
.pptm – PowerPoint macro-enabled presentation
.potx – PowerPoint template
.potm – PowerPoint macro-enabled template
.ppam – PowerPoint add-in
.ppsx – PowerPoint slideshow
.ppsm – PowerPoint macro-enabled slideshow
.sldx – PowerPoint slide
.sldm – PowerPoint macro-enabled slide
ACCESS
Legacy:
.ade – Protected Access Data Project (not supported in 2013)
.adp – Access Data Project (not supported in 2013)
.mdb – Access Database (2003 and earlier)
.cdb – Access Database (Pocket Access for Windows CE)
.mda – Access Database, used for add-ins (Access 2, 95, 97), previously used for workgroups (Access 2)
.mdt – Access Add-in Data (2003 and earlier)
.mdf – Access (SQL Server) detached database (2000)
.mde – Protected Access Database, with compiled VBA and macros (2003 and earlier)
.ldb – Access lock files (associated with .mdb)
Available formats since Access 2007:
.accdb – The file extension for the new Office Access 2007 file format. This takes the place of the MDB file extension
.accde – The file extension for Office Access 2007 files that are in "execute only" mode. ACCDE files have all Visual Basic for Applications (VBA) source code hidden. A user of an ACCDE file can only execute VBA code, but not view or modify it. ACCDE takes the place of the MDE file extension
.accdt – The file extension for Access Database Templates
.accdr – The file extension that lets you open a database in runtime mode. By changing a database's file extension from .accdb to .accdr, you can create a "locked-down" version of your Office Access database. You can change the file extension back to .accdb to restore full functionality
OUTLOOK
.pst – Outlook personal data file
.ost – Outlook offline data file
.msg – Outlook message
.dbx – Outlook Express mail folder
OTHER
.pub – a Microsoft Publisher publication
.xps – an XML-based document format used for printing (on Windows Vista and later) and preserving documents
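The extension lists above can be captured in a small lookup table, for example when sorting files by format family before a scan. The following is a minimal sketch only: the `OFFICE_EXTENSIONS` mapping and the `classify_office_file` helper are hypothetical names that are not part of DDC, and the mapping covers just a subset of the extensions listed above.

```python
from pathlib import PurePath

# Hypothetical lookup built from a subset of the extensions listed above.
OFFICE_EXTENSIONS = {
    ".doc": ("Word", "Legacy"),
    ".dot": ("Word", "Legacy"),
    ".docx": ("Word", "OOXML"),
    ".docm": ("Word", "OOXML"),
    ".xls": ("Excel", "Legacy"),
    ".xlsx": ("Excel", "OOXML"),
    ".xlsm": ("Excel", "OOXML"),
    ".ppt": ("PowerPoint", "Legacy"),
    ".pptx": ("PowerPoint", "OOXML"),
    ".mdb": ("Access", "Legacy"),
    ".accdb": ("Access", "2007 and later"),
}


def classify_office_file(filename):
    """Return (application, format family), or None for unlisted extensions."""
    # Extensions are compared in lowercase so that "Report.DOCX" is recognized.
    suffix = PurePath(filename).suffix.lower()
    return OFFICE_EXTENSIONS.get(suffix)


print(classify_office_file("Report.DOCX"))
print(classify_office_file("notes.txt"))
```

The lookup relies on the file extension alone, so it says nothing about whether the file content actually matches that format.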
Databases
Microsoft SQL
Oracle
IBM DB2
PostgreSQL
SAP HANA
MySQL
MongoDB
Big Data
Hadoop
Teradata
Binary Large Objects
Database | Object Type |
---|---|
Oracle | BLOB, CLOB |
Microsoft SQL Server | VARBINARY, Filestream |
PostgreSQL | bytea, text, Large Objects(oid) |
MySQL | BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB |
IBM DB2 | BLOB, CLOB, DBCLOB |
Teradata | BLOB |
MongoDB | GridFS |
SAP HANA | BLOB, NCLOB |
Configuration Backup
You can back up and restore the DDC configuration by using the Backup/Restore functionality available in CipherTrust Manager. Such a backup will include the following elements:
Data Stores
Branch Locations
Classification Profiles
Infotypes
Report definitions
This backup will not include the information about the scan executions.
Creating/Restoring the Configuration Backup
To create or restore a backup of your DDC configuration:
Log in to CipherTrust Manager.
Click the Admin Settings link on the dashboard.
Select Backups from the sidebar on the left. This will display the Backups screen.
To create a backup of your DDC configuration, click the Create Backup button.
To restore your DDC configuration from a backup, click the Upload Backup button.
For more details, refer to the CipherTrust Manager documentation.
Configuration Backup Limitations
The configuration backup references the DDC Active Node. Restoring the backup to a different CipherTrust Manager cluster leaves DDC referencing an invalid node, and therefore without any valid active node.
The configuration backup contains the definitions of the DDC resources (such as the Scan or Data Store definitions). If you restore from a backup that does not contain a certain resource (for example, a Custom Classification Profile), or does not contain the resource version that a completed scan used, the TDP scan execution data points to an invalid resource identifier.
If you generate a report that points to the missing resource, the report may display incomplete data (for example, it may be unable to display the resource name), or report generation may fail.
Creating/Restoring Backup of Scan Executions
To back up or restore your Data Discovery and Classification scan execution data, you need to access the DDC data stored in Hadoop. For details, refer to the Thales Data Platform Hadoop Backup section in the Thales Data Platform Administrator Guide.
Mounting an NFS Share
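To mount an NFS share on a Proxy agent, run this command with root privileges, replacing <local-mount-point> with the local directory where the share should be attached:
sudo mount <nfs-server-hostname|nfs-server-ipaddress>:</target/directory/share-name> <local-mount-point>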
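As a rough illustration of how the pieces of the mount command fit together, the sketch below assembles the argument list in Python. The `build_nfs_mount_command` helper, the host name, and both paths are hypothetical placeholders; `mount_point` stands for the local directory where the share is attached, as in standard `mount` usage.

```python
import shlex


def build_nfs_mount_command(server, export_path, mount_point):
    """Build the argument list for mounting an NFS export (hypothetical helper)."""
    if not export_path.startswith("/"):
        raise ValueError("export path must be absolute")
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be absolute")
    # mount expects the remote source in the form <server>:<export path>.
    return ["sudo", "mount", f"{server}:{export_path}", mount_point]


# Example values are placeholders, not real hosts or paths.
argv = build_nfs_mount_command(
    "nfs.example.com", "/target/directory/share-name", "/mnt/share-name"
)
print(shlex.join(argv))
```

The sketch only prints the command. To execute it, the list could be passed to `subprocess.run(argv, check=True)` on the Proxy agent.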
Issues Encountered while Browsing Target Paths
You may encounter the following issues while navigating target paths with the Browse button.
Datastore | Scenario | Issue |
---|---|---|
AWS | The path in the Add Target Path field contains an invalid folder name within a valid bucket. Example: <valid_bucket>/<invalid_folder> | No error toast message is shown. The file browser displays "No paths to display". |
Gmail | The path in the Add Target Path field contains an invalid folder name within a valid user account. Example: <valid_useraccount>/<invalid_folder> | No error toast message is shown. The file browser displays "No paths to display". |
Gmail, Google Drive | The Add Target Path field is empty or contains a path to a valid folder | User accounts are listed correctly, but the targets within them are not displayed. |
NFS | The Add Target Path field contains an invalid path | No error toast message is shown. The file browser displays "No paths to display". |
SharePoint Online, SharePoint Server | The Add Target Path field contains one of the following: an invalid folder in a valid site collection; an invalid file within a valid site collection; an invalid file within a valid folder in a site collection. Examples: <valid_site_collection>/:site/:file/<valid_folder>/<invalid_file>; <valid_site_collection>/:site/:file/<invalid_folder>; <valid_site_collection>/:site/:file/<invalid_file> | No error toast message is shown. The file browser displays "No paths to display". |
SharePoint Online, SharePoint Server | The Add Target Path field contains an incorrect site collection | Arbitrary folders and files may be displayed without any error. Selecting any of these results as a target leads to a failed scan. |
Exchange Online | The Add Target Path field is empty | System default groups are listed. If you select one or more default groups, the scan completes successfully but without any matches, because system default groups cannot be scanned. Targets must be valid email addresses or email groups. |
API Request Quota Limit
Datastore | Quota Limit (requests per minute) |
---|---|
Gmail | 15000 |
SharePoint Online | 600 |
Exchange Online | 800 |
OneDrive | 1000 |
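To get a feel for these numbers, a per-minute quota can be converted into the average spacing between requests that a steady stream would need to stay within the limit. The sketch below does only that arithmetic. The `QUOTA_PER_MINUTE` mapping and the `min_request_interval_seconds` helper are hypothetical names, and the table does not state how the limit is enforced.

```python
# Per-minute request quotas copied from the table above.
QUOTA_PER_MINUTE = {
    "Gmail": 15000,
    "SharePoint Online": 600,
    "Exchange Online": 800,
    "OneDrive": 1000,
}


def min_request_interval_seconds(datastore):
    """Average spacing between requests that keeps a steady stream within quota."""
    return 60.0 / QUOTA_PER_MINUTE[datastore]


for name in QUOTA_PER_MINUTE:
    interval_ms = min_request_interval_seconds(name) * 1000
    print(f"{name}: about {interval_ms:.1f} ms between requests")
```

For example, 600 requests per minute for SharePoint Online corresponds to one request every 100 ms on average.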
Using Wildcard Characters
The asterisk (*) and the question mark (?) are the two most common wildcard characters. You can use them to traverse locations without providing absolute paths.
In DDC, you can use these characters in scan filters to scan selected locations instead of recursively scanning the entire directory structure of a data store.
Wildcard | Meaning | Examples |
---|---|---|
* | Matches zero or more characters | D:/A/* traverses all locations starting with D:/A; D:/A/*/F/E traverses all locations starting with D:/A and ending with /F/E; */A/E traverses all locations ending with /A/E; *A/E* traverses all locations that contain A/E; D:/A/*/B/C/* traverses all locations that start with D:/A/ and contain /B/C/ later in the path. |
? | Matches exactly one character | D:/A??? will traverse all locations starting with D:/A followed by any three characters. |
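The wildcard semantics in the table can be tried out with Python's standard `fnmatch` module, in which `*` matches zero or more characters (including `/`) and `?` matches exactly one character. This is only an approximation for experimenting with patterns: DDC's own matcher may differ (for example in case sensitivity), `fnmatch` additionally treats `[...]` as a character set, and the `matches` helper and sample locations are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, location):
    """Case-sensitive wildcard match of a whole location against a pattern."""
    return fnmatchcase(location, pattern)


# (pattern, sample location) pairs modeled on the examples in the table.
examples = [
    ("D:/A/*", "D:/A/B/report.txt"),
    ("D:/A/*/F/E", "D:/A/B/C/F/E"),
    ("*/A/E", "D:/X/A/E"),
    ("*A/E*", "D:/XA/E/file.txt"),
    ("D:/A???", "D:/Abcd"),
    ("D:/A???", "D:/Abc"),
]

for pattern, location in examples:
    print(f"{pattern!r} vs {location!r}: {matches(pattern, location)}")
```

The last pair does not match because `D:/Abc` has only two characters after `D:/A`, while `???` requires exactly three.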