
Bulk Utility

CT-V provides a command-line bulk token utility that tokenizes and detokenizes very large data sets at high speed. The utility is controlled through variables exposed in its configuration files (migration.properties, detokenization.properties, and masking.properties), which are provided in the Tokenization/lib/ext directory.

The CT-V Bulk Utility allows the user to perform bulk tokenization and detokenization of plaintext. The following tasks can be performed through the Bulk Utility:

  • Tokenization of plaintext from File-to-File:

    • using token vault

    • without using token vault

  • Tokenization of plaintext from Database-to-Database without using token vault.

  • Detokenization of tokens from File-to-File using token vault.

    Note

    Tokens generated without using the token vault (i.e. using masking.properties configuration file) cannot be detokenized.

The following table outlines the operations of CT-V Bulk Utility:

| Operation | Operation Type | Properties File | Token Vault | Sequential Token Generation |
|---|---|---|---|---|
| Tokenization | File-to-File | migration.properties | Required | Can be set during token vault creation using KeySecure Classic or utilities. |
| Tokenization | File-to-File | masking.properties | Not required | Can be set through the masking.properties file. |
| Tokenization | Database-to-Database | masking.properties | Not required | Can be set through the masking.properties file. |
| Detokenization | File-to-File | detokenization.properties | Required | Can be set during token vault creation using KeySecure Classic or utilities. |

The user specifies, at the command prompt, the operation to be performed by the utility (tokenization or detokenization) and the properties file to be used: migration.properties or masking.properties for tokenization, and detokenization.properties for detokenization.
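As a hypothetical sketch of such an invocation (the launcher class name below is an invented placeholder, not the real one; the exact command, classpath, and argument order are given in the CT-V installation documentation):

```
# Placeholder invocation -- the class name com.example.BulkUtility is NOT real.
java -cp "Tokenization/lib/*:Tokenization/lib/ext/*" \
     com.example.BulkUtility tokenize migration.properties
```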

Note

Bulk masking using the masking.properties file is not supported for the Informix database.

The CT-V Bulk Utility tokenizes large quantities of data in any of the following forms:

  • Data tokenization from clear text.

  • Data tokenization from already encrypted text. If the input data is already in encrypted form, the utility can decrypt this data and then tokenize it.

  • Data tokenization for clear text in the database table.

  • Data detokenization from delimited type input data.

  • Data detokenization from positional type input data.
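As an illustration of the last two input types (the field layout, widths, and delimiter here are invented for the example; the actual layout is whatever the properties file declares), delimited and positional input records might look like:

```
# Delimited input: fields separated by a configured delimiter (comma here)
4111111111111111,John,Smith
5500005555555559,Jane,Doe

# Positional input: each field occupies a fixed-width column
4111111111111111John     Smith
5500005555555559Jane     Doe
```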

The CT-V Bulk Utility uses a multi-threaded infrastructure to provide high-performance data transfer across a data pipeline. Users can modify certain parameters (Threads.BatchSize, Threads.CryptoThreads, Threads.TokenThreads and Threads.PollTimeout) in the migration.properties, masking.properties and detokenization.properties files to improve performance in different bulk tokenization and detokenization scenarios. The utility provides live performance monitoring data, as well as results (totals and performance data) for completed migration tasks, to help inform and optimize tuning.
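The four thread parameters can be adjusted in any of the three properties files. A sketch follows; the key names come from the utility, but the values are illustrative starting points and the timeout unit is an assumption to verify against the inline comments in the shipped file:

```properties
# Illustrative tuning values only -- benchmark against your own data set.
# Number of records processed per batch:
Threads.BatchSize=10000
# Worker threads performing cryptographic operations:
Threads.CryptoThreads=8
# Worker threads performing token generation:
Threads.TokenThreads=8
# Queue poll timeout (unit is an assumption; check the file's comments):
Threads.PollTimeout=100
```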

The utility works with data in flat files for File-to-File operations. The input data file, which supplies the data to be tokenized or detokenized, must be correctly populated and formatted. The utility includes a data file reader that reads large flat files and supplies the data to the file reader thread. For Database-to-Database operations, the utility can also read data directly from the database.

Note

At this time, the utility cannot identify and skip individual data elements that contain errors; a file containing an error is rejected as a whole. Make sure all of the data adheres to the descriptions set up in the properties file.

The user supplies the plain or tokenized data through an input data file and sets the parameters in the migration.properties, masking.properties or detokenization.properties file (and, if required, the SfntDbp.properties and SafeNetToken.properties files). These files allow the user to control tokenization and detokenization by setting parameters such as the format and location of the input data file, the token format, the number of records to be tokenized at a time, the location of the output file, and the sequence in which columns appear in the output file.
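A hypothetical migration.properties fragment might set such parameters as follows. Every key name below is an invented placeholder; the authoritative key names and their meanings are documented in the comments inside the properties files shipped in Tokenization/lib/ext:

```properties
# Hypothetical placeholders only -- the real key names differ.
# Location and format of the input data file:
Input.FilePath=/data/in/customers.csv
Input.Type=Delimited
# Token format to apply:
TokenFormat=LAST_FOUR_TOKEN
# Output file location and the order in which columns are written:
Output.FilePath=/data/out/customers_tokenized.csv
Output.Sequence=1,2,3
```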

CT-V Bulk Utility tokenizes/detokenizes the data and saves it to the output file/destination database, as per the parameters set in the properties file. For tokenization/detokenization using the token vault, the output is also stored in the database.

Note

Multisite is not supported in CT-V Bulk Utility.

Supported Platforms

The CT-V Bulk Utility is Java based, so it should run on any platform with a supported Java runtime; however, it has been tested and works well on the following platforms:

  • Windows 2008 R2 Enterprise Server 64-bit

  • Windows 2012 Enterprise Server 32-bit

  • Linux (Red Hat 6)

Supported Databases

The CT-V Bulk Utility supports the following databases:

  • Oracle: 11g, 12c, 18c, 19c, 21c

  • SQL Server: 2008, 2012, 2014, 2016, 2017, 2019

  • MySQL: 5.6, 5.7, 8.0 (for MySQL 8.0, the Java runtime environment version must be 8 or above)

  • Informix: 12.10

Supported Data Types for Bulk Migration (Without Using Token Vault)

The following table shows the data types supported for DB-to-DB masking for MS SQL, MySQL and Oracle:

| Data Type | MS SQL | MySQL | Oracle |
|---|---|---|---|
| CHAR | Yes | Yes | Yes |
| VARCHAR | Yes | Yes | Yes |
| NCHAR | Yes | Yes | Yes |
| NVARCHAR | Yes | Yes | Yes |
| INT | Yes | Yes | Yes |
| SMALLINT | Yes | Yes | No |
| TINYINT | Yes | Yes | No |
| MEDIUMINT | No | Yes | No |
| REAL | Yes | No | No |
| BIGINT | Yes | Yes | No |
| DECIMAL | Yes | Yes | Yes |
| FLOAT | Yes | Yes | Yes |
| NUMERIC | No | No | No |
| DOUBLE | No | Yes | No |
| DATE | Yes | Yes | Yes |
| DATETIME | Yes | Yes | No |
| TIMESTAMP | No | Yes | Yes |

Note

  • The schema for the destination database must be created by the user, and it must be the same as the schema of the source database.

  • For the sequential token format, a duplicate value within a batch of input data causes a break in the sequence and a skipped value in the next batch. For example, suppose the input file contains the data 1, 2, 2, 3, 4, 5, 6, 7 and is run in batches of 4.

    The sequential output is generated as: 11, 12, 12, 13, 15, 16, 17, 18. The duplicate value 2 results in the tokenized value 15, instead of 14, for the corresponding input value 4.
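The skipping behavior in the example above can be sketched in Python. This is an illustration of the documented sequence-skip behavior, not CT-V code; the starting token value 11 is taken from the example:

```python
def sequential_tokens(values, batch_size, start=11):
    """Sketch of the documented behavior: the sequence counter advances by
    the full batch size even when a batch contains duplicates, so one
    sequence value is skipped per duplicate; duplicates within a batch
    receive the same token."""
    tokens = []
    counter = start
    for i in range(0, len(values), batch_size):
        batch = values[i:i + batch_size]
        assigned = {}            # distinct value -> token within this batch
        next_token = counter
        for value in batch:
            if value not in assigned:
                assigned[value] = next_token
                next_token += 1
            tokens.append(assigned[value])
        counter += len(batch)    # advances by batch size, not distinct count
    return tokens

# The document's example: the duplicate 2 causes 14 to be skipped.
print(sequential_tokens([1, 2, 2, 3, 4, 5, 6, 7], 4))
# → [11, 12, 12, 13, 15, 16, 17, 18]
```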

  • Four token formats are supported for the Date data type: RANDOM_TOKEN, FIRST_SIX_LAST_FOUR_TOKEN, LAST_FOUR_TOKEN and LAST_SIX_TOKEN.

  • For the Oracle database, using the sequential token format with the Date data type is not recommended: the tokens differ only in the millisecond field of the Date, and because Oracle does not (directly) store milliseconds for the Date data type, the user will get identical values in the date columns.

  • If any data type apart from the ones mentioned in the preceding table is provided, the data is copied as-is to the destination database table.