Bulk Utility
CT-V provides a command line bulk token utility that tokenizes and detokenizes very large data sets at high speed. This utility is controlled through variables exposed in the configuration files (migration.properties, detokenization.properties, and masking.properties), which are provided in the Tokenization/lib/ext directory.
The CT-V Bulk Utility allows users to perform bulk tokenization and detokenization of plaintext. The following tasks can be performed through the Bulk Utility:
Tokenization of plaintext from File-to-File:
using token vault
without using token vault
Tokenization of plaintext from database to database without using token vault.
Detokenization of tokens from File-to-File using token vault.
Note
Tokens generated without using the token vault (i.e., using the masking.properties configuration file) cannot be detokenized.
The following table outlines the operations of CT-V Bulk Utility:
Operation | Operation Type | Properties File | Token Vault | Sequential Token Generation |
---|---|---|---|---|
Tokenization | File-to-File | migration.properties | Required | Can be set during token vault creation using KeySecure Classic or utilities. |
Tokenization | File-to-File | masking.properties | Not required | Can be set through masking.properties file. |
Tokenization | Database-to-Database | masking.properties | Not required | Can be set through masking.properties file. |
Detokenization | File-to-File | detokenization.properties | Required | Can be set during token vault creation using KeySecure Classic or utilities. |
The user specifies, at the command prompt, the operation to be performed by the utility (tokenization or detokenization) and the properties file to be used. For tokenization, the migration.properties or masking.properties file is used; for detokenization, the detokenization.properties file is used.
Note
Bulk masking using the masking.properties file is not supported for the Informix database. The Bulk Utility with masking does not support custom formats for -ftf and -dtd operations.
CT-V Bulk Utility tokenizes large quantities of data in any of the following forms:
Data tokenization from clear text.
Data tokenization from already encrypted text. If the input data is already in encrypted form, the utility can decrypt this data and then tokenize it.
Data tokenization of clear text in a database table.
Data detokenization from delimited type input data.
Data detokenization from positional type input data.
CT-V Bulk Utility uses a multi-threaded infrastructure to provide high-performance data transfer across a data pipeline. Users can modify certain parameters (Threads.BatchSize, Threads.CryptoThreads, Threads.TokenThreads, and Threads.PollTimeout) in the migration.properties, masking.properties, and detokenization.properties files to improve performance in different bulk tokenization and detokenization scenarios. The utility provides live performance monitoring data, as well as results (totals and performance data) for completed migration tasks, to help inform and optimize performance.
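As a sketch, the thread-tuning keys named above might appear in migration.properties as follows. The values shown are illustrative only, not recommended defaults; tune them for your workload and hardware:

```properties
# Illustrative values only -- tune for your workload and hardware.
# Number of records processed per batch:
Threads.BatchSize=10000
# Worker threads for cryptographic operations:
Threads.CryptoThreads=4
# Worker threads for token generation:
Threads.TokenThreads=4
# Poll timeout for the internal work queues:
Threads.PollTimeout=100
```

Larger batch sizes generally improve throughput at the cost of memory, while the thread counts are typically sized to the available CPU cores.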
The utility works with data in flat files for File-to-File operations. The input data file must be correctly populated and formatted, as it supplies the data to be tokenized or detokenized. The utility includes a data file reader that reads large flat files and supplies the data to the reader thread. The utility can also read data directly from the database for Database-to-Database operations.
Note
At this time, the utility cannot identify and skip individual data elements that contain errors; a file containing an error is rejected as a whole. Make sure all of the data adheres to the descriptions set up in the properties file.
The user supplies the plain or tokenized data through an input data file and sets the parameters in the migration.properties, masking.properties, or detokenization.properties file (and, if required, the SfntDbp.properties and SafeNetToken.properties files). These properties files allow the user to control the tokenization and detokenization of data by setting parameters such as the format of the input data file, the location of the input data file, the token format, the number of records to be tokenized at a time, the location of the output file, and the sequence in which columns appear in the output file.
CT-V Bulk Utility tokenizes/detokenizes the data and saves it to the output file/destination database, as per the parameters set in the properties file. For tokenization/detokenization using the token vault, the output is also stored in the database.
Note
Multisite is not supported in CT-V Bulk Utility.
Supported Platforms
CT-V Bulk Utility is Java based and should therefore run on any platform with a supported Java runtime; however, it has been tested and verified on the following platforms:
Windows 2008 R2 Enterprise Server 64-bit
Windows 2012 Enterprise Server 32-bit
Linux (RedHat6)
Supported Databases
The following databases are supported by the CT-V Bulk Utility:
Oracle: 11g, 12c, 18c, 19c, 21c
SQL Server: 2008, 2012, 2014, 2016, 2017, 2019
MySQL: 5.6, 5.7, 8.0 (for MySQL 8.0, the Java runtime environment version must be 8 or above)
Informix: 12.10
Supported Data Types for Bulk Migration (Without Using Token Vault)
The following table shows the data types supported for DB-to-DB masking for MS SQL, MySQL and Oracle:
Data Type | MS SQL | MySQL | Oracle |
---|---|---|---|
CHAR | Yes | Yes | Yes |
VARCHAR | Yes | Yes | Yes |
NCHAR | Yes | Yes | Yes |
NVARCHAR | Yes | Yes | Yes |
INT | Yes | Yes | Yes |
SMALLINT | Yes | Yes | No |
TINYINT | Yes | Yes | No |
MEDIUMINT | No | Yes | No |
REAL | Yes | No | No |
BIGINT | Yes | Yes | No |
DECIMAL | Yes | Yes | Yes |
FLOAT | Yes | Yes | Yes |
NUMERIC | No | No | No |
DOUBLE | No | Yes | No |
DATE | Yes | Yes | Yes |
DATETIME | Yes | Yes | No |
TIMESTAMP | No | Yes | Yes |
Note
The user must create the schema for the destination database, and it must be the same as the schema of the source database.
For the sequential token format, if there is a duplicate value in a batch of the input data, the sequence breaks and a value is skipped in the next batch. For example, suppose the input file contains the data 1, 2, 2, 3, 4, 5, 6, 7 and is run in batches of 4.
The sequential output is generated as 11, 12, 12, 13, 15, 16, 17, 18. The duplicate value 2 results in the tokenized value 15, instead of 14, for the corresponding input value 4.
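The skip described above can be modeled in a few lines of Python. This is an illustrative simulation of the documented behavior, not CT-V code; the function name and starting value are hypothetical. The key point is that the sequence counter advances by the full batch size per batch, while a duplicate within a batch reuses an existing token, leaving a gap:

```python
def sequential_tokens(values, batch_size, start=11):
    """Model of sequential token generation with per-batch duplicate reuse."""
    out = []
    batch_start = start
    for i in range(0, len(values), batch_size):
        batch = values[i:i + batch_size]
        seen = {}                # duplicates within a batch reuse the same token
        next_token = batch_start
        for v in batch:
            if v not in seen:
                seen[v] = next_token
                next_token += 1
            out.append(seen[v])
        # The counter advances by the full batch size, so a duplicate
        # leaves a skipped value at the start of the next batch.
        batch_start += batch_size
    return out

print(sequential_tokens([1, 2, 2, 3, 4, 5, 6, 7], batch_size=4))
# → [11, 12, 12, 13, 15, 16, 17, 18]
```

Running the model on the example input reproduces the documented output, with 14 skipped because the duplicate 2 consumed only one token from the first batch's reserved range.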
Four token formats are supported for the Date data type: RANDOM_TOKEN, FIRST_SIX_LAST_FOUR_TOKEN, LAST_FOUR_TOKEN, and LAST_SIX_TOKEN. For the Oracle database, it is not recommended to use the sequential token format for the Date data type, as the tokens produce a change only in the millisecond field of the Date. Oracle does not directly store milliseconds for the Date data type, so the user will get the same values for the date columns.
If any data type other than those listed in the preceding table is provided, the data is copied to the destination database table as-is.