Bulk Tokenization
The Java APIs and the Web Services provide Java developers with the capability to tokenize and detokenize large arrays using these TokenService methods:
insert()
get()
getToken()
This section explains:
Bulk tokenization using multithread processing, and the use of two configuration parameters in the SafeNetToken.properties file to control when multithreading is used and how many threads are used.
Smart Check capabilities, which can:
be activated when TokenService bulk tokenization processing is completed using insert() and get().
enable bulk tokenization processing to complete the processing of a flawed input array, even when it contains row-level errors (for example, an error in an array item value).
store bulk tokenization process exception data in an array, and provide methods for fetching this data through the Java Class TmResult methods.
Your application must import com.safenet.token.TokenService to use the API.
Note
You can use an obfuscated password or credential. See Creating Obfuscated Data Using Obfuscation Utility for more information.
Bulk Tokenization using Multithread Processing
To improve performance, CT-V provides bulk tokenization for use with the insert() and get() methods when processing large arrays, e.g., arrays with 2000+ elements (refer to Java APIs for method descriptions). Multithread bulk tokenization speeds up the tokenization of large input arrays by automatically splitting them into multiple batches and running these batches on multiple threads.
Two configuration parameters in the SafeNetToken.properties file — NumTMThreads and BatchSizeThresholdToThread — enable you to control when multithreading is used and how many threads are used.
Parameters
Thales provides default values for the bulk tokenization parameters, so you do not need to set them in order to start using multithread processing. You can change the behavior of this process by adjusting the two parameters in the SafeNetToken.properties file. As is true for any Java property, you have the option to specify these parameters by calling System.setProperty().
The two parameters function as follows:
Use NumTMThreads to set the number of threads to use when performing the TokenService methods listed above. The default is 8, with a suggested batch size of 40k.
Use BatchSizeThresholdToThread to specify the minimum number of elements that a String array must contain for multithreading to be used. The minimum valid value is 1000; specifying a lower value results in an exception. The default is 2000. With the default, an input array of 2000 or more elements is processed with multiple threads (assuming NumTMThreads > 1), while an input array of fewer than 2000 elements runs as a single batch on one thread. There is no maximum number of elements in the array. Setting BatchSizeThresholdToThread to a very high value (e.g., 10,000,000) effectively prevents TokenService from using threads.
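The threshold rule described above can be sketched in a few lines of Java. This is only an illustration of the documented behavior, not CT-V code: the class BulkThreadingRule and the method usesMultithreading are hypothetical, the property keys are assumed to match the parameter names used in SafeNetToken.properties, and IllegalArgumentException stands in for whatever exception CT-V actually raises for a threshold below 1000.

```java
public class BulkThreadingRule {
    // Documented minimum valid value for BatchSizeThresholdToThread.
    static final int MIN_THRESHOLD = 1000;

    /**
     * Hypothetical helper: returns true when an input array of the given
     * length would be processed with multiple threads under the rule
     * described in this section.
     */
    static boolean usesMultithreading(int arrayLength, int numThreads, int threshold) {
        if (threshold < MIN_THRESHOLD) {
            // Stand-in for the exception the product raises; the real type is not documented here.
            throw new IllegalArgumentException(
                "BatchSizeThresholdToThread must be at least " + MIN_THRESHOLD);
        }
        return numThreads > 1 && arrayLength >= threshold;
    }

    public static void main(String[] args) {
        // Assumption: the system property keys are the same as the parameter names.
        System.setProperty("NumTMThreads", "8");
        System.setProperty("BatchSizeThresholdToThread", "2000");

        int threads = Integer.parseInt(System.getProperty("NumTMThreads"));
        int threshold = Integer.parseInt(System.getProperty("BatchSizeThresholdToThread"));

        System.out.println(usesMultithreading(1999, threads, threshold)); // false: below the threshold
        System.out.println(usesMultithreading(2000, threads, threshold)); // true: at the threshold
    }
}
```

If you set the properties programmatically, do so before the first TokenService call so that the values are in place when the bulk operation starts.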
Thales recommends the following guidelines as best practices:
When using the CT-V API in a multi-threaded application on a single CPU, you can expect normal performance when using eight to fifteen threads. Increasing the thread count in this environment increases the likelihood of performance degradation.
When running batch jobs in Oracle, execute the analyze table command on the command line after running the first batch job on a token vault. For example:
analyze table <your token vault table> compute statistics;
If this command is not used, performance may degrade after running batches between 5000 and 10000 rows.
Tip
The Smart Check feature, described below, can be used with bulk tokenization processing to minimize delays caused by row-level errors in the input array.
Note
Data collisions can occur if you generate too many tokenized values that are the same as tokens that already exist. Bulk tokenization makes 100 attempts to tokenize a value. If tokenizing collides 100 times, tokenization for that element fails. The bulk tokenization process does not log this specific event. The customer is responsible for logging this data. For example, you may log the batch number in which a failure occurs, and use TmResult to identify the row where the failure occurs.
Smart Check
“Smart Check” is a feature that enables you to bypass errors that occur in bulk tokenization or bulk detokenization input when you use insert() or get() methods. Previously, if you used a process that tokenized or detokenized bulk data, individual data elements containing erroneous or illegal data would cause the entire process to fail. For example, versions of insert() and get() available before release 6.2 threw exceptions when any failure occurred. One invalid input value or one null element in the array could cause the failure of a tokenization process acting on thousands of other valid values.
The Java API methods and Web Services introduced in this section enable your bulk process to skip bad data in an otherwise correct array and to continue the bulk processing. Each occurrence of problematic input at the item level is identified in the TmResult error index. This is an array of indices pointing to incorrect input items, returned by TmResult along with associated error messages that help you identify and recover from the error. CT-V returns tokens (or row-level errors) in exactly the same order as the input elements.
This functionality splits your bulk data into batches. Be aware that you must establish some means of tracking your batches. Indices and error messages generated by the API cannot tell you which batch contains the erroneous row-level data.
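One possible way to track batches is sketched below, assuming your application itself divides the input into fixed-size batches and submits each batch in a separate call. The class BatchTracker and its methods are hypothetical helpers, not part of the CT-V API. They show how a batch number plus a batch-local error index can be turned back into a row number in the full input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchTracker {

    /** Splits the input into consecutive batches of at most batchSize elements. */
    static List<String[]> split(String[] input, int batchSize) {
        List<String[]> batches = new ArrayList<>();
        for (int start = 0; start < input.length; start += batchSize) {
            int end = Math.min(start + batchSize, input.length);
            batches.add(Arrays.copyOfRange(input, start, end));
        }
        return batches;
    }

    /** Converts an error index reported for one batch into a row number in the full input. */
    static int toGlobalRow(int batchNumber, int batchSize, int errorIndexInBatch) {
        return batchNumber * batchSize + errorIndexInBatch;
    }

    public static void main(String[] args) {
        String[] input = new String[10];
        Arrays.fill(input, "value");

        List<String[]> batches = split(input, 4);
        System.out.println("batches: " + batches.size()); // 3 batches: 4, 4, and 2 elements

        // Suppose the call for batch 2 (zero-based) reported an error at index 1.
        System.out.println("global row: " + toGlobalRow(2, 4, 1)); // 2 * 4 + 1 = 9
    }
}
```

Logging the batch number alongside the error indices (never the input values themselves) gives a recovery job enough information to locate each skipped row.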
Note
“Smart Check” functionality enables you to complete the processing of bulk tokenization input even when it contains row level errors: a significant enhancement.
Java and Web Service Methods that use Smart Check
Signatures of the Java API and Web Service methods that use the Smart Check functionality are listed below for reference. For the standard Java API method information, see Java APIs. For the standard Web Service method information, see SOAP Web Services.
TokenService.insert
public TmResult insert (String[] values, String[] customData, String table, int format, boolean luhnCheck, boolean saveExceptions) throws TokenException
TokenService.get
public TmResult get (String[] token, String[] customData, String table, int format, boolean saveExceptions) throws TokenException
SafeNetTokenizer.InsertBatchWithCustomDataSmartCheck
public TmResult InsertBatchWithCustomDataSmartCheck (String naeUser, String naePassword, String dbUser, String dbPswd, String tableName, String[] values, String[] customData, Integer format, Boolean luhnCheck, Boolean saveExceptions) throws TokenException
SafeNetTokenizer.GetBatchWithCustomDataSmartCheck
public TmResult GetBatchWithCustomDataSmartCheck( String naeUser, String naePassword, String dbUser, String dbPswd, String tableName, String[] tokens, String[] customData, Integer format, Boolean saveExceptions) throws TokenException
Functional Overview
The solution works as follows:
The Java API methods insert() and get() are overloaded, and the new versions give you the option not to throw exceptions when row-level errors (null values, invalid values) appear in the input. Note that the Smart Check versions feature a saveExceptions parameter.
Your application calls the Smart Check-enhanced version of insert() or get(). When the boolean saveExceptions parameter is set to true, Smart Check is applied.
A Java class, TmResult, provides information regarding the failure or failures. TmResult indicates whether errors occurred. If they occurred, a TmResult method, getStatusType(), indicates the level at which they occurred, as follows:
TmResult.STATUS_OK: There were no errors during processing. You can get the results by using another TmResult method, getOutput().
TmResult.STATUS_BATCH: A batch-level error occurred. Processing failed before any individual element was processed, so the whole operation fails. Batch-level errors cannot be skipped; the tokenization process throws an exception. When statusType is set to TmResult.STATUS_BATCH, batchError writes error information, e.g., “Batch Error in CCVault_003”, to TmResult, and you can use getBatchErrorMessage() to troubleshoot the batch-level problem.
TmResult.STATUS_ROW: A row-level error occurred; for example, one of the elements in the input array was invalid. When statusType is set to this value, the error message will convey the error information for that element. If a row-level error occurs, the problematic row is skipped and the bulk tokenization process continues.
To enable you to recover and process the skipped data, information about each row-level error is recorded in an array. In this array, the row error is assigned an error index and an error message. Examples of this information are provided in the table below.
Example: Error index and Error tracking information for Row-Level Errors
Error Order (zero-indexed number assigned to each error that was bypassed) | Error Index (row number of the error in the original input array) | Error Message |
---|---|---|
0 | 2 | Error for value, " " |
1 | 5 | null element |
2 | 6 | invalid data element |
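As an illustration, the rows of a table like the one above can be assembled from two parallel arrays. This is a minimal sketch under stated assumptions: RowErrorReport is a hypothetical helper, and it assumes the error messages are available as a String array in the same order as the indices returned by getErrorIndex(). This section does not document the return type of getErrorMessage(), so check the Java API reference before relying on that shape.

```java
public class RowErrorReport {

    /**
     * Builds one line per bypassed error: error order, error index, error message.
     * Assumes both arrays are parallel, i.e. entry i of each describes the same error.
     */
    static String[] format(Integer[] errorIndex, String[] errorMessages) {
        if (errorIndex.length != errorMessages.length) {
            throw new IllegalArgumentException("index and message arrays differ in length");
        }
        String[] lines = new String[errorIndex.length];
        for (int order = 0; order < errorIndex.length; order++) {
            lines[order] = order + " | " + errorIndex[order] + " | " + errorMessages[order];
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample values taken from the example table above.
        Integer[] errorIndex = {2, 5, 6};
        String[] messages = {"Error for value, \" \"", "null element", "invalid data element"};

        for (String line : format(errorIndex, messages)) {
            System.out.println(line);
        }
    }
}
```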
After tokens have been created, the detokenize methods can be used to get the plain text associated with a token or an array of tokens. For example, you can use the Java API get() methods or web service method GetBatchWithCustomDataSmartCheck to check if a token exists for a plain text value. If the token exists, it is returned.
Note
Use the TmResult methods, documented below, to fetch the StatusType, error index, batch error message, and output fields.
Using Smart Check
To use this solution:
Call the method with the saveExceptions value set to true to apply the Smart Check functionality.
API methods: insert() or get()
Web Service methods: InsertBatchWithCustomDataSmartCheck or GetBatchWithCustomDataSmartCheck
The TmResult class indicates the outcome, as follows:
no errors during processing, indicated by int STATUS_OK = 0
batch-level error, indicated by int STATUS_BATCH = 1
row-level error, indicated by int STATUS_ROW = 2
Results are as follows:
If there is no error at any level, the output array is delivered as expected.
If there is an error at the batch level, TmResult indicates failure - a batch error message is returned, the process fails, and no output tokens are delivered.
If there are errors at the row level, the resulting array is delivered as expected, except that any row containing invalid data has been set to null. A message is returned with error data.
Check for error messages:
For example, if the user passes a null for table/vault name, the statusType is Batch (value = 1), indicating a fatal, batch level error. The error message will list possible reasons for the failure.
If there are errors at the row level, a message is returned with error data: the total number of errors, indices for the elements that failed tokenization or detokenization, and the error message string.
Parse the TmResult object to get the operation results and check the status with TmResult.getStatusType().
If status is 0 or 2, call TmResult.getOutput().
If status = 1, use getBatchErrorMessage() to troubleshoot the batch problem.
If status = 2, use getErrorIndex() and getErrorMessage() to feed a recovery process for the bypassed row-level data.
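The status checks above can be sketched as follows. Because the CT-V library is not available in a standalone example, the code uses StubTmResult, a hypothetical stand-in that mirrors only the TmResult constants and methods named in this section. In a real application you would pass the TmResult returned by insert() or get() instead.

```java
public class SmartCheckStatusDemo {

    /** Hypothetical stand-in that mirrors only the TmResult members named in this section. */
    static class StubTmResult {
        static final int STATUS_OK = 0;
        static final int STATUS_BATCH = 1;
        static final int STATUS_ROW = 2;

        private final int statusType;
        private final String[] output;
        private final Integer[] errorIndex;
        private final String batchErrorMessage;

        StubTmResult(int statusType, String[] output, Integer[] errorIndex, String batchErrorMessage) {
            this.statusType = statusType;
            this.output = output;
            this.errorIndex = errorIndex;
            this.batchErrorMessage = batchErrorMessage;
        }

        Integer getStatusType() { return statusType; }
        String[] getOutput() { return output; }
        Integer[] getErrorIndex() { return errorIndex; }
        String getBatchErrorMessage() { return batchErrorMessage; }
    }

    /** Summarizes a result without ever printing input or output values. */
    static String summarize(StubTmResult result) {
        int status = result.getStatusType();
        if (status == StubTmResult.STATUS_BATCH) {
            // Batch-level failure: no output is delivered.
            return "batch failure: " + result.getBatchErrorMessage();
        }
        int total = result.getOutput().length;
        if (status == StubTmResult.STATUS_ROW) {
            // Row-level errors: failed rows are null in the output.
            int skipped = result.getErrorIndex().length;
            return (total - skipped) + " of " + total + " rows processed, " + skipped + " skipped";
        }
        return total + " of " + total + " rows processed";
    }

    public static void main(String[] args) {
        StubTmResult rowErrors = new StubTmResult(
            StubTmResult.STATUS_ROW, new String[] {"t0", null, "t2"}, new Integer[] {1}, null);
        System.out.println(summarize(rowErrors)); // 2 of 3 rows processed, 1 skipped
    }
}
```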
Note
The TmResult object maintains all the error information. Input data is not logged, since it would not be secure. So, to get tokens or row-level error messages, you must use a TmResult object.
TmResult Class
Use the TmResult methods to fetch the results of the bulk tokenization and Smart Check processing, including StatusType, Error Index, Error Code, Batch Error Message, and Output.
TmResult provides the following methods:
public String getBatchErrorMessage()
Returns the batch error message. Set only when getStatusType() == STATUS_BATCH.
public Integer getStatusType()
Returns status indicating success or failure of the operation, as follows:
If all elements were successfully processed, it returns STATUS_OK.
If one or more elements failed to be processed, it returns STATUS_ROW, and getErrorIndex() contains the indices of the failed elements.
If a failure that is not related to one or more elements in the input array occurs (e.g., if the input array value is null), then it returns STATUS_BATCH.
public Integer[] getErrorIndex()
Returns an array containing indices pointing to each row location that contained an element that failed a CT-V operation.
Note
The index returned via getErrorIndex() maps to the problem row(s) in the array returned by getOutput(). When you are inserting (tokenizing), this output array should contain tokens. When detokenizing, i.e., when calling TokenService.get(), the returned indices map to the order of the original plain text values.
public String[] getOutput()
Returns the results of the bulk tokenization or detokenization operation. When successful, the output contains tokens returned by insert() and detokenized values returned by get(). If the operation encountered row-level errors, it contains null elements in the corresponding rows.
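Because failed rows come back as null and the error indices point at the same positions in the input, a recovery step can be sketched as below. RowRecovery and its methods are hypothetical helpers that operate on plain arrays of the shapes documented for getOutput() and getErrorIndex(); they do not call the CT-V API.

```java
import java.util.ArrayList;
import java.util.List;

public class RowRecovery {

    /** Collects the original input values at the reported error indices for reprocessing. */
    static List<String> inputsToRetry(String[] input, Integer[] errorIndex) {
        List<String> retry = new ArrayList<>();
        for (Integer index : errorIndex) {
            retry.add(input[index]);
        }
        return retry;
    }

    /** Counts the rows that produced a result, i.e. the non-null output elements. */
    static int countSuccessful(String[] output) {
        int count = 0;
        for (String value : output) {
            if (value != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "c", "d"};
        String[] output = {"t0", null, "t2", null}; // rows 1 and 3 failed
        Integer[] errorIndex = {1, 3};

        // Report counts only; input values should not be written to logs.
        System.out.println("successful rows: " + countSuccessful(output));
        System.out.println("rows to retry: " + inputsToRetry(input, errorIndex).size());
    }
}
```

Keep the retry list in memory or in a protected store, since it holds the original sensitive values.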
SiteBean Class
Use the SiteBean class to get site ID, site status, rows updated and error message, if any. SiteBean provides the following methods:
public int getSiteID()
Returns the ID of the site that this instance represents.
public String getStatus()
Returns one of the two status type codes: SUCCESS or FAILURE.
public int getRowsUpdated()
Returns the number of rows deleted in the site represented by the site ID.
public String getErrorMessage()
Returns the error message in case of failure; otherwise returns N/A.