Tokenize and detokenize Unicode blocks using the unicode.properties file

Unlike AlgoSpec, tokenization and detokenization using the unicode.properties file is not limited to the Unicode blocks mentioned in AlgoSpec. The unicode.properties file lets you tokenize and detokenize Unicode characters in the range 0000-FFFF. To use it, specify the path of the unicode.properties file in the UnicodeCodePointProperties parameter of the SafeNetTokenVaultless.properties file.

Tokenization and detokenization can be achieved using:

The Range parameter of the unicode.properties file
The FromFile parameter of the unicode.properties file

Tokenization and detokenization using Range parameter

Use the Range parameter to tokenize and detokenize Unicode blocks when the code points form a continuous range within a Unicode block. In the unicode.properties file, you specify both the scope (the range of code points to use) and any undefined range within that scope (the code points to exclude). Specify each range as a sequential range of Unicode characters, with only one range per line.

For example, if the scope starts at n and ends at m, set Scope.Range=n-m.

Similarly, to exclude an undefined range within the scope, enter the start and end of the undefined characters in the Undefined.Range0 parameter of the unicode.properties file.
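The effect of a scope range combined with an undefined range can be sketched in Java as follows. This is an illustration only, not the CADP for Java implementation: the class and method names are hypothetical, the range values are assumed to be hexadecimal (the source states this only for FromFile input), and the Greek block values are sample data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CADP for Java.
public class UnicodeRangeSketch {

    /** Parses an "n-m" value into {start, end}; assumes hexadecimal, inclusive bounds. */
    public static int[] parseRange(String value) {
        String[] parts = value.trim().split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected n-m, got: " + value);
        }
        int start = Integer.parseInt(parts[0].trim(), 16);
        int end = Integer.parseInt(parts[1].trim(), 16);
        // The document limits unicode.properties to the range 0000-FFFF.
        if (start < 0x0000 || end > 0xFFFF || start > end) {
            throw new IllegalArgumentException("Range reversed or outside 0000-FFFF: " + value);
        }
        return new int[] {start, end};
    }

    /** Returns the code points in the scope that do not fall in the undefined range. */
    public static List<Integer> usableCodePoints(String scopeRange, String undefinedRange) {
        int[] scope = parseRange(scopeRange);
        int[] undefined = parseRange(undefinedRange);
        List<Integer> result = new ArrayList<>();
        for (int cp = scope[0]; cp <= scope[1]; cp++) {
            if (cp < undefined[0] || cp > undefined[1]) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values standing in for Scope.Range and Undefined.Range0.
        List<Integer> cps = usableCodePoints("0370-03FF", "0380-0383");
        System.out.println(cps.size() + " usable code points");
    }
}
```

The sketch handles a single undefined range. Whether the product accepts further undefined ranges is not stated here, so check the bundled unicode.properties file.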

To tokenize and detokenize a Unicode block using the Range parameter, make sure that the following parameter is set in the unicode.properties file:

Unicode.Type.Specifier = Range

Tokenization and detokenization using FromFile parameter

Use the FromFile parameter to tokenize and detokenize Unicode blocks when the code points within a Unicode block are scattered. List all the code points of the Unicode block in a file, and add the location of that file to the unicode.properties file. CADP for Java reads the file and generates the output from the same Unicode block.

To tokenize and detokenize Unicode input characters using the FromFile parameter, specify all the code points of the Unicode block in an input file. Specify one value per line, in hexadecimal format. Provide the absolute path of the input file in the Unicode.FromFile parameter of the unicode.properties file.
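Before pointing Unicode.FromFile at an input file, it can help to check that every line is a valid hexadecimal code point. The Java sketch below shows one way to do that. It is not part of CADP for Java: the class and method names are hypothetical, and it assumes plain hexadecimal digits with no prefix such as 0x or U+, blank lines skipped, and values limited to 0000-FFFF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical validator, not part of CADP for Java.
public class UnicodeFromFileSketch {

    /**
     * Converts the lines of an input file (one hexadecimal value per line)
     * into code points. The lines could come from, for example,
     * java.nio.file.Files.readAllLines(...) on the input file.
     */
    public static List<Integer> parseCodePoints(List<String> lines) {
        List<Integer> codePoints = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            String value = line.trim();
            if (value.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int cp;
            try {
                cp = Integer.parseInt(value, 16);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "Line " + lineNumber + " is not hexadecimal: " + value, e);
            }
            if (cp < 0x0000 || cp > 0xFFFF) {
                throw new IllegalArgumentException(
                        "Line " + lineNumber + " is outside 0000-FFFF: " + value);
            }
            codePoints.add(cp);
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // Sample scattered code points from the Greek block.
        List<String> sample = Arrays.asList("0391", "0393", "03A9");
        System.out.println(parseCodePoints(sample).size() + " code points parsed");
    }
}
```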

To tokenize and detokenize using the FromFile parameter, ensure that the following parameter is set in the unicode.properties file:

Unicode.Type.Specifier = FromFile
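As a rough pre-flight check, the Java sketch below loads the contents of a unicode.properties file and confirms that the setting matching Unicode.Type.Specifier is present. It is an illustration under stated assumptions, not product behavior: the class is hypothetical, the specifier is assumed to be case-sensitive, Range is assumed to require Scope.Range, FromFile is assumed to require Unicode.FromFile, and the sample path is invented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-flight check, not part of CADP for Java.
public class UnicodePropertiesCheck {

    /** Loads properties from text in the standard java.util.Properties format. */
    public static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Returns the key that the chosen specifier relies on, after checking
     * that it has a value. The pairing of specifier and key is an assumption
     * drawn from the parameter names in this document.
     */
    public static String requiredKeyFor(Properties props) {
        String specifier = props.getProperty("Unicode.Type.Specifier", "").trim();
        String requiredKey;
        if (specifier.equals("Range")) {
            requiredKey = "Scope.Range";
        } else if (specifier.equals("FromFile")) {
            requiredKey = "Unicode.FromFile";
        } else {
            throw new IllegalArgumentException(
                    "Unicode.Type.Specifier must be Range or FromFile, got: '" + specifier + "'");
        }
        String value = props.getProperty(requiredKey);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(requiredKey + " is missing or empty");
        }
        return requiredKey;
    }

    public static void main(String[] args) {
        // Invented sample content; the file path is a placeholder.
        String sample = "Unicode.Type.Specifier = FromFile\n"
                + "Unicode.FromFile = /opt/cadp/greek_codepoints.txt\n";
        System.out.println("Checked key: " + requiredKeyFor(load(sample)));
    }
}
```

The sample path uses forward slashes because a backslash is an escape character in the java.util.Properties format.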

For details, refer to the unicode.properties file and the appropriate Unicode sample file bundled with the CADP for Java software package.
