Tokenize and detokenize Unicode blocks using the unicode.properties file
Unlike AlgoSpec, tokenization and detokenization using the unicode.properties file is not limited to the Unicode blocks listed in AlgoSpec. The unicode.properties file lets you tokenize and detokenize Unicode characters in the range 0000-FFFF. To use it, specify the path of the unicode.properties file in the UnicodeCodePointProperties parameter of the SafeNetTokenVaultless.properties file.
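As a minimal sketch, the entry in SafeNetTokenVaultless.properties might look like the following. The path is a hypothetical placeholder, not a product default.

```properties
# SafeNetTokenVaultless.properties (fragment)
# Hypothetical path; point this at your own unicode.properties file.
UnicodeCodePointProperties=/opt/cadp/config/unicode.properties
```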
Tokenization and detokenization can be performed using either the Range parameter or the FromFile parameter.
Tokenization and detokenization using the Range parameter
Use the Range parameter to tokenize and detokenize a Unicode block when its code points form a continuous range. The unicode.properties file lets you specify both the scope of the block and any undefined range within it. To use Range, specify the input as a sequential range of Unicode code points, with only one range per line.
For example, if the scope starts at n and ends at m, set Scope.Range=n-m.
Similarly, to exclude an undefined range within the scope, enter the start and end of the undefined characters in the Undefined.Range0 parameter of the unicode.properties file.
To tokenize and detokenize a Unicode block using the Range parameter, make sure that the following parameter is set in the unicode.properties file:
Unicode.Type.Specifier = Range
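As a minimal sketch, assuming the Greek and Coptic block (0370-03FF) as the scope and its unassigned code points 0378-0379 as the undefined range, the Range settings in unicode.properties might look like the following. The values and the bare four-digit hexadecimal notation are illustrative assumptions; the bundled unicode.properties file shows the exact syntax expected.

```properties
# Illustrative values only; the parameter names are those described above.
Unicode.Type.Specifier = Range
# Scope: start and end of the continuous range, written as n-m.
Scope.Range=0370-03FF
# Undefined code points inside the scope that should be excluded.
Undefined.Range0=0378-0379
```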
Tokenization and detokenization using the FromFile parameter
Use the FromFile parameter to tokenize and detokenize a Unicode block when its code points are scattered. List all the code points of the Unicode block in a file, and add the location of that file to the unicode.properties file. CADP for Java reads the file and generates the output from the same Unicode block.
To tokenize and detokenize Unicode input characters using the FromFile parameter, specify all the code points of a Unicode block in an input file, one value per line, in hexadecimal format. Provide the absolute path of the input file in the Unicode.FromFile parameter of the unicode.properties file.
To tokenize and detokenize using the FromFile parameter, ensure that the following parameter is set in the unicode.properties file:
Unicode.Type.Specifier = FromFile
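A hedged sketch of the FromFile settings follows. The file path and the listed code points are hypothetical, and the bare hexadecimal notation (no prefix) is an assumption; the bundled Unicode sample file shows the expected format.

```properties
# Illustrative values only; the parameter names are those described above.
Unicode.Type.Specifier = FromFile
# Absolute path to the code point list (hypothetical path).
Unicode.FromFile=/opt/cadp/unicode/greek_codepoints.txt
# The input file itself would hold one hexadecimal value per line, for example:
#   0391
#   03A9
#   03B1
```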
For details, refer to the unicode.properties file and the appropriate Unicode sample file bundled with the CADP for Java software package.