Token Formats
This topic explains the token formats, which determine the structure of a token. Some formats retain part of the original plaintext; others always create values that fail the Luhn check.
CT-V provides the following token formats.
This topic also covers the Maximum Number of Tokens Supported for a Format and an Overview of Sequential Tokens.
You can also create custom token formats. For information on creating your own formats, see Creating New Token Formats.
Alphanumerics
There are nine formats that create tokens based on alphanumeric input. Letters are replaced with letters and numbers with numbers. Spaces, dashes, and special characters are maintained, except as noted below.
RANDOM_TOKEN: creates a token of random characters from the following pool: a-z A-Z 0-9. The token is the same length as the original input data. Special characters and the case of letters are retained.
RANDOM_ALPHANUMERIC_TOKEN: creates a token of random characters from the following pool: a-z A-Z 0-9 @ % + \ / ' ! # $ ^ ? : , ( ) { } [ ] ~ ` - _ where the token is the same length as the original input data. Special characters are not retained.
ALPHANUMERIC_TOKEN: creates a token of random characters from the following pool: a-z A-Z 0-9 where the length of the token is the length of the original input data. Special characters are not retained.
SEQUENTIAL_TOKEN: creates a sequential numeric token. The sequence is maintained by the database. The token vault must be created with the Is Token Sequential check box enabled.
LAST_FOUR_TOKEN: creates a token that retains the last 4 original characters.
FIRST_SIX_TOKEN: creates a token that retains the first 6 characters.
FIRST_TWO_LAST_FOUR_TOKEN: creates a token that retains the first 2 and last 4 characters.
FIRST_SIX_LAST_FOUR_TOKEN: creates a token that retains the first 6 and last 4 characters.
FIXED_NINETEEN_TOKEN: creates a 19-digit token whose first 5 digits are 7s. The TOKEN column size must be 19; otherwise, an exception is thrown.
Note
Whenever a Luhn check must be performed on the token, the input value cannot contain letters and must contain 10 digits or more.
Here are some examples of common and uncommon use cases:
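The Luhn check referred to throughout this topic is the standard mod-10 checksum. The following Python sketch shows that algorithm together with the input rules from the note; `luhn_valid` and `check_luhn_input` are hypothetical helper names, not part of the CT-V API, and the error messages are the exception texts quoted later under Numeric Input.

```python
DIGITS = "0123456789"


def luhn_valid(digits: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def check_luhn_input(value: str) -> str:
    """Hypothetical pre-check mirroring the rules in the note above.

    Letters are rejected, other non-digits (spaces, hyphens, special
    characters) are skipped, and at least 10 digits are required.
    Returns the digits that take part in the Luhn check.
    """
    if any(ch.isalpha() for ch in value):
        raise ValueError("Alpha characters not allowed when using the Luhn check.")
    digits = "".join(ch for ch in value if ch in DIGITS)
    if len(digits) < 10:
        raise ValueError(
            "Input data must have at least 10 digits when the output "
            "token value must pass the Luhn check."
        )
    return digits


digits = check_luhn_input("4111-1111 1111-1111")
print(luhn_valid(digits))  # True
```

Skipping non-digits rather than rejecting them is an assumption based on the later statement that special characters are skipped during the Luhn check.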
RANDOM_TOKEN
All letters and digits are replaced with random characters. Special characters, spaces, and hyphens remain intact.
Letters beget letters. Numbers beget numbers.
Plaintext Value | RANDOM_TOKEN |
---|---|
111111ssssss | 890109huelnr |
890109huelnr | 526255jfozji |
!@*& ^$%-891 728-937%% | !@*& ^$%-763 278-026%% |
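To illustrate the same-kind substitution that RANDOM_TOKEN performs, here is a minimal Python sketch. It is not CT-V's implementation: `random_token` is a hypothetical name, it draws characters with Python's `secrets` module, and it neither stores the token in a vault nor guarantees uniqueness, both of which a real tokenization service must handle.

```python
import secrets
import string


def random_token(plaintext: str) -> str:
    """Illustrative RANDOM_TOKEN-style substitution (hypothetical helper).

    Digits become random digits, lowercase letters become random lowercase
    letters, uppercase letters become random uppercase letters; every other
    character (space, hyphen, special character) is copied unchanged.
    """
    out = []
    for ch in plaintext:
        if ch in string.digits:
            out.append(secrets.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(secrets.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(secrets.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # retained as-is
    return "".join(out)


print(random_token("!@*& ^$%-891 728-937%%"))
```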
RANDOM_ALPHANUMERIC_TOKEN
The token format RANDOM_ALPHANUMERIC_TOKEN replaces digits, letters, and special characters with characters from the pool (a-z A-Z 0-9 @ % + \ / ' ! # $ ^ ? : , ( ) { } [ ] ~ ` - _). The token is the same length as the plaintext. Do not use Luhn check validation with this format, as letters and special characters in the output force a Luhn check failure.
Plaintext Value | RANDOM_ALPHANUMERIC_TOKEN |
---|---|
747abc@#1 | %bz@W3490 |
ALPHANUMERIC_TOKEN
The token format ALPHANUMERIC_TOKEN replaces digits, letters and special characters with characters from the pool a-z A-Z 0-9. Do not use Luhn check validation with this format, as letters in the output force a Luhn check failure.
Plaintext Value | ALPHANUMERIC_TOKEN |
---|---|
545abc@#1 | abzAW3456 |
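The two pool-based formats differ only in their character pool. This sketch, assuming the pools listed at the top of this topic and a hypothetical `pool_token` helper, shows why neither format retains special characters: every position, including a special character, is drawn from the pool.

```python
import secrets
import string

# Pools as listed for ALPHANUMERIC_TOKEN and RANDOM_ALPHANUMERIC_TOKEN.
ALPHANUMERIC_POOL = string.ascii_letters + string.digits
RANDOM_ALPHANUMERIC_POOL = ALPHANUMERIC_POOL + "@%+\\/'!#$^?:,(){}[]~`-_"


def pool_token(plaintext: str, pool: str) -> str:
    """Replace every character with a random character from `pool`.

    The token has the same length as the plaintext; nothing is retained.
    """
    return "".join(secrets.choice(pool) for _ in plaintext)


print(pool_token("545abc@#1", ALPHANUMERIC_POOL))
print(pool_token("747abc@#1", RANDOM_ALPHANUMERIC_POOL))
```

Because letters (and, for the larger pool, special characters) can land in any position, the output is not a digit string, which is why Luhn check validation does not apply to these formats.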
SEQUENTIAL_TOKEN
Returns numeric, sequential values. For the first value, CT-V fills the entire token with 1s. The sequence is then incremented for every successful token request. Spaces and hyphens are ignored and not included in the output. The resulting token is always numeric, regardless of input.
Plaintext Value | SEQUENTIAL_TOKEN |
---|---|
545454545 | 1111111111 |
565656565 | 1111111112 |
575 757-575 | 1111111113 |
!@*& GGG-664 455-332%% | 1111111114 |
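Because the sequence lives in the database, a SEQUENTIAL_TOKEN value carries no information about the plaintext. The in-memory class below is a hypothetical stand-in for the database sequence; the starting value 1111111111 is taken from the examples above, and returning the same token for a repeated plaintext is an assumption about vault behavior, not documented CT-V behavior.

```python
import itertools


class SequentialTokenizer:
    """Hypothetical in-memory stand-in for a database sequence."""

    def __init__(self, start: int = 1111111111) -> None:
        self._counter = itertools.count(start)  # next value on each request
        self._vault: dict[str, str] = {}  # plaintext -> token

    def tokenize(self, plaintext: str) -> str:
        # Assumption: a value that was already tokenized keeps its token.
        if plaintext not in self._vault:
            self._vault[plaintext] = str(next(self._counter))
        return self._vault[plaintext]


seq = SequentialTokenizer()
for value in ["545454545", "565656565", "575 757-575"]:
    print(value, seq.tokenize(value))
```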
LAST_FOUR_TOKEN
The last four characters are left intact. Spaces and hyphens remain intact, regardless of their location. When a space or hyphen is in one of the retained positions, it is not ignored, but is considered part of the retained characters. Special characters remain untouched, regardless of their location.
Letters beget letters. Numbers beget numbers.
Plaintext Value | LAST_FOUR_TOKEN |
---|---|
222222dddddd | 833339fldddd |
222 222 ddddd-dd | 745 775 cjzwd-dd |
!@*& ^$%-345 345-345%% | !@*& ^$%-320 707-145%% |
FIRST_SIX_TOKEN
The first six characters are left intact. Spaces and hyphens remain intact, regardless of their location. When a space or hyphen is in one of the retained positions, it is not ignored, but is considered part of the retained characters. Notice that a space is one of the six retained characters. Special characters remain untouched, regardless of their location.
Letters beget letters. Numbers beget numbers.
Plaintext Value | FIRST_SIX_TOKEN |
---|---|
333333wwwwww | 333333nrtbqk |
333 334 www-www | 333 332 lub-hjt |
!@*& ^$%-989 898-989%% | !@*& ^$%-163 655-682%% |
FIRST_TWO_LAST_FOUR_TOKEN
The first two and the last four characters are left intact. Spaces and hyphens remain intact, regardless of their location. When a space or hyphen is in one of the retained positions, it is not ignored, but is considered part of the retained characters. Notice that a hyphen is one of the last four retained characters. Special characters remain untouched, regardless of their location.
Letters beget letters. Numbers beget numbers.
Plaintext Value | FIRST_TWO_LAST_FOUR_TOKEN |
---|---|
444444vvvvvv | 444975zbvvvv |
444 444 vvv-vvv | 444 447 tcy-vvv |
!@*& ^$%-212 212-212%% | !@*& ^$%-009 337-112%% |
FIRST_SIX_LAST_FOUR_TOKEN
The first six and the last four characters are left intact. Spaces and hyphens remain intact, regardless of their location. When a space or hyphen is in one of the retained positions, it is not ignored, but is considered part of the retained characters. Notice that a hyphen is one of the last four retained characters, and a space is one of the first six. Special characters remain untouched, regardless of their location.
Letters beget letters. Numbers beget numbers.
Plaintext Value | FIRST_SIX_LAST_FOUR_TOKEN |
---|---|
555555yyyyyy | 555555hoyyyy |
555 555 yyy-yyy | 555 553 spw-yyy |
!@*& ^$%-545 545-545%% | !@*& ^$%-408 019-745%% |
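The four retention formats (LAST_FOUR_TOKEN, FIRST_SIX_TOKEN, FIRST_TWO_LAST_FOUR_TOKEN, FIRST_SIX_LAST_FOUR_TOKEN) differ only in how many leading and trailing characters they keep. The sketch below assumes positions are counted over every character, so a space or hyphen can occupy a retained position, as the examples show; `retain_token` and the `RETENTION` table are hypothetical, not CT-V APIs.

```python
import secrets
import string


def retain_token(plaintext: str, keep_first: int, keep_last: int) -> str:
    """Keep the first/last characters; substitute letters and digits between.

    Outside the retained positions, digits become random digits and letters
    become random letters of the same case; other characters are copied.
    """
    n = len(plaintext)
    out = []
    for i, ch in enumerate(plaintext):
        if i < keep_first or i >= n - keep_last:
            out.append(ch)  # retained position, whatever the character is
        elif ch in string.digits:
            out.append(secrets.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(secrets.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(secrets.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # spaces, hyphens, special characters
    return "".join(out)


# Hypothetical mapping of format name -> (first, last) retained characters.
RETENTION = {
    "LAST_FOUR_TOKEN": (0, 4),
    "FIRST_SIX_TOKEN": (6, 0),
    "FIRST_TWO_LAST_FOUR_TOKEN": (2, 4),
    "FIRST_SIX_LAST_FOUR_TOKEN": (6, 4),
}

print(retain_token("555 555 yyy-yyy", *RETENTION["FIRST_SIX_LAST_FOUR_TOKEN"]))
```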
FIXED_NINETEEN_TOKEN
Given any number of digits, returns a random 19-digit value whose first five digits are 7s, which identify the value as a token. Letters and special characters are converted to numbers. Dashes and spaces used to format the original value are not maintained.
Plaintext Value | FIXED_NINETEEN_TOKEN |
---|---|
666 666-666 | 7777767458700084773 |
666666rrrrrr | 7777724481595254022 |
!@*& ^$%-744 744-744%% | 7777755409271995278 |
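As a minimal sketch of FIXED_NINETEEN_TOKEN output, assuming the 14 non-prefix digits are drawn uniformly at random (this topic does not say how CT-V generates them), the hypothetical function below builds the 19-digit value and enforces the column-size rule.

```python
import secrets

PREFIX = "77777"  # the five 7s that mark the value as a token
TOKEN_LENGTH = 19


def fixed_nineteen_token(token_column_size: int) -> str:
    """Hypothetical FIXED_NINETEEN_TOKEN-style value.

    None of the plaintext's characters (digits, letters, spaces, hyphens,
    special characters) survive, so the plaintext is not an argument here.
    """
    if token_column_size != TOKEN_LENGTH:
        # Mirrors the documented requirement that the TOKEN column be 19.
        raise ValueError("TOKEN column size must be 19")
    random_digits = "".join(
        secrets.choice("0123456789") for _ in range(TOKEN_LENGTH - len(PREFIX))
    )
    return PREFIX + random_digits  # 5 fixed digits + 14 random digits


print(fixed_nineteen_token(19))
```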
Email Address
The EMAIL_ADDRESS_TOKEN format creates a token in email-address format; for example, something@email.com could become mhipdwxme@ljrte.aab.
CT-V tokenizes the original value but keeps the @ and dot (.) characters in place. Letters become different letters, special characters become letters, and numbers become different numbers.
You should note that:
This token format does not validate that the input is a valid email. If you send a numeric value, that value will be tokenized without error.
The @ and . characters are preserved in place, wherever they appear in the input.
The following special characters become letters: ~`!#$%^&*()-=_+[]{},\:;'?/
Here are some examples of common and uncommon use cases:
Plaintext Value | EMAIL_ADDRESS_TOKEN |
---|---|
something@email.com | mhipdwxme@ljrte.aab |
0920593450475029 | 7413178463094594 |
@.@.@.@.@.@. | @.@.@.@.@.@. |
you@company.com | vse@vtawaad.wxt |
your.name@company.com | iugt.cgvf@rrxojnm.tcf |
<> | @email.com |
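The following Python sketch shows the character rules for EMAIL_ADDRESS_TOKEN. It assumes that letters and special characters become lowercase letters (as in the examples) and that any other character, such as a space, is treated like a special character; `email_token` is a hypothetical name, and the random choices stand in for whatever generator CT-V actually uses.

```python
import secrets
import string


def email_token(plaintext: str) -> str:
    """Hypothetical EMAIL_ADDRESS_TOKEN-style substitution.

    '@' and '.' stay in place wherever they occur; digits become random
    digits; everything else becomes a random lowercase letter. The input is
    not validated as an email address.
    """
    out = []
    for ch in plaintext:
        if ch in "@.":
            out.append(ch)
        elif ch in string.digits:
            out.append(secrets.choice(string.digits))
        else:
            # Assumption: output letters are lowercase, as in the examples.
            out.append(secrets.choice(string.ascii_lowercase))
    return "".join(out)


print(email_token("your.name@company.com"))
```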
Dates
There are three formats that create tokens in the form of dates:
DATE_MMDDYYYY_TOKEN: creates a token that is a valid date in the format mm.dd.yyyy.
DATE_DDMMYYYY_TOKEN: creates a token that is a valid date in the format dd.mm.yyyy.
DATE_YYYYMMDD_TOKEN: creates a token that is a valid date in the format yyyy.mm.dd.
You should note that:
All token dates are between January 01, 1800 and December 31, 2500.
The input must be at least 10 characters, but CT-V does not check that the input is a valid date; it assumes that your application has already done this. The output is always 10 characters.
Each token format ignores the characters in the delimiter positions: whatever is used to separate the day, month, and year is left intact in the token.
Here are some examples of common and uncommon use cases:
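The delimiter rule means that the token's delimiter positions are copied from the input while the date fields are replaced. In this sketch, positions 2 and 5 (month/day-first formats) and 4 and 7 (year-first format) are inferred from the 10-character layouts, and the random date is drawn uniformly between the documented bounds; `date_token` is a hypothetical helper, not CT-V's implementation.

```python
import datetime
import secrets

MIN_DATE = datetime.date(1800, 1, 1)
MAX_DATE = datetime.date(2500, 12, 31)


def date_token(plaintext: str, fmt: str) -> str:
    """Return a random valid date, keeping the input's delimiter characters."""
    if len(plaintext) < 10:
        raise ValueError("Input must be at least 10 characters")
    span = (MAX_DATE - MIN_DATE).days
    d = MIN_DATE + datetime.timedelta(days=secrets.randbelow(span + 1))
    mm, dd, yyyy = f"{d.month:02d}", f"{d.day:02d}", f"{d.year:04d}"
    # Whatever sits in the delimiter positions is copied, even a digit.
    if fmt == "DATE_MMDDYYYY_TOKEN":
        return mm + plaintext[2] + dd + plaintext[5] + yyyy
    if fmt == "DATE_DDMMYYYY_TOKEN":
        return dd + plaintext[2] + mm + plaintext[5] + yyyy
    if fmt == "DATE_YYYYMMDD_TOKEN":
        return yyyy + plaintext[4] + mm + plaintext[7] + dd
    raise ValueError(f"unknown format: {fmt}")


# Like the 1989-12-12 example: '8' and '1' end up as the delimiters.
print(date_token("1989-12-12", "DATE_MMDDYYYY_TOKEN"))
```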
DATE_MMDDYYYY_TOKEN
Send CT-V a valid date and it creates a valid token. Use different delimiters, and CT-V treats the value as totally different input, so you get a different token. Send CT-V a date in the incorrect format, and you still get a valid token for the format you should have sent.
Plaintext Value | DATE_MMDDYYYY_TOKEN |
---|---|
01.01.2011 | 08.05.1986 |
1989-12-12 | 0580412498 |
99.99.9999 | 02.09.2013 |
CT-V-doesn't validate the date | 10-24e2359 |
~`!@#$%^&*()_+-=[] | 04!30$1865 |
DATE_DDMMYYYY_TOKEN
Send CT-V a valid date and it creates a valid token. Use different delimiters, and CT-V treats the value as totally different input, so you get a different token. Send CT-V a date in the incorrect format, and you still get a valid token for the format you should have sent.
Plaintext Value | DATE_DDMMYYYY_TOKEN |
---|---|
02.02.2011 | 23.01.2149 |
02@02W2011 | 16@12W2415 |
1965-04-04 | 0861002220 |
00.00.0000 | 10.09.2239 |
CT-V-doesn't validate the date | 20-05e1804 |
~`!@#$%^&*()_+-= | 10!02$2159 |
DATE_YYYYMMDD_TOKEN
Send CT-V a valid date and it creates a valid token. Use different delimiters, and CT-V treats the value as totally different input, so you get a different token. Send CT-V a date in the incorrect format, and you still get a valid token for the format you should have sent. Send CT-V an invalid date, and it still creates a valid token.
Plaintext Value | DATE_YYYYMMDD_TOKEN |
---|---|
1942-10-18 | 2316-09-06 |
1942@10W18 | 2421Q03W16 |
10.18.1942 | 1834804921 |
CT-V doesn't validate the date! | 2120o03n17 |
~`!@#$%^&*()-=_+ | 2425#03^09 |
Numeric Input
There are three formats that require numeric input and create tokens in the form of digits:
FIRST_SIX_LAST_FOUR_FAIL_LUHN_TOKEN: creates a token that retains the first 6 and last 4 original values, and fails the Luhn check.
FIXED_FIRST_TWO_LAST_FOUR_FAIL_LUHN_TOKEN: creates a token that replaces the first two digits with ones (1), retains the last four original values, and fails the Luhn check.
FIXED_TWENTY_LAST_FOUR_TOKEN: creates a 20-character token that retains the last 4 characters, replaces the rest of the value with a 16-digit value, and fails the Luhn check. The TOKEN column size must be 20; otherwise, an exception is thrown.
You must note that:
Because all of these formats create values that must undergo a Luhn check, the input must be numeric and must contain at least 10 digits. FIRST_SIX_LAST_FOUR_FAIL_LUHN_TOKEN retains 10 of the original digits, so it needs at least one more digit that it can change to force the Luhn check failure.
If the input includes letters, the JVM throws the following exception at run time: “Alpha characters not allowed when using the Luhn check.”
If the input contains fewer than 10 digits, the minimum required for the Luhn check, the JVM throws the following exception at run time:
“Input data must have at least 10 digits when the output token value must pass the Luhn check.”
Here are some examples of common and uncommon use cases:
In the following examples, the plaintext characters that are retained appear unchanged in the token.
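Forcing a Luhn failure needs only one digit that the format is free to change, which is why FIRST_SIX_LAST_FOUR_FAIL_LUHN_TOKEN needs more than 10 digits. The sketch below assumes retention by character position (as in the examples that follow) and that non-digits stay in place and are skipped by the check; the function name is hypothetical and the random generation is illustrative, not CT-V's algorithm.

```python
import secrets

DIGITS = "0123456789"


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # every second digit from the right is doubled
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def first_six_last_four_fail_luhn(plaintext: str) -> str:
    """Keep the first 6 and last 4 characters, randomize the digits between,
    leave special characters in place, then force a Luhn failure."""
    if any(ch.isalpha() for ch in plaintext):
        raise ValueError("Alpha characters not allowed when using the Luhn check.")
    if sum(ch in DIGITS for ch in plaintext) < 10:
        raise ValueError(
            "Input data must have at least 10 digits when the output "
            "token value must pass the Luhn check."
        )
    chars = list(plaintext)
    middle = [i for i in range(6, len(chars) - 4) if chars[i] in DIGITS]
    if not middle:
        raise ValueError("no digit left to change between the retained positions")
    for i in middle:
        chars[i] = secrets.choice(DIGITS)
    if luhn_valid("".join(c for c in chars if c in DIGITS)):
        # Changing any one digit changes the Luhn sum mod 10 (the doubling
        # map 0..9 -> 0,2,4,6,8,1,3,5,7,9 is a permutation), so bumping the
        # last randomized digit is enough to make the check fail.
        j = middle[-1]
        chars[j] = str((int(chars[j]) + 1) % 10)
    return "".join(chars)


print(first_six_last_four_fail_luhn("111122223333444455556666"))
```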
Plaintext Value | FIRST_SIX_LAST_FOUR_FAIL_LUHN_TOKEN |
---|---|
111122223333444455556666 | 111122880122130412306666 |
11#1222#333344445555666# | 11#1224#308437867993666# |
Special characters are ignored. They are kept when part of the retained characters, and they are skipped during the Luhn check.
Plaintext Value | FIXED_FIRST_TWO_LAST_FOUR_FAIL_LUHN_TOKEN |
---|---|
9988776644 | 1184716644 |
#98877#556644## | 110027#432044## |
Special characters are ignored. They are kept when part of the retained characters, and they are skipped during the Luhn check.
FIXED_TWENTY_LAST_FOUR_TOKEN
Given any number of digits, returns a random 20-character value that keeps the last four characters and always fails the Luhn check. No alpha characters are allowed in the original value. There is no masking. Dashes and spaces used to format the original value are not maintained unless they appear in the last four places. Special characters are converted to numbers unless they appear in the last four places.
The last four characters are kept and appear in the last four places:
Plaintext Value | FIXED_TWENTY_LAST_FOUR_TOKEN |
---|---|
777777000000 | 42427655476099080000 |
777 777 000-000 | 8675872662171826-000 |
:!@*& ^$%1335 335-335%% | 559527351925023035%% |
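A similar sketch for FIXED_TWENTY_LAST_FOUR_TOKEN, assuming the 16 leading positions are random digits and that the Luhn check runs over the token's digits only; the function name and the `token_column_size` parameter are hypothetical.

```python
import secrets

DIGITS = "0123456789"


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # every second digit from the right is doubled
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def fixed_twenty_last_four(plaintext: str, token_column_size: int = 20) -> str:
    """16 random digits followed by the plaintext's last four characters,
    adjusted so that the token's digits fail the Luhn check."""
    if token_column_size != 20:
        raise ValueError("TOKEN column size must be 20")
    if any(ch.isalpha() for ch in plaintext):
        raise ValueError("Alpha characters not allowed when using the Luhn check.")
    if sum(ch in DIGITS for ch in plaintext) < 10:
        raise ValueError(
            "Input data must have at least 10 digits when the output "
            "token value must pass the Luhn check."
        )
    tail = plaintext[-4:]  # kept even if it contains '-', ' ', or '%'
    body = [secrets.choice(DIGITS) for _ in range(16)]
    if luhn_valid("".join(c for c in "".join(body) + tail if c in DIGITS)):
        # One changed digit is enough to turn a pass into a fail.
        body[-1] = str((int(body[-1]) + 1) % 10)
    return "".join(body) + tail


print(fixed_twenty_last_four("777 777 000-000"))
```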
SHA2-Based Tokens
There are six formats that create tokens by applying SHA2 functions.
SHA2 refers to a set of cryptographic hash functions (SHA-224, SHA-256, SHA-384, SHA-512) designed by the National Security Agency (NSA) and published by NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm.
These formats rely on three existing baseline formats, SHA2_256, SHA2_384, and SHA2_512, and encode the result of each SHA2 function in Base16 or Base64, as follows:
SHA2_256_BASE16_TOKEN: internal format id: 16
SHA2_384_BASE16_TOKEN: internal format id: 17
SHA2_512_BASE16_TOKEN: internal format id: 18
SHA2_256_BASE64_TOKEN: internal format id: 19
SHA2_384_BASE64_TOKEN: internal format id: 20
SHA2_512_BASE64_TOKEN: internal format id: 21
Requirements and Restrictions
- The TOKEN column of the token vault must be large enough to hold the resulting token value, as described in the following table:
Token Format | Minimum TOKEN Column Size (characters) |
---|---|
SHA2_256_BASE16_TOKEN | 64 (32 bytes, Base16-encoded) |
SHA2_384_BASE16_TOKEN | 96 (48 bytes, Base16-encoded) |
SHA2_512_BASE16_TOKEN | 128 (64 bytes, Base16-encoded) |
SHA2_256_BASE64_TOKEN | 44 (32 bytes, Base64-encoded) |
SHA2_384_BASE64_TOKEN | 64 (48 bytes, Base64-encoded) |
SHA2_512_BASE64_TOKEN | 88 (64 bytes, Base64-encoded) |
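The column sizes follow from the digest lengths: Base16 uses 2 characters per byte, and padded Base64 uses 4 characters per 3 bytes, rounded up. The sketch below uses Python's `hashlib` and `base64` to reproduce the table; the UTF-8 input encoding, lowercase hex, and unsalted hashing are assumptions, since this topic does not specify how CT-V prepares the input (none of them changes the output length).

```python
import base64
import hashlib

# Hypothetical mapping from format name to hash function and encoding.
SHA2_FORMATS = {
    "SHA2_256_BASE16_TOKEN": (hashlib.sha256, "base16"),
    "SHA2_384_BASE16_TOKEN": (hashlib.sha384, "base16"),
    "SHA2_512_BASE16_TOKEN": (hashlib.sha512, "base16"),
    "SHA2_256_BASE64_TOKEN": (hashlib.sha256, "base64"),
    "SHA2_384_BASE64_TOKEN": (hashlib.sha384, "base64"),
    "SHA2_512_BASE64_TOKEN": (hashlib.sha512, "base64"),
}


def sha2_token(plaintext: str, fmt: str) -> str:
    """Hash the UTF-8 plaintext and encode the digest (assumed conventions)."""
    hash_fn, encoding = SHA2_FORMATS[fmt]
    digest = hash_fn(plaintext.encode("utf-8")).digest()
    if encoding == "base16":
        return digest.hex()  # 2 hex characters per byte
    return base64.b64encode(digest).decode("ascii")  # padded Base64


for fmt in SHA2_FORMATS:
    print(fmt, len(sha2_token("4111111111111111", fmt)))
# prints lengths 64, 96, 128, 44, 64, 88
```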
- When configuring the token vault, use the “Minimum Token Size” input value to set the TOKEN column size according to the table above.
Maximum Number of Tokens Supported for a Format
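The number of possible tokens is calculated as 10^(number of random positions), counting 10 possible digit values per random position and assuming a 16-character input. FIXED_NINETEEN_TOKEN and FIXED_TWENTY_LAST_FOUR_TOKEN always produce 19 and 20 characters, respectively. For formats that must fail the Luhn check, one position allows only 9 digits, so the result is multiplied by 9: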
RANDOM_TOKEN: 16 random positions – 10,000,000,000,000,000 possible tokens
RANDOM_ALPHANUMERIC_TOKEN: 16 random positions - 10,000,000,000,000,000 possible tokens
ALPHANUMERIC_TOKEN: 16 random positions - 10,000,000,000,000,000 possible tokens
LAST_FOUR_TOKEN: 12 random positions - 1,000,000,000,000 possible tokens
FIRST_SIX_TOKEN: 10 random positions - 10,000,000,000 possible tokens
FIRST_TWO_LAST_FOUR_TOKEN: 10 random positions - 10,000,000,000 possible tokens
FIRST_SIX_LAST_FOUR_TOKEN: 6 random positions - 1,000,000 possible tokens
FIXED_NINETEEN_TOKEN: 14 random positions - 100,000,000,000,000 possible tokens
FIXED_TWENTY_LAST_FOUR_TOKEN: 15 random positions plus one position allows only 9 digits in order to fail Luhn check - 1,000,000,000,000,000 x 9 = 9,000,000,000,000,000 possible tokens
FIRST_SIX_LAST_FOUR_FAIL_LUHN_TOKEN: 5 random positions plus one position allows only 9 digits in order to fail Luhn check - 100,000 x 9 = 900,000 possible tokens
FIXED_FIRST_TWO_LAST_FOUR_FAIL_LUHN_TOKEN: 9 random positions plus one position allows only 9 digits in order to fail Luhn check - 1,000,000,000 x 9 = 9,000,000,000 possible tokens
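The counts above can be reproduced from each format's token length and its number of fixed or retained positions. This sketch assumes the same 16-character input and 10 values per random position as the formula; the `FORMATS` table and `possible_tokens` function are hypothetical.

```python
# (token length, fixed or retained positions, must fail Luhn) per format,
# assuming a 16-character input; every random position counts 10 values.
FORMATS = {
    "RANDOM_TOKEN": (16, 0, False),
    "RANDOM_ALPHANUMERIC_TOKEN": (16, 0, False),
    "ALPHANUMERIC_TOKEN": (16, 0, False),
    "LAST_FOUR_TOKEN": (16, 4, False),
    "FIRST_SIX_TOKEN": (16, 6, False),
    "FIRST_TWO_LAST_FOUR_TOKEN": (16, 6, False),
    "FIRST_SIX_LAST_FOUR_TOKEN": (16, 10, False),
    "FIXED_NINETEEN_TOKEN": (19, 5, False),
    "FIXED_TWENTY_LAST_FOUR_TOKEN": (20, 4, True),
    "FIRST_SIX_LAST_FOUR_FAIL_LUHN_TOKEN": (16, 10, True),
    "FIXED_FIRST_TWO_LAST_FOUR_FAIL_LUHN_TOKEN": (16, 6, True),
}


def possible_tokens(length: int, fixed: int, fail_luhn: bool) -> int:
    free = length - fixed
    if fail_luhn:
        # One position keeps only 9 of its 10 digits: exactly one digit
        # there would make the value pass the Luhn check.
        return 10 ** (free - 1) * 9
    return 10 ** free


for name, params in FORMATS.items():
    print(f"{name}: {possible_tokens(*params):,}")
```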
Overview of Sequential Tokens
When the sequential token format is used, the CT-V returns a token value based on a sequence maintained in the database. A sequential token is numeric, so any letters or special characters (including spaces and characters used for formatting) are ignored.
For example, the following plaintext/token combinations could occur:
Plaintext | Token |
---|---|
545454545 | 1111111111 |
575 757-575 | 1111111112 |
!@*& GGG-664 455-332%% | 1111111113 |
Because CT-V relies on the database to manage the sequence, creating a token vault for sequential tokens differs slightly: the vault must be created with the Is Token Sequential check box enabled. Otherwise, the process is the same.