Data Transformation
This section provides an overview of the data transformation process. It covers the following information:
-
The Data Transformation Process
-
Data Transformation Techniques
-
Protection Policies
The Data Transformation Process
Data transformation is the process of transforming GuardPoint data from:
-
Plaintext (clear) to encrypted with a key
-
Encrypted with a key to plaintext
-
Encrypted with one (old) key to encrypted with another (new) key
Tip
Refer to CTE Agent Data Transformation for detailed instructions on how to use CTE on clients to transform GuardPoint data from clear text to encrypted text or from encrypted text to clear text.
This section covers the following information:
-
Uses of Data Transformation
-
How CTE Protects Files
-
Components of the CTE Solution
-
Properties of Data Transformation
Uses of Data Transformation
-
Initial data transformation: Encrypting GuardPoint data for the first time.
-
Rekeying: Changing the encryption key for GuardPoint data, also called key rotation.
-
Reverse Transformation: Decrypting GuardPoint data to clear text (not a common procedure).
-
Conversion: Converting non-LDT GuardPoints to LDT GuardPoints.
Note
Data transformation is complex and disruptive to data center operations. It is strongly recommended that you read and understand this section before you perform initial data transformation or rekeying.
How CTE Protects Files
The CTE Agent encrypts the data within a file one block at a time. It does not encrypt file metadata such as a file’s name or size. This enables administrators to manage files without being able to view or modify their contents. Whether initially encrypting files, rekeying them, or decrypting them, the CTE Agent must therefore:
-
Read each block of file data to be transformed.
-
Transform the block by encrypting, decrypting, or rekeying it.
-
Write the transformed block, either to its original location, or to an alternate one.
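The read, transform, and write steps above can be illustrated with a short Python sketch. It is not the CTE Agent's implementation: the 4096-byte block size, the toy SHA-256 keystream cipher (a stand-in for a real algorithm such as AES, and not secure), and the names transform_file and xor_block are all hypothetical. A key of None stands for clear text, so the same loop performs initial encryption, rekeying (decrypt with the old key, then encrypt with the new one), and decryption, writing each block back to its original location.

```python
import hashlib
import os
import tempfile
from typing import Optional

BLOCK_SIZE = 4096  # hypothetical block size


def toy_keystream(key: bytes, block_index: int, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256. NOT real encryption."""
    out = b""
    counter = 0
    while len(out) < length:
        seed = key + block_index.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        counter += 1
    return out[:length]


def xor_block(data: bytes, key: Optional[bytes], block_index: int) -> bytes:
    """XOR data with the keystream; a None key means clear text (no-op)."""
    if key is None:
        return data
    stream = toy_keystream(key, block_index, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def transform_file(path: str, old_key: Optional[bytes], new_key: Optional[bytes]) -> int:
    """Transform a file in place, one block at a time.

    old_key=None, new_key=K   -> initial encryption
    old_key=K,    new_key=None -> decryption
    old_key=K1,   new_key=K2  -> rekey
    Returns the number of blocks transformed.
    """
    blocks = 0
    with open(path, "r+b") as f:
        while True:
            offset = f.tell()
            block = f.read(BLOCK_SIZE)                 # 1. read a block
            if not block:
                break
            index = offset // BLOCK_SIZE
            clear = xor_block(block, old_key, index)   # 2a. decrypt with the old key
            out = xor_block(clear, new_key, index)     # 2b. encrypt with the new key
            f.seek(offset)
            f.write(out)                               # 3. write back to the same location
            blocks += 1
    return blocks


if __name__ == "__main__":
    original = os.urandom(10_000)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(original)
    transform_file(tmp.name, None, b"old-key")        # initial encryption
    transform_file(tmp.name, b"old-key", b"new-key")  # rekey
    transform_file(tmp.name, b"new-key", None)        # decryption
    with open(tmp.name, "rb") as f:
        assert f.read() == original
    os.remove(tmp.name)
```

Because each block is overwritten in place, a crash partway through leaves a file holding a mix of old and new blocks.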
Components of the CTE Solution
CTE protects data either at the file level or at the storage device level. CTE file-level protection consists of two main components:
-
CipherTrust Data Security Platform Service
An appliance that manages a database of the file sets protected by CTE GuardPoints, the encryption keys that protect them, and the policies that specify the access rights and encryption protections applied to GuardPoints. The CipherTrust Data Security Platform Service is also a central point for logging events related to access to protected files.
-
CTE Agents
Software components that run on clients with file sets to be protected. A CTE Agent manages the files behind a GuardPoint by enforcing the policy associated with it, and communicates data access events to the CipherTrust Data Security Platform Service for logging.
A GuardPoint is usually associated with a Linux mount point or a Windows volume, but may also be associated with a directory subtree. The CTE Agent sits between applications and the file system that holds the client’s files within the GuardPoint. It intercepts every file access request and enforces the access and encryption rules defined in the GuardPoint’s policy.
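The interception decision can be sketched as a function that runs on every read. This is an illustration only: the real CTE Agent runs as a file system filter, and the names find_guardpoint and on_read, the policy-as-string representation, and the is_authorized and decrypt callbacks are hypothetical. The sketch shows two behaviors: files outside every GuardPoint pass through untouched, and a reader the policy does not authorize receives the stored ciphertext (real policies can also deny access outright).

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional

# Hypothetical callback types; a policy is just an opaque string here.
Authorizer = Callable[[str, str, str], bool]   # (policy, user, path) -> allowed?
Decryptor = Callable[[str, bytes], bytes]      # (policy, block) -> clear block


def find_guardpoint(path: str, guardpoints: Dict[str, str]) -> Optional[str]:
    """Return the GuardPoint directory that contains path, or None."""
    p = PurePosixPath(path)
    for gp in guardpoints:
        gp_path = PurePosixPath(gp)
        if p == gp_path or gp_path in p.parents:
            return gp
    return None


def on_read(path: str, user: str, raw_block: bytes,
            guardpoints: Dict[str, str],
            is_authorized: Authorizer, decrypt: Decryptor) -> bytes:
    """Decide what a read request returns for one block of a file."""
    gp = find_guardpoint(path, guardpoints)
    if gp is None:
        return raw_block                      # outside any GuardPoint: untouched
    policy = guardpoints[gp]
    if is_authorized(policy, user, path):
        return decrypt(policy, raw_block)     # authorized: clear text
    return raw_block                          # not authorized: ciphertext as stored
```

Comparing path components rather than string prefixes keeps a path such as /data/gp10/x from being mistaken for a file under /data/gp1.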
Properties of Data Transformation
For large file sets (hundreds of GBs or more), bulk transformation is time-consuming. Managing transformation time is important, because file set content must be frozen (inaccessible to applications) throughout the transformation process. Once transformation starts, it must continue until complete, so transformation time determines the window of data unavailability. Two major components contribute to transformation time:
-
Number of Blocks of File Data
Because CTE must read, transform, and rewrite each block of file data, this component can be estimated by multiplying the number of file blocks to be transformed by the average read, transformation, and write time for a block.
-
Number of Files
Because the CTE Agent transforms data file by file, each file must be "looked up," opened, and closed during transformation, using underlying file system mechanisms. This typically requires multiple disk accesses. Therefore, for file sets that consist of many small files, the per-file overhead can actually exceed the block transformation time.
Other factors, such as file system fragmentation and load from concurrent applications, may also affect transformation time. However, the number of blocks and the number of files to be transformed are the fundamental factors, because they cannot be reduced or eliminated.
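The two fundamental components combine into a first-order estimate: total time is roughly the number of blocks multiplied by the per-block time, plus the number of files multiplied by the per-file overhead. The Python sketch below assumes you have measured both timings yourself (for example, by transforming a small sample); the function name, the 4096-byte default block size, and the timings in the usage note are hypothetical, and the model ignores fragmentation and concurrent load.

```python
import math
from dataclasses import dataclass


@dataclass
class TransformEstimate:
    block_seconds: float   # time spent reading, transforming, and writing blocks
    file_seconds: float    # time spent looking up, opening, and closing files

    @property
    def total_seconds(self) -> float:
        return self.block_seconds + self.file_seconds


def estimate_transform_time(total_bytes: int, file_count: int,
                            sec_per_block: float, sec_per_file: float,
                            block_size: int = 4096) -> TransformEstimate:
    """First-order estimate: blocks x per-block time + files x per-file overhead."""
    blocks = math.ceil(total_bytes / block_size)
    return TransformEstimate(blocks * sec_per_block, file_count * sec_per_file)
```

For example, with an assumed 20 microseconds per block and 2 milliseconds per file, estimate_transform_time(2**30, 1_000_000, 20e-6, 2e-3) gives about 5.2 seconds of block time but 2,000 seconds of per-file overhead for 1 GiB stored as a million small files.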
Data Transformation Techniques
There are three methods for initially encrypting and rekeying files:
-
The Copy/Restore Method: Using the operating system file copy utility, the client administrator can copy unprotected files to a location protected by a CTE GuardPoint with a standard/production policy.
-
The CTE Dataxform Utility Method: Every CTE Agent includes the dataxform utility, which encrypts, rekeys, or decrypts protected data in place. Refer to CTE Agent Data Transformation for details.
-
The Live Data Transformation (LDT) Method: CipherTrust Data Security Platform Service Security Administrators can encrypt or rekey GuardPoint data without blocking user or application access to that data. Refer to CTE-Live Data Transformation with CipherTrust Data Security Platform Service for details.
These methods have advantages and limitations that make them suitable in different scenarios. These are discussed in the subsequent sections.
Apart from encrypting data, you can reverse the transformation, that is, decrypt the data to plaintext. To decrypt protected files, copy them to an unprotected location.
Note
CTE can also be configured to protect data at the disk level. For data protected in this way, only the copy transformation technique is available for encryption.
The Copy Method
Properties of the Copy Method
The copy method performs initial encryption, rekeying, and decryption by copying data from one directory, or GuardPoint, to another directory or GuardPoint.
-
Initial Encryption
The client administrator encrypts a file set by copying it to a directory protected by a CTE GuardPoint with a standard policy. Encryption is transparent to the copy utility.
-
Rekeying Protected Data
Encrypted files protected by a CTE GuardPoint are rekeyed by copying them to a directory protected by another GuardPoint with a different encryption key. Both decryption and re-encryption are transparent to copy utilities.
-
Decrypting Data by Copying
Decrypt a protected file set by copying files from their protected location to unprotected directories. The CTE Agent decrypts file blocks before delivery to the copy utility for rewriting.
Caution
If the governing policy does not authorize the copy utility user to access data, CTE delivers encrypted file blocks to it.
Advantages of the Copy Method
-
Simplicity
After an Agent is installed and GuardPoints are activated on a client, the client’s administrator can encrypt, decrypt, or rekey file sets simply by copying them from one location to another. There are no procedures to learn, and no requirements to coordinate with the CipherTrust Data Security Platform Service Security Administrator. Data transformation is simply another routine administrative task.
-
Recoverability
If a copy-based transformation is interrupted, for example, by a power failure or a system crash, it can be restarted at or prior to the point of interruption. This is because the source files remain available and can be recopied, overwriting files at the destination that may have been only partially re-encrypted.
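A copy-based transformation, including its restart behavior, can be sketched with the Python standard library. The paths are hypothetical: the destination is assumed to be a directory under a GuardPoint whose policy encrypts (or re-encrypts) on write, so the encryption is invisible to this code, just as it is to any copy utility. Rerunning copy_transform after an interruption recopies every file from the intact source, overwriting any partially written destination files.

```python
import shutil
from pathlib import Path


def copy_transform(src_dir: str, dst_dir: str) -> int:
    """Copy every regular file from src_dir to dst_dir, preserving relative paths.

    dst_dir is assumed (hypothetically) to sit under a GuardPoint, so CTE
    encrypts the data as it is written. Safe to rerun after an interruption:
    existing destination files, including partial ones, are overwritten.
    Returns the number of files copied.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    copied = 0
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # overwrites any partially copied file
            copied += 1
    return copied
```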
Limitations of the Copy Method
-
Storage Resource Consumption
Copying a file set requires that both source and destination files exist simultaneously. Storage capacity sufficient for both must be available during initial encryption. For very large protected data sets, "extra" temporary storage may be a significant expense. However, a greater concern is likely to be the impact of moving production file sets as they are transformed. File data is unprotected while in the copy utility’s buffers.
-
Impact on Operating Procedures
Original and copied file sets have different path names and/or network addresses. After transformation, either both file sets must be renamed (the old path to a new name, and the new path to the old name), or applications must be adapted to process the transformed data set at the new directory. For a small data center with a few protected file sets, some combination of these options is usually practical. For data centers with hundreds of protected file sets, the administrative complexity and consequent chance of error make copying a less practical option.
The Restore Method
A variation of the copy method is to make a backup of the files for transformation and restore the backup to the destination location. This works because:
-
Backing up data causes it to be read and decrypted.
-
Restoring data causes it to be written (re-encrypting it with an alternative key).
-
CTE protection is transparent to backup programs.
This technique also creates a backup of the data set. However, a disadvantage is the time required to copy data twice (once from the source location to backup, and once from backup to destination location).
These considerations suggest that copying data to transform it is more suitable for initial encryption (and final decryption), and less so for rekeying. Additionally, the simplicity of recovering an interrupted transformation makes the copy/restore method useful in situations where the probability of interruption during transformation is significant.
The Dataxform Utility Method
The dataxform utility transforms data in place and contains two components:
-
A user-mode component that controls the overall operation
-
A kernel-mode component that transforms files block by block
Advantages of the Dataxform Utility
Transforming data in place has two advantages:
-
Minimal Storage Requirements
Because the dataxform utility transforms files in place, where they reside, it does not require temporary file storage. However, the utility does need storage in which to create a list of files for transformation.
-
Security
Data transformed by the dataxform utility spends less time in memory outside the GuardPoint, and therefore unprotected, than data transformed by copying. This is significant for rekeying, because copying holds clear file data in memory between reading and rewriting. Moreover, dataxform requires coordination between the client administrator and the CipherTrust Data Security Platform Service Security Administrator, so that no one individual can subvert security during transformation.
Limitations of the Dataxform Utility
Offsetting the advantages of the dataxform utility method is the complexity of recovering from an interrupted dataxform run. Because dataxform transforms files in place, a file undergoing transformation at the time of a failure may be only partly transformed, and there is no way to determine which of its blocks have been transformed and which have not. Such files must be recreated from a backup copy after the dataxform run. The client administrator must:
-
Determine (by examining the dataxform logs) which files may have been incompletely transformed.
-
Delete them from the transformed file set.
-
Recreate them by selective copying from a backup.
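The three recovery steps can be sketched in Python. The log format here is entirely hypothetical (one START or DONE event per line, with paths relative to the GuardPoint); the actual dataxform log format is different, so the parser would need to be adapted to it. The functions incompletely_transformed and restore_from_backup are illustrative names, not part of any CTE tool.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Set

# HYPOTHETICAL log format: one event per line, "START <path>" or "DONE <path>",
# with paths relative to the GuardPoint. Real dataxform logs look different.


def incompletely_transformed(log_lines: Iterable[str]) -> List[str]:
    """Step 1: files that started transformation but never finished."""
    started: Set[str] = set()
    done: Set[str] = set()
    for line in log_lines:
        event, _, path = line.strip().partition(" ")
        if event == "START":
            started.add(path)
        elif event == "DONE":
            done.add(path)
    return sorted(started - done)


def restore_from_backup(paths: Iterable[str], guardpoint_dir: str, backup_dir: str) -> None:
    """Steps 2 and 3: delete each suspect file, then recreate it from backup."""
    gp, backup = Path(guardpoint_dir), Path(backup_dir)
    for rel in paths:
        target = gp / rel
        target.unlink(missing_ok=True)      # step 2: delete the partly transformed file
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup / rel, target)  # step 3: selective copy from backup
```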
Live Data Transformation
CipherTrust Data Security Platform Service Security Administrators can encrypt or rekey GuardPoint data without blocking user or application access to that data.
After GuardPoints are enabled, LDT performs initial encryption or rekeying in the background, unnoticed by users. The data stays live and available. This accelerates CTE deployments and eliminates the need to block application and user access to data during encryption or rekey operations, which can seriously inconvenience users and affect operational efficiency.
Phases of LDT runtime operation:
-
Initial data transformation starts or key expires
Live Data Transformation begins when an LDT policy is first applied to a GuardPoint, or when a current key version expires. The CipherTrust Data Security Platform Service pushes the new policy, or the notification of a key version change, to the clients that are protected by those policies.
-
New key version triggers a rekey on the affected GuardPoints
On each client, CTE determines which GuardPoints are using the key that has just rotated to a new version. CTE starts an LDT rekey on each of those GuardPoints. (If another rekey is already underway on that GuardPoint, the new rekey is rejected).
-
Scan for files
On each GuardPoint where CTE has started a rekey, LDT determines which files to transform. LDT takes inventory of files encrypted with earlier versions of the rotated key and makes a persistent list of the files for transformation. During this phase, the rekey status of the GuardPoint becomes starting, then scanning.
The scan phase might be interrupted, for example by a client reboot. In this case, when the client reboots and the GuardPoint is enabled again, the scan starts over from the beginning.
-
Rekey/Key Rotation
-
Each file in the persistent list is decrypted with the old version of the key and then re-encrypted with the new version of the key. New files created during the LDT process do not need to be rekeyed, because they inherit the new version of the key. Multiple files, and multiple regions of files, are rekeyed simultaneously.
-
The LDT extended attribute of each file is updated.
-
The LDT rekey operation can be suspended and resumed manually or through the QoS schedule. This manages the impact that LDT has on other applications and processes. During this phase, the rekey status of the GuardPoint is rekeying or suspended.
If system errors occur during rekeying, such as IO errors or crashes, LDT can manage and recover from them after the system error is fixed.
-
Finish
When all of the required files in the GuardPoint have been rekeyed, the system and storage resources used by LDT are released, except for the storage required for the extended attributes. Upon completion of rekey, the rekey status of the GuardPoint is rekeyed.
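The LDT phases can be modeled as a small state machine. This toy Python model is not CTE code: GuardPointLDT, its methods, and the integer key versions are hypothetical. It reproduces the documented behaviors of rejecting a second rekey while one is underway, building a list of files encrypted with earlier key versions, giving new files the new key version directly, suspending and resuming, and reaching the rekeyed status when the list is empty. The starting and scanning statuses pass instantly here, and the real persistent list survives reboots, which a Python list does not.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BUSY = ("starting", "scanning", "rekeying", "suspended")


@dataclass
class GuardPointLDT:
    """Toy model of LDT rekey phases for one GuardPoint (hypothetical, not CTE code)."""
    files: Dict[str, int]                 # file name -> key version it is encrypted with
    current_version: int = 1
    status: str = "rekeyed"               # "rekeyed" here means no rekey in progress
    pending: List[str] = field(default_factory=list)  # stands in for the persistent list

    def start_rekey(self, new_version: int) -> bool:
        if self.status in BUSY:
            return False                  # a rekey is already underway: reject
        self.status = "starting"
        self.current_version = new_version
        self.status = "scanning"          # inventory files using earlier key versions
        self.pending = [n for n, v in self.files.items() if v < new_version]
        self.status = "rekeying"
        return True

    def create_file(self, name: str) -> None:
        self.files[name] = self.current_version   # new files need no rekey

    def suspend(self) -> None:
        if self.status == "rekeying":
            self.status = "suspended"

    def resume(self) -> None:
        if self.status == "suspended":
            self.status = "rekeying"

    def step(self, max_files: int) -> None:
        """Rekey up to max_files pending files (decrypt old version, encrypt new)."""
        if self.status != "rekeying":
            return
        for _ in range(min(max_files, len(self.pending))):
            name = self.pending.pop(0)
            self.files[name] = self.current_version
        if not self.pending:
            self.status = "rekeyed"
```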
Summary
The following table summarizes the strengths and weaknesses of the three file set transformation methods.
Factor | Copy Method | The dataxform Utility Method | Live Data Transformation |
---|---|---|---|
Temporary storage required | Equal to size of file set. | Sufficient to hold a list of path names of files in file set. | Additional space is required to store LDT metadata only. |
Security | File data is unprotected while in copy utility’s buffers. | File data is never outside the CTE GuardPoint. | File data is never outside the CTE GuardPoint. |
Initial encryption | Files can be copied directly from source directory to a CTE-protected directory. | Files must be in a protected location before transformation. | Files must be in a protected location before transformation. |
Operational impact | No access to files during transformation. Path names or operating procedures must be adjusted after transformation. | No access to files during transformation. | Files remain accessible during transformation. No other impact on operating procedures. |
Recoverability | Restart copy operation at, or prior to, point of failure. | Files undergoing transformation at the point of failure must be discovered from the dataxform logs and restored from backup. | Transformation process automatically restarts from the point of failure. |
Protection Policies
The GuardPoint is the basic unit to which CTE applies data protection policies. GuardPoints are typically associated with file system mount points, but may also be associated with directory subtrees.
Note
In Linux environments, mount points nested within a directory or mount point protected by a GuardPoint are also protected.
All files in the directory hierarchy below a GuardPoint are subject to the GuardPoint’s policy, which consists of rules that specify:
-
Protected files: Filenames or filename patterns (example: *.dat) to which the policy applies.
-
Authorized users: Users, groups, and applications permitted to access the protected files.
-
Permissions: Actions permitted to users (example: create/delete, read/write, rename, decrypt).
Policies also specify the name of an encryption algorithm and a key for encrypting protected files. For example, a policy might specify that all Excel workbooks protected by a GuardPoint be encrypted using an AES256 key called EXCEL-KEY, and that only users in group 128 can access them. All other files, which are not encrypted, are freely accessible to all users.
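The rule structure above can be sketched as a small data model. The Rule fields, the *.xlsx pattern used to identify Excel workbooks, and the key_for and may_access functions are hypothetical simplifications, not the CTE policy schema; the sketch only shows how a filename pattern selects the key and how group membership and permitted actions gate access to protected files.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import AbstractSet, FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified policy rule (not CTE's actual policy schema)."""
    pattern: str                  # protected files, e.g. "*.xlsx"
    groups: FrozenSet[int]        # authorized groups
    permissions: FrozenSet[str]   # permitted actions
    algorithm: str                # encryption algorithm name
    key_name: str                 # key used to encrypt matching files


EXCEL_RULE = Rule(
    pattern="*.xlsx",             # assumption: Excel workbooks identified by extension
    groups=frozenset({128}),
    permissions=frozenset({"read", "write"}),
    algorithm="AES256",
    key_name="EXCEL-KEY",
)


def key_for(rule: Rule, filename: str) -> Optional[str]:
    """Name of the key that encrypts filename, or None if it stays in clear text."""
    return rule.key_name if fnmatchcase(filename, rule.pattern) else None


def may_access(rule: Rule, filename: str, user_groups: AbstractSet[int], action: str) -> bool:
    """Protected files require an authorized group and a permitted action."""
    if not fnmatchcase(filename, rule.pattern):
        return True    # unprotected files are freely accessible in this example
    return bool(rule.groups & user_groups) and action in rule.permissions
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every platform.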
Types of Policies
CTE Agents use two types of policies:
-
Initial Data Transformation
The data transformation policies contain the elements listed above, plus a data transformation key that the dataxform utility uses to transform file data. Transformation policies contain strict access control rules that prevent application and user access to files during transformation. CTE uses data transformation policies only for the initial transformation; afterwards, you replace the transformation policy with a production policy. The dataxform utility operates on a per-GuardPoint basis. For initial encryption, the data transformation policy specifies a clear production key (the utility does not decrypt the data, because it is unencrypted) and a new data transformation key to encrypt the data.
-
Production/Standard
Production policies contain the elements listed above. They protect data within GuardPoint(s) during day-to-day IT operations.
For decryption, the data transformation policy specifies a clear data transformation key (that is, the utility does not re-encrypt files as it rewrites them) and the current production key.
A rekeying transformation policy specifies both a current production ("old") key and a transformation ("new") key.
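The three combinations of production and transformation keys can be summarized in code. In this hypothetical Python sketch, None stands for a clear key, and the TransformationPolicy class and its operation method are illustrative names rather than a CTE interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransformationPolicy:
    """Hypothetical model: None means a clear key (no encryption on that side)."""
    production_key: Optional[str]      # "old" key the data is currently encrypted with
    transformation_key: Optional[str]  # "new" key the data will be encrypted with

    def operation(self) -> str:
        old, new = self.production_key, self.transformation_key
        if old is None and new is None:
            raise ValueError("both keys are clear: nothing to transform")
        if old is None:
            return "initial encryption"   # nothing to decrypt; encrypt with the new key
        if new is None:
            return "decryption"           # decrypt with the old key; do not re-encrypt
        if old == new:
            raise ValueError("old and new keys are identical: nothing to rekey")
        return "rekey"                    # decrypt with the old key, encrypt with the new key
```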