dataxform and Sparse Files
Sparse files are files in which storage space is allocated only for the file block addresses to which data has actually been written. Most Linux and AIX file systems do not allocate space for file blocks until the blocks are written. Thus, if an application creates a file and writes only the first and 1000th blocks, the second through 999th blocks are represented as a hole in the file system’s data structures. When an application reads file blocks that have never been written, the file system returns zeros. Application reads from holes are thus indistinguishable from reads of file blocks that actually contain zeros. When an application writes data to file block addresses in the midst of a hole, the file system allocates storage space for the data, subdividing the hole into two smaller ones if necessary.
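The first-and-1000th-block case described above can be reproduced with a short Python sketch. It assumes a Linux-style file system that supports holes and reports `st_blocks` in 512-byte units; the 4096-byte block size and the helper names `make_sparse_file` and `sizes` are illustrative choices, not part of dataxform or the CTE Agent.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, for illustration only


def make_sparse_file(path, blocks=1000, block_size=BLOCK):
    """Write only the first and last blocks, leaving a hole in between."""
    with open(path, "wb") as f:
        f.write(b"\x01" * block_size)        # first block
        f.seek((blocks - 1) * block_size)    # skip over the middle blocks
        f.write(b"\x02" * block_size)        # last block


def sizes(path):
    """Return (apparent size, bytes allocated on disk) for a file."""
    st = os.stat(path)
    # st_blocks is assumed to be in 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        make_sparse_file(path)
        apparent, allocated = sizes(path)
        print(f"apparent size: {apparent} bytes")
        print(f"allocated:     {allocated} bytes")
        with open(path, "rb") as f:
            f.seek(BLOCK)  # first block inside the hole
            # A read from the hole returns zeros, like a zero-filled block.
            print("hole reads as zeros:", f.read(BLOCK) == bytes(BLOCK))
```

On a file system that supports holes, the allocated figure is typically far smaller than the apparent size. The exact value depends on the file system, so none is shown here.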
The CTE Agent and the dataxform utility cannot distinguish a file block that contains zeros from a hole—both return blocks containing zeros when read. When dataxform decrypts and re-encrypts such blocks and writes them back to their original locations, file systems allocate storage space for blocks that may previously have been holes. For large, mostly sparse files, this can result in run times and storage consumption that far exceed expectations based on pre-transformation file sizes. Therefore, dataxform provides administrators with two options for dealing with sparse files:
- Recognize holes: A protected host administrator can configure dataxform to detect and bypass the processing of file blocks that contain all zeros. The result is that holes remain holes, and file blocks that contained zeros prior to transformation continue to contain zeros after transformation. Application reads of either return zeros, and applications’ first writes to those blocks cause the file system to allocate storage and write encrypted data to them. See the --preserve_sparse_files option in the dataxform Examples and Full Command Syntax.
- Ignore holes: Alternatively, a protected host administrator can configure dataxform to ignore holes. The utility decrypts, re-encrypts, and rewrites all file blocks. Any file blocks that previously corresponded to holes have storage allocated for them, and thus, after transformation, files are no longer sparse. See the --encrypt_sparse_file_holes option in the dataxform Examples and Full Command Syntax.
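The difference between the two options can be illustrated with a minimal Python sketch of an in-place, block-by-block rewrite. This is not dataxform's implementation. The function `transform_in_place`, its `preserve_sparse` parameter, and the XOR-based `toy_transform` are hypothetical stand-ins, and the 4096-byte block size is an assumption.

```python
import os
import tempfile


def toy_transform(block):
    """Placeholder for decrypt-and-re-encrypt (XOR with 0x5A); not real encryption."""
    return bytes(b ^ 0x5A for b in block)


def transform_in_place(path, transform, block_size=4096, preserve_sparse=True):
    """Rewrite a file block by block and return the number of blocks written."""
    written = 0
    zeros = bytes(block_size)
    with open(path, "r+b") as f:
        offset = 0
        while True:
            f.seek(offset)
            block = f.read(block_size)
            if not block:
                break
            # An all-zero block may be a hole or real zeros; a read cannot tell.
            if preserve_sparse and block == zeros[:len(block)]:
                pass  # skip it: nothing is written, so a hole stays a hole
            else:
                f.seek(offset)
                f.write(transform(block))  # writing forces storage allocation
                written += 1
            offset += len(block)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        with open(path, "wb") as f:
            f.write(b"\x01" * 4096)   # first block
            f.seek(999 * 4096)
            f.write(b"\x02" * 4096)   # 1000th block
        print("blocks written, recognizing holes:",
              transform_in_place(path, toy_transform, preserve_sparse=True))
        print("blocks written, ignoring holes:",
              transform_in_place(path, toy_transform, preserve_sparse=False))
```

In this sketch, recognizing holes writes only the two blocks that hold data, while ignoring holes writes all 1000 blocks. That mirrors the run-time and storage trade-off between the two options.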
In most cases, it is advantageous for the protected host administrator to configure dataxform to recognize holes, so that sparse files remain sparse. Failure to do so can result in longer-than-expected transformation times and post-transformation file sets that consume significantly more storage space than they did before transformation.