Using dataxform_status Files
Several dataxform_status_* files are used to run dataxform. You generate one of these files; dataxform generates the others to track the dataxform session. Each dataxform_status file contains specialized records of the last or current dataxform session. The files are placed in /var/log/vormetric by default, but you can specify an alternate location using the --dir_recovery argument to dataxform.
The files are:
where gp is the full path to the GuardPoint. The slash (/) and backslash (\) path delimiters are replaced with underscores (_).
where /opt/apps/lib/dx1 and /opt/apps/lib/dx2/aa_dir are the actual GuardPoint paths.
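The naming rule above can be sketched as a small helper that predicts where each status file for a GuardPoint should be. This is a minimal sketch, not part of dataxform: the function names `mangle_guardpoint` and `status_file_paths` are hypothetical, and it assumes each file name is its prefix followed directly by the mangled GuardPoint path, so that /Guard yields dataxform_status-_Guard, matching the example file names used later in this section.

```python
import os

# Default recovery directory named in this section; dataxform's --dir_recovery
# argument changes it.
DEFAULT_RECOVERY_DIR = "/var/log/vormetric"

def mangle_guardpoint(gp: str) -> str:
    """Replace every slash and backslash path delimiter with an underscore."""
    return gp.replace("/", "_").replace("\\", "_")

def status_file_paths(gp: str, recovery_dir: str = DEFAULT_RECOVERY_DIR) -> dict:
    """Return the expected primary, alternate, and skip status-file paths.

    Assumption: each name is its prefix followed directly by the mangled
    GuardPoint path, so "/Guard" yields "dataxform_status-_Guard".
    """
    suffix = mangle_guardpoint(gp)
    prefixes = {
        "status": "dataxform_status-",
        "alt": "dataxform_status_alt-",
        "skip": "dataxform_status_skip-",
    }
    return {kind: os.path.join(recovery_dir, prefix + suffix)
            for kind, prefix in prefixes.items()}

if __name__ == "__main__":
    for kind, path in status_file_paths("/opt/apps/lib/dx1").items():
        print(kind, path)
```

Pass the value you gave to --dir_recovery as `recovery_dir` if you did not use the default location.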
The format and use of each file are described below.
dataxform_status-_gp and dataxform_status_alt-_gp are used together to monitor data file processing. These status files indicate the threads that are started, the threads that are running, their status, and their sequence.
If dataxform fails or is interrupted, the dataxform_status-_gp and dataxform_status_alt-_gp files can be used to estimate which files have been completed and which files still must be transformed. Some manual checking is needed to determine precisely which files have and have not been transformed. You can use this information to create a file list and manually resume the dataxform session, or you can leave these status files in place and resume automatic dataxform. Automatic dataxform determines where to resume.
The structure of the dataxform_status-_gp and dataxform_status_alt-_gp files is:
where:
- version=n is an internally used version number and can be ignored.
- status is the current dataxform processing status. Status can be done, in-progress, stopped, or undone. An in-progress status indicates that dataxform is still processing data files. Do not work in, or access, the GuardPoint while dataxform is still in-progress; otherwise, you risk corrupting data. A stopped status indicates that the last dataxform session was interrupted before it could complete. The done status means that dataxform completed. Note that even when dataxform completes, not all files are necessarily transformed; check the log files for data files that have not been transformed. The undone status indicates that a rekey from a file list has been performed.
- operation=action is the operation specified on the dataxform command line. It can be either rekey or rekey_list. Automatic dataxform is implicitly configured to operate with rekey.
- current=file is the full path name of the file currently being processed. When status is done, the current parameter is blank. In every other case, current is set to some value, such as the default unset value -1.
- n in-progress files is the total number of files being concurrently processed by dataxform. Each file is transformed as a separate sub-process, or thread. n is the value of the --thd parameter passed to the dataxform command, and it can be between 1 and 32. The default number of threads is 8 or the number of CPUs, whichever is less. For specifics on the --thd parameter option, see dataxform Examples and Full Command Syntax.
- seqno=n is the sequence number. There are two status files, dataxform_status-_gp and dataxform_status_alt-_gp. The dataxform utility writes status information to one file at a time. When a sub-process completes and a new data file is opened for transformation, the current status file is closed, and the other status file is opened with the new data file as its current=file value. The seqno number increments each time dataxform begins to process the next data file. If a dataxform session fails or is interrupted, compare the seqno number in the two files. Use the status file with the higher seqno number to determine the files that were being processed when dataxform stopped. In the status file with the lower seqno number, the current=file parameter shows the last fully processed data file. In the status file with the higher seqno number, the current=file parameter shows the last opened data file, which most likely was not successfully transformed.
- hmac=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx is a check value used to ensure the integrity of the status files, and it can be ignored.
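The field list above can be turned into a small reader for inspecting a status file by hand. This is a hedged sketch, not a Thales tool: `parse_status`, `StatusFile`, `STATUS_MEANING`, and `expected_thread_count` are hypothetical names, and the layout is an assumption, since the exact file format is not reproduced here. It assumes one `key=value` field per line, a count line reading `<n> in-progress files` followed by n lines of `<offset> <path>`, and stale data after the `hmac=` line that must be ignored.

```python
import os
import re
from dataclasses import dataclass, field

# Assumed form of the count line, e.g. "8 in-progress files".
IN_PROGRESS_RE = re.compile(r"^(\d+) in-progress files$")

# Meanings of the status values, as described in this section.
STATUS_MEANING = {
    "done": "dataxform completed; check the logs for files not transformed",
    "in-progress": "dataxform is still running; do not access the GuardPoint",
    "stopped": "the last session was interrupted before it completed",
    "undone": "a rekey from a file list was performed",
}

@dataclass
class StatusFile:
    fields: dict = field(default_factory=dict)
    in_progress: list = field(default_factory=list)  # (offset, path) pairs

    @property
    def status(self) -> str:
        return self.fields.get("status", "")

    @property
    def seqno(self) -> int:
        return int(self.fields.get("seqno", "-1"))

def parse_status(text: str) -> StatusFile:
    """Parse a status file's text, stopping at the hmac= checksum line."""
    result = StatusFile()
    lines = iter(text.splitlines())
    for raw in lines:
        line = raw.strip()
        match = IN_PROGRESS_RE.match(line)
        if match:
            # The next n lines list the files being transformed concurrently.
            for _ in range(int(match.group(1))):
                parts = next(lines, "").strip().split(None, 1)
                if len(parts) == 2 and parts[0].isdigit():
                    result.in_progress.append((int(parts[0]), parts[1]))
            continue
        key, sep, value = line.partition("=")
        if sep:
            result.fields[key] = value
            if key == "hmac":
                break  # anything after the checksum line is stale data
    return result

def expected_thread_count(thd=None) -> int:
    """Documented rule: --thd accepts 1-32; default is min(8, number of CPUs)."""
    if thd is None:
        return min(8, os.cpu_count() or 1)
    if not 1 <= thd <= 32:
        raise ValueError("--thd must be between 1 and 32")
    return thd
```

Stopping at `hmac=` matters because, as the section explains further on, stale entries can follow the checksum line and would otherwise overwrite real field values.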
By default, the log files are generated by dataxform during manual and automatic data transformation and placed in /var/log/vormetric on Linux systems. An alternate location can be specified using the --dir_recovery argument to dataxform.
A successful dataxform status file looks like:
Note that the status is done, there is no current file, and there are 0 files being processed. The seqno number is the number of files in the GuardPoint. There is no corresponding dataxform_status_alt-_gp file because that file is deleted when dataxform completes successfully. Also, look for a dataxform_status_skip-_gp file to see which files in the GuardPoint, if any, were detected but not transformed.
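The success criteria just described can be checked mechanically. This is a minimal sketch under assumptions: the helper names `summarize`, `session_succeeded_text`, and `check_session` are hypothetical, and the count line is assumed to read `<n> in-progress files`. A session counts as successful only if status is done, current is blank, the in-progress count is 0, and no alternate status file remains. The skip file is only reported, since its contents are not described here.

```python
import os
import re

# Assumed form of the count line, e.g. "0 in-progress files".
COUNT_RE = re.compile(r"^(\d+) in-progress files$")

def summarize(status_text: str) -> tuple:
    """Collect key=value fields and the in-progress count, stopping at hmac=."""
    fields, count = {}, None
    for raw in status_text.splitlines():
        line = raw.strip()
        match = COUNT_RE.match(line)
        if match:
            count = int(match.group(1))
            continue
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
            if key == "hmac":
                break  # ignore stale data after the checksum line
    return fields, count

def session_succeeded_text(status_text: str, alt_exists: bool) -> bool:
    """Checks from this section: done, blank current, 0 files, no alt file."""
    fields, count = summarize(status_text)
    return (
        fields.get("status") == "done"
        and fields.get("current", "") == ""
        and count == 0
        and not alt_exists
    )

def check_session(status_path: str, alt_path: str, skip_path: str) -> dict:
    """Filesystem wrapper around session_succeeded_text."""
    with open(status_path) as fh:
        text = fh.read()
    return {
        "succeeded": session_succeeded_text(text, os.path.exists(alt_path)),
        "skip_file_present": os.path.exists(skip_path),  # inspect it if True
    }
```

Keeping the text check separate from the file access makes the logic easy to try against a copied status file.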
The dataxform_status-_gp and dataxform_status_alt-_gp files for a dataxform session are shown below. Some things to note:
- The number that precedes every file path in the status file is the offset into the dataxform file list. It is used to quickly locate entries if dataxform needs to be restarted. This number can be ignored.
- The dataxform utility was running 8 sub-processes, as shown in the 8 in-progress files field. The 8 files being processed are shown beneath this entry.
- The seqno field shows that dataxform_status_alt-_Guard has the higher sequence number, so it is the true snapshot of what dataxform was doing when the files were captured or when dataxform failed. The other status file, dataxform_status-_Guard, is older because it has the lower seqno number. This means that all of the files shown in dataxform_status_alt-_Guard were being processed when dataxform stopped.
- The final line, with the prefix hmac=, is a checksum field. It is used to verify that the file content is intact if the transform is interrupted and needs to be restarted. All data in the status files after this line is 'noise', which is left there for efficiency reasons. The dataxform program writes the status files using a simple block write of all the status information into the existing file, overwriting any entries already present. If, as often happens, the existing status file is larger than the data being written, some of the stale entries may remain visible after the final checksum line. Removing this stale data after each file transformation would add significantly to the overall transformation time, so this 'noise' is simply left in place and can be ignored.
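The recovery reasoning in this section can be summarized in one function that compares the two status files after an interruption. This is a hedged sketch: `recovery_hints` and its helpers are hypothetical names. It assumes the count line reads `<n> in-progress files` and each entry beneath it is `<offset> <path>`. It follows the rules given here: drop everything after `hmac=`, treat the file with the higher seqno as the current snapshot, take its listed files as in flight, and read the last fully processed file from the older file's current value.

```python
import re

# Assumed form of the count line, e.g. "8 in-progress files".
COUNT_RE = re.compile(r"^(\d+) in-progress files$")

def meaningful_lines(text: str) -> list:
    """Keep lines up to and including hmac=; drop the trailing stale 'noise'."""
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        kept.append(line)
        if line.startswith("hmac="):
            break
    return kept

def get_field(lines: list, key: str, default: str = "") -> str:
    prefix = key + "="
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):]
    return default

def in_flight(lines: list) -> list:
    """Return the paths listed under the '<n> in-progress files' line."""
    for i, line in enumerate(lines):
        match = COUNT_RE.match(line)
        if match:
            entries = lines[i + 1 : i + 1 + int(match.group(1))]
            # Each entry is assumed to be "<offset> <path>"; drop the offset.
            return [e.split(None, 1)[-1] for e in entries if e]
    return []

def recovery_hints(primary_text: str, alt_text: str) -> dict:
    """Compare dataxform_status-_gp and dataxform_status_alt-_gp contents."""
    a, b = meaningful_lines(primary_text), meaningful_lines(alt_text)
    seq_a = int(get_field(a, "seqno", "-1"))
    seq_b = int(get_field(b, "seqno", "-1"))
    newer, older = (b, a) if seq_b > seq_a else (a, b)
    return {
        "newer_file": "alt" if newer is b else "primary",
        "in_flight": in_flight(newer),                # were being processed
        "last_completed": get_field(older, "current"),  # last fully processed
        "last_opened": get_field(newer, "current"),     # likely not transformed
    }
```

The result is only a starting point; as noted earlier, some manual checking is still needed before building a file list to resume from.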