dataxform Execution Time
During transformation, a file set must remain static. Therefore, it is inaccessible to users and applications. Once started, transformation must complete before admins can permit applications access to the transformed files. This includes any restarts and manual recovery of incorrectly transformed files. In effect, the transformation process determines the duration of the outage. There are two elements to consider when choosing a window of time during which transforming a data set does not adversely affect the business function it supports:
- Length of run: The run time, assuming that the run is problem-free.
- Success of run: Maximizing the chance of success (and therefore minimizing the need for time-consuming manual recovery).
Length of Run
The dataxform utility includes a dry run capability that uses a combination of sampling and calculation to estimate the duration of a problem-free run against a given file set. A dry run counts both the number of files in the set and the amount of data they contain, and it performs sample transformations of dummy files to estimate how long an actual transformation would run. A dry run can execute while data is online, but doing so produces a less accurate estimate. The result of a dry run is an estimate of dataxform run time. In most cases the estimate is conservative, provided that other system activity is minimal during the actual transformation. See Estimating the dataxform Runtime Period.
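The counting-and-calculation idea behind a dry run can be illustrated with a minimal sketch. This is not the utility's actual algorithm. The function names, and the per-file overhead and throughput figures, are assumptions for illustration; in practice such rates would come from timing sample transformations of dummy files.

```python
import os


def count_files_and_bytes(root):
    """Walk `root` and return (file_count, total_bytes) for regular files."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so each file's data is counted once.
            if os.path.isfile(path) and not os.path.islink(path):
                file_count += 1
                total_bytes += os.path.getsize(path)
    return file_count, total_bytes


def estimate_runtime_seconds(file_count, total_bytes,
                             seconds_per_file, bytes_per_second):
    """Estimate run time as per-file overhead plus data transfer time.

    `seconds_per_file` and `bytes_per_second` are hypothetical rates,
    assumed to have been measured from sample transformations.
    """
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return file_count * seconds_per_file + total_bytes / bytes_per_second
```

For example, 1,000 files holding 10,000,000 bytes in total, with an assumed 2 ms of overhead per file and 5,000,000 bytes per second of throughput, gives roughly 2 s of per-file overhead plus 2 s of data time. The per-file term is why sets with very large numbers of small files can take longer than their total size suggests.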
Success of Run
To maximize the chances of successful transformation, the protected host administrator should ensure that the required resources will be available during the run:
- Storage space: See dataxform Space Requirements.
- Kernel threads: The utility uses kernel threads for actual file data transformation. The protected host administrator can specify as many as 512 concurrent file and data chunk transformation threads. The administrator must also ensure that sufficient kernel threads are pre-configured in the operating system (usually at system startup time) so that the specification can be met in the presence of other system activity occurring during transformation. See Multithreading in the dataxform Utility.
- Processing power and I/O bandwidth: You can maximize the power and bandwidth by limiting other system activity during transformation. I/O bandwidth, particularly disk accesses, is especially important for sets containing large numbers of files, because each file must be located and opened (both of which require disk accesses) in addition to having its data read and overwritten.
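Some of the resource checks above can be scripted before a run. The following is a minimal sketch, not part of dataxform: the helper names are hypothetical, and the required space figure is an input you would derive from dataxform Space Requirements. Only the 512-thread ceiling comes from the text above.

```python
import shutil

# Upper limit on concurrent transformation threads stated in the text above.
MAX_TRANSFORM_THREADS = 512


def check_thread_request(requested):
    """Return `requested` if it lies within 1..512, otherwise raise."""
    if not 1 <= requested <= MAX_TRANSFORM_THREADS:
        raise ValueError(
            f"thread count must be between 1 and {MAX_TRANSFORM_THREADS}, "
            f"got {requested}"
        )
    return requested


def has_free_space(path, required_bytes):
    """True if the file system holding `path` has enough free space.

    `required_bytes` is a caller-supplied figure (an assumption here).
    """
    return shutil.disk_usage(path).free >= required_bytes
```

A check like this confirms only that a request is within the stated limit. It does not confirm that the operating system has enough kernel threads configured to satisfy the request.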
Notes
- Minimizing dataxform run time should be a priority because once the utility starts, it must transform the entire data set before users and applications can access it again.
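- Do not stop dataxform after it is started. An interrupted run can leave the file set only partially transformed, and the set remains unavailable until the run is restarted or the affected files are manually recovered.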
- Make sure that files are not open before starting dataxform. If a file is busy, data might become corrupted.
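One way to act on the note about open files is to scan for open handles under the file set before starting. The sketch below is an illustration only and is not a dataxform feature. It assumes a Linux host with a /proc file system, the function name is hypothetical, and it sees only processes the caller has permission to inspect, so run it with sufficient privileges.

```python
import os


def open_files_under(root):
    """Return the set of paths under `root` that visible processes have open.

    Linux-only: reads the /proc/<pid>/fd symbolic links. Processes that
    exit mid-scan or that the caller may not inspect are skipped.
    """
    if not os.path.isdir("/proc/self/fd"):
        raise RuntimeError("this sketch needs a Linux-style /proc file system")
    root = os.path.realpath(root)
    prefix = root if root.endswith(os.sep) else root + os.sep
    found = set()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed during the scan
            if target.startswith(prefix):
                found.add(target)
    return found
```

An empty result means only that no open handles were visible at the moment of the scan. Applications should still be stopped or blocked from the file set for the whole transformation window.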