Troubleshooting
Common DFS(R) Configuration Mistakes
DFS(R) uses a staging area quota when processing replication tasks. If the allocated space is too small, performance is negatively impacted. An improperly sized staging area may also cause a replication loop among downstream nodes. Because encrypting DFS data is processing-intensive, evaluate your environment's readiness beforehand. For more information about proper sizing techniques, see How to Determine the Minimum Staging Area DFSR Needs for a Replicated Folder.
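As a quick readiness check, you can read the current staging path and quota for each replicated folder from WMI. The following is a minimal sketch, assuming the legacy root\MicrosoftDFS namespace and its DfsrReplicatedFolderConfig class are available on your server version; run it from an elevated command prompt:
wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfoldername,stagingpath,stagingsizeinmb
Compare the reported quota against the sizing guidance in the article referenced above before enabling encryption.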
Review the following configuration mistakes as part of site readiness preparations:
Improper or Untested Seeding
To reduce downtime, administrators may choose to pre-seed new member replicas with DFS data before configuring all of the members in a replication group. In this manner, the initial sync consists mostly of delta changes rather than transfers of entire objects. See Replacing DFSR Member Hardware or OS (Part 2: Pre-seeding) for information on the advantages of pre-seeding, and see the sketch after the following list for a typical pre-seed and verification pass.
Common issues include:
- ACL mismatch between source and target
- Changes were made to the files after they were copied to the new member
- No UAT was performed to verify that the pre-seeding process worked as expected
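The following is a minimal pre-seed sketch, not a definitive procedure; the server name HUB01, the E:\DFSRTest path, and the sample file name are placeholders. It copies the data with NTFS security descriptors intact (avoiding the ACL mismatch above) and then spot-checks a file hash on the new member; the filehash option is available in newer versions of dfsrdiag, and Get-DfsrFileHash is the PowerShell equivalent:
rem Copy data in backup mode, including ACLs; exclude the DfsrPrivate folder
robocopy "\\HUB01\E$\DFSRTest" "E:\DFSRTest" /E /B /COPYALL /R:6 /W:5 /XD DfsrPrivate /LOG:C:\temp\preseed.log
rem Spot-check that a sample file hashes identically on source and target
dfsrdiag filehash /path:"E:\DFSRTest\sample.docx"
Run the same hash check against the corresponding file on the source member; if the hashes differ, the file will be re-replicated during initial sync.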
High DFS(R) Backlog
Customer DFS(R) deployments should be relatively up-to-date in replicating files across multiple nodes. High backlogs, especially over an extended period of time, mean that a considerable amount of data is out of sync. Unwanted conflict resolution may occur during these periods. Introducing encryption in this scenario would severely degrade performance and would likely require troubleshooting.
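To measure the backlog between two members, dfsrdiag can report the number of pending updates. This is a sketch; the replication group, replicated folder, and member names shown are placeholders:
dfsrdiag backlog /rgname:"Corp-Data" /rfname:"DFSRTest" /smem:HUB01 /rmem:SPOKE01
A backlog that stays high or keeps growing across several polling intervals should be investigated and cleared before encryption is introduced.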
Hub Node – Single Point of Failure
The DFS primary active node is a single point of failure in a Hub-and-Spoke topology. Fortunately, if the hub server goes down, all spoke servers retain the last-known-good data. Changes made on a spoke are recorded locally and do not replicate until the primary node is back online; spoke servers keep their own deltas and do not share them with other members in the topology. Acknowledge this risk, and confirm that offline backups are taken nightly to meet your RTO (Recovery Time Objective).
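To see whether the hub is currently replicating, dfsrdiag can report its replication state. This is a hedged example; HUB01 is a placeholder, and the replicationstate option is only present in newer Windows Server releases of dfsrdiag:
dfsrdiag replicationstate /member:HUB01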
Jet Database
DFS(R) maintains one Jet database per volume. As a result, placing all of your replicated folders on the same volume puts them all in the same Jet database. If that Jet database has a problem that requires repair or recovery, all of the replicated folders on that volume are affected. It is better to spread replicated folders across as many volumes as possible to provide maximum uptime for the data.
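To see which volume each replicated folder lives on, and therefore which folders share a Jet database, the local DFSR configuration can be read from WMI. This is a sketch that assumes the legacy root\MicrosoftDFS namespace and its DfsrReplicatedFolderConfig class are available on your server version:
wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfoldername,rootpath
Folders whose root paths share a drive letter share that volume's database.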
Windows Server Patch Level
Prepare for the deployment by checking for the latest software updates for the DFS(R) servers. For replication, always make sure the DFS(R) and NTFS components are at least at the latest versions listed in Microsoft's recommended hotfix lists. Proactively patching the DFS(R) servers is advisable, even if everything is running normally, as it prevents your servers from being affected by known issues.
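A quick way to review the installed updates and the DFS Replication service binary version from the command line is shown below. This is a sketch; the dfsrs.exe path assumes a default Windows installation:
rem List installed updates
wmic qfe get HotFixID,InstalledOn
rem Report the DFS Replication service binary version
wmic datafile where name="C:\\Windows\\System32\\dfsrs.exe" get Version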
DFS(R) as Backup
DFS(R) is not a bona fide backup solution and was not designed as one. To be fully protected, customers must back up their data offline. One of DFS(R)'s design goals is to be part of an enterprise backup strategy in that it gets your geographically distributed data to a centralized site for backup, restoration, and archiving. Multiple members do offer protection from server failure; however, this does not protect your data from accidental deletions. Encrypting customer data should never occur without a full backup.
Stopping DFS(R) Replication
Sometimes, you may need to temporarily stop replication. Changing the replication status has consequences. The proper method is to set the schedule to no replication for the Replication Group in question. The DFS(R) service must be running to be able to read updates in the journal. Additionally, do not stop the DFS(R) service for long periods of time (days, weeks). Doing so may cause a journal wrap to occur (if many files are modified, added, or deleted in the meantime). DFS(R) will recover from the journal wrap, but in large deployments, this takes a long time and replication does not occur, or happens very slowly, during the journal wrap recovery. Monitor and prepare the environment prior to encryption.
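If replication has been paused or the service stopped for some time, review the DFS Replication event log for journal wrap and recovery messages before proceeding with encryption. The following query runs from an elevated command prompt; the entry count is arbitrary:
wevtutil qe "DFS Replication" /c:20 /rd:true /f:text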
File System Policies
Do not configure file system policies on replicated folders. The file system policy reapplies NTFS permissions at every Group Policy refresh interval. This can result in sharing violations because an open file does not replicate until the file is closed.
Backup Software
Having DFS data on other servers helps protect the data against a catastrophic failure, but does nothing to protect against data corruption. If a file becomes corrupted, the corruption gets replicated to other targets. Because the data should be identical on each DFS replica, backing up only one of the replicas is usually sufficient. Thales recommends that you back up at least the primary active node or hub.
Another important consideration regarding the backup process is that it is critical to configure the backup software not to update the archive bit. File replication is triggered by a file version change or a modified time stamp, so updating the archive bit can cause changes that trigger a replication storm.
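To verify that a backup pass is leaving file attributes alone, you can compare the archive attribute on a sample of files before and after the backup window. This is a sketch; the path and file pattern are placeholders:
rem The "A" flag in the output indicates the archive attribute is set
attrib /S "E:\DFSRTest\*.docx"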
Troubleshooting DFS(R)
Unexpected problems may arise while encrypting replicated DFS data. The following are common tasks recommended when configuring CipherTrust Transparent Encryption with DFS(R).
Encrypted Files Under DFSRPrivate Folder
If the LDT Exclusion Registry key was deleted and LDT-rekeyed files remain under the DFSRPrivate folder, use the following steps to reverse this.
On all nodes:
1. Stop the DFS(R) service first on both nodes.
2. Add LDTExclusionGPList to vmmgmt/Parameters with the path to DFSRPrivate, if not already there.
3. Disable the DFSRPrivate GuardPoint.
4. Open the command line as an administrator.
5. Copy the DFSRPrivate directory to a temporary backup, type: xcopy "E:\DFSRTest\DFSRPrivate" c:\backupOfDFSRPrivate /E /Y /H /Q /O /K
6. Delete all files in the DFSRPrivate directory, type: del /S /Q E:\DFSRTest\DFSRPrivate
7. Guard DFSRPrivate again.
8. Test replication of new files, for example as sketched below.
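A minimal way to test replication after re-guarding is to drop a new file into the replicated folder on one node and check the backlog toward the other node. The path, replication group, folder, and member names below are placeholders:
echo replication test > E:\DFSRTest\cte-replication-test.txt
dfsrdiag backlog /rgname:"Corp-Data" /rfname:"DFSRTest" /smem:NODE01 /rmem:NODE02
The test file should appear on the other member, and the backlog should return to zero, within the normal replication interval.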
Double Encryption
A common cause of data corruption is that the data has become double-encrypted. This can happen when data that is already encrypted is written into a GuardPoint, where it is encrypted a second time. You can check for this by copying the data out of the GuardPoint into a clear location with a user who has 'apply_key' rights. Next, mount a GuardPoint on top of the copied data in the clear location using the same policy as the original GuardPoint. If the data becomes viewable inside that newly mounted GuardPoint, the data was double-encrypted.
To recover the data:
1. Copy all of the double-encrypted data into a clear location.
2. Disable the original GuardPoint.
3. Copy the data back into the original location, which is now unguarded.
4. Re-guard the original GuardPoint. Data should now be viewable in the original GuardPoint. (A command-line sketch of the copy steps follows.)
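The copy steps can be done with xcopy using the same switches shown earlier in this section. This is a sketch; the source and destination paths are placeholders, and it assumes, as in the check above, that the copy out of the GuardPoint is performed by a user with 'apply_key' rights so that one encryption layer is removed on the way out:
rem Step 1: copy the double-encrypted data out to a clear location
xcopy "E:\DFSRTest" "C:\recovery\DFSRTest" /E /Y /H /Q /O /K
rem Step 3: after disabling the GuardPoint, copy the data back to the now-unguarded location
xcopy "C:\recovery\DFSRTest" "E:\DFSRTest" /E /Y /H /Q /O /K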
Logs
Always download and parse the domain logs before doing anything else. Note the timestamps of DFS(R)-related error messages and compare them against similarly timestamped log entries on the DFS servers. You may also run agenthealth on the DFS server to gather more extensive domain information.
The agentinfo support collection script resides in one of the following paths on systems where the CTE agent is installed, depending on the version:
C:\program files\vormetric\DataSecurityExpert\agent\vmd\bin
C:\program files\vormetric\DataSecurityExpert\agent\shared\bin
The information collected from the customer's DFS environment may reveal many issues related to encrypting data, especially issues involving replicated or malformed files and files with open handles.
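For offline timestamp comparison, the DFS Replication event log can be exported from each DFS server and reviewed alongside the domain logs and the agentinfo output. The destination path below is a placeholder:
wevtutil epl "DFS Replication" C:\temp\DFSR-%COMPUTERNAME%.evtx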
Different Keys for Different Folders
You may require that some folders be encrypted with a different key from others, perhaps due to a required SLA. Fencing the data in this way is an effective method for keeping different enterprise data separate. Normally, DFS nests target folders under one namespace; to encrypt the data, you would apply a single key in the operational policy. If you want to use multiple keys, however, the target folders must exist under separate namespaces. Windows Server 2012 and later versions support multiple namespaces. Plan accordingly, allocating resources where necessary and staging target folders appropriately. This is also known as root scalability.