DDC Architecture Guidelines
This guide describes the recommended deployment modes for DDC. It focuses on security and performance, and also considers the volume of data flowing between components, which matters when comparing on-premises and cloud environments.
We will describe:
- General architecture and steps for scanning/generating reports
- Security aspects vs data flow
- Location of agents for scanning
- Agent permissions
- Performance aspects / time taken for scans
General Architecture
First, we will cover the general DDC architecture and data flow during scanning and report generation to understand the main components.
A DDC administrator performs the following actions in the CipherTrust Manager (CM) in order to get a report about what data DDC finds in requested data stores:
- Create a Classification Profile (CP) if existing CPs are not sufficient.
- Add one or more data stores.
- Schedule a scan using the required CP and one of the added data stores.
- Generate a report based on one or more scans.
CipherTrust and TDP Nodes
The CipherTrust Manager is an active-active cluster that can scale up to 12 nodes. At this time, DDC can operate on only one node, which we call the active node. Despite this temporary limitation, we still recommend at least 2 CM nodes to ensure that key and configuration information is replicated for high availability (HA). We recommend that the Thales Data Platform (TDP) cluster has 5 nodes (2 name nodes and 3 data nodes).
DDC stores scan results in TDP and generates reports from scan data. This results in significant communication between CM and TDP. Therefore, all nodes for CM and TDP should be within the same subnet.
Branch Locations
A branch location specifies a site that hosts the data centers, file servers, databases and other data stores containing the data to scan. You can think of a branch location as a place where sensitive data is scanned but never leaves. For example, in the following figure, consider the “Madrid” branch. The DDC proxy agent located in this branch location scans data in the data stores shown in the Madrid branch location, but the actual data is never sent from the agent to the CipherTrust Manager.
A branch location has the following properties:
- Site Name
- Country
- State/Province
- City
- Description
You must choose a branch location when adding a data store. You will also need to specify a sensitivity level that should match the data that you are scanning.
Branch locations matter not only for security. They also determine where to place agents for specific scans, because all the data being scanned flows between the data stores and the agent. The next section discusses this, which is particularly relevant when scanning cloud-based data stores.
Data Flow Considerations and Where to Install the DDC Agents
Depending on the type of data store and/or what part of the data store is being scanned, there is likely to be a significant amount of data travelling from the data store to the agent as the following figure shows.
However, once a scan is complete, the amount of data that the agent sends back to DDC/CM is not significant (this will be a maximum of 64 MB per scan). Note that a single DDC scan can be split into multiple scans (one per data store path).
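As a rough worked example of the result traffic described above, the following sketch computes the worst-case volume that an agent sends back to CM for one DDC scan. It assumes that the 64 MB maximum applies to each per-path scan; the function name and the path count are illustrative only.

```shell
# Upper bound on scan-result traffic from an agent back to CM.
# Assumption: the 64 MB cap applies to each per-path sub-scan.
MAX_RESULT_MB_PER_SCAN=64

# max_result_mb PATH_COUNT -> prints the worst-case result volume in MB
max_result_mb() {
    echo $(( $1 * MAX_RESULT_MB_PER_SCAN ))
}

# Hypothetical data store scanned through 5 paths.
max_result_mb 5   # prints 320
```

Even with several paths, this is small compared with the data read from the data store itself, which is why agent placement matters more than the agent-to-CM link.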
Generally speaking, there are two places that you should install agents:
Inside the server to which the storage devices are attached if you are scanning local Windows or Linux storage.
- You can also install an agent on a DB server for performance reasons. Since data will not hit the network, the scan will be faster and more secure.
On a proxy server that sits on the same subnet as the data store.
- We recommend that if you cannot install an agent on every subnet, you should contact your corporate security team to ensure you do not break any corporate policy.
- Although DDC automatically selects the agent to use for a particular data store, you should block connectivity at the network layer from any agent that you do not want DDC to select for security or performance reasons.
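One possible way to block connectivity from an agent is a host firewall rule on the agent's server. The sketch below only prints an iptables rule instead of applying it. The function name, the IP address (taken from the 192.0.2.0/24 documentation range) and the port are hypothetical, and your site may prefer to enforce this on a network firewall instead.

```shell
# print_block_rule DATA_STORE_IP PORT
# Prints (does not apply) an iptables rule that rejects outbound TCP traffic
# from this agent host to the given data store.
print_block_rule() {
    printf 'iptables -A OUTPUT -d %s -p tcp --dport %s -j REJECT\n' "$1" "$2"
}

# Hypothetical data store at 192.0.2.10 listening on port 1433.
print_block_rule 192.0.2.10 1433
```

Printing the rule first lets you review it with your security team before running it as root on the agent host.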
The goal is to minimize data flowing across networks. This both reduces network traffic and limits the flow of sensitive data. Consider the following figure:
This figure shows the best deployment architecture for both local and cloud-based scans. In the “AWS us-west” branch location, we are scanning AWS S3 object storage. Since there can be a significant amount of data flowing between S3 and the DDC proxy agent, we recommend that you run the agent in the same cloud region to minimize data flow, for both cost and security reasons.
For on-prem storage you can use local and/or proxy agents depending on the type of storage being scanned.
System Requirements for Agents
This section describes requirements for the DDC agents. Actual requirements depend on a number of factors, including CPU type, amount of RAM, type of storage, amount of data being scanned and network connectivity between all components. The guidelines below are a starting point; we recommend that you test in your own environment to best understand your requirements.
CPU and Memory
This section describes requirements for local and proxy agents. DDC scans are single-threaded and require less than 1 GB of RAM. Agents do not launch concurrent Local Storage scans but can run several Proxy scans in parallel, up to one scan per Data Store being scanned.
If you want to dedicate a VM to a DDC scanning agent that performs proxy scans, we recommend starting with four cores and 8 GB of RAM, as the operating system requires some CPU and RAM itself. Because proxy agents must receive the data to scan from the data store, the scan duration is usually determined by the rate at which the data store can send the data or by the network speed, so please select a VM with fast network interfaces. CPU single-thread performance only becomes the bottleneck if the scan contains many infotypes and the network delivers data faster than the agent can scan it.
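The bottleneck reasoning above can be sketched as simple arithmetic: the slowest stage bounds the throughput, so the scan takes roughly the data size divided by the lowest rate. This is only an illustrative estimate; the function name and all rates below are hypothetical values that you would need to measure in your own environment.

```shell
# estimate_scan_seconds SIZE_MB STORE_MB_PER_S NETWORK_MB_PER_S AGENT_MB_PER_S
# Prints size / min(rates) in whole seconds (integer division truncates).
estimate_scan_seconds() {
    size_mb=$1
    min_rate=$2
    if [ "$3" -lt "$min_rate" ]; then min_rate=$3; fi
    if [ "$4" -lt "$min_rate" ]; then min_rate=$4; fi
    echo $(( size_mb / min_rate ))
}

# Hypothetical proxy scan: 100 GB of data, data store delivers 200 MB/s,
# network delivers 110 MB/s, agent can scan 150 MB/s.
estimate_scan_seconds 102400 200 110 150   # prints 930
```

In this hypothetical case the network is the limiting stage, so a faster CPU would not shorten the scan.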
For local agents installed on critical servers, we recommend blocking connectivity to other data stores so that DDC does not select the agent as a proxy agent to scan them. This minimizes the CPU and RAM used by the DDC agent. Assuming the data store can feed the agent fast enough, CPU single-thread performance determines the scan duration.
In both scenarios, please monitor the agent's resource consumption while scans are running, and ensure that the agent has enough CPU, the system does not swap, and the network usage of proxy agents does not reach 100%.
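As one example of such monitoring, the sketch below reports how much swap is in use on a Linux agent host by reading /proc/meminfo. It covers only the swap check, not CPU or network usage, and the function name is illustrative. It assumes a Linux host that exposes /proc/meminfo.

```shell
# swap_used_kb [MEMINFO_FILE]
# Prints swap in use (SwapTotal - SwapFree) in kB.
# Reads /proc/meminfo unless another file is given.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Run the check only where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    swap_used_kb
fi
```

A value that keeps growing while a scan runs suggests that the host is short of RAM for the agent.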
Agent Permissions
For local scans, the DDC agents should be installed by/as an OS user with read-only permissions to access all the data store objects to scan.
Similarly, for remote scans we require the user that installed the agent to have read-only access to the remote data store.
ACL-Capable File Systems
For ACL-capable file systems, you can grant read and directory traversal permissions to a DDC user by following the instructions below:
- Ensure all your partitions are mounted with the acl option.
- Execute mount and check that your desired partition includes acl in the list of options displayed in the last parenthesis.
If acl is not there:
- Edit /etc/fstab and add acl to the options of the partition's entry.
- Execute mount -a to apply the fstab configuration changes. Partitions that are already mounted keep their current options until they are remounted (for example, with mount -o remount followed by the mount point) or the system is rebooted.
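To illustrate the /etc/fstab step, the sketch below shows a hypothetical entry with the acl option added, together with a small helper that checks whether the entry for a given mount point lists acl. The device, mount point, file system type and function name are assumptions for illustration, not values from your system.

```shell
# Hypothetical /etc/fstab entry with the acl option added to the 4th field:
#   /dev/sdb1  /data  ext4  defaults,acl  0  2

# fstab_has_acl MOUNT_POINT [FSTAB_FILE]
# Succeeds (exit status 0) if the entry for MOUNT_POINT lists "acl"
# among its comma-separated options. Reads /etc/fstab by default.
fstab_has_acl() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "acl") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/etc/fstab}"
}

# Usage (hypothetical mount point):
#   fstab_has_acl /data && echo "acl is set for /data"
```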
Ensure the ACL management tools (setfacl and getfacl) are installed on your system. Both are provided by the acl package. Depending on your system:
- Debian: apt-get install acl
- RedHat: yum install acl
Change the default ACL so that the DDC user can access any new files and folders:
setfacl -R -m d:u:"DDC User":rx /
Grant read access to all existing files and folders to the DDC user:
setfacl -R -m u:"DDC User":rx /
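After applying the ACLs, you may want to confirm that the entry is present. The sketch below reads getfacl output on standard input and checks for a named-user entry that includes read permission. The account name ddcuser, the path and the function name are hypothetical, and the check does not take the ACL mask into account, which can reduce the effective rights.

```shell
# acl_grants_read USER
# Reads getfacl output on stdin; succeeds if USER has a named-user
# access entry whose permissions start with "r" (read).
acl_grants_read() {
    grep -Eq "^user:$1:r"
}

# Usage on a real system (hypothetical account and path):
#   getfacl /data/reports | acl_grants_read ddcuser && echo "read access granted"
```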
If your file system does not support ACLs, please contact your system administrator to define the read access without breaking the existing configuration.