Overview
The Hadoop Distributed File System (HDFS) is a file system that supports large files and directory structures distributed across hundreds, or even thousands, of commodity DataNode hosts in a cluster. Previously, CTE could only protect directories and files on the local file system rather than the actual HDFS files and directories. Now, CTE can protect HDFS files and directories.
A CipherTrust Manager can:
- Define an encryption policy for HDFS files and directories in the HDFS namespace.
- Selectively encrypt HDFS folders with different keys, providing multi-tenancy support.
- Define user-based I/O access control rules for HDFS files in the HDFS namespace.
At the heart of an HDFS cluster is the NameNode, which provides the framework for a traditional hierarchical file and directory organization. The NameNode is a master server that manages the HDFS namespace and regulates client access to files.
HDFS files are split into one or more data blocks that are distributed across DataNode hosts in a cluster. The NameNode maintains the namespace tree and the mapping of data blocks to DataNodes. To deploy CTE, install CTE on all of the NameNode and DataNode hosts in a cluster.
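To make the split between the namespace and the block map concrete, here is a toy model of the metadata the NameNode keeps. It is a minimal sketch, not HDFS code: the class ToyNameNode is hypothetical, the 128 MiB block size is an assumption (a common dfs.blocksize default), and the round-robin replica placement is a stand-in for HDFS's real, rack-aware placement policy.

```python
import math
from dataclasses import dataclass, field

# Assumption: 128 MiB, a common default for dfs.blocksize.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class ToyNameNode:
    """Hypothetical, highly simplified model of NameNode metadata."""

    # Namespace: file path -> ordered list of block IDs.
    files: dict[str, list[int]] = field(default_factory=dict)
    # Block map: block ID -> DataNode hosts holding a replica.
    block_locations: dict[int, list[str]] = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path: str, size: int, datanodes: list[str], replication: int = 3) -> list[int]:
        """Split a file into blocks and record which DataNodes hold each block."""
        num_blocks = math.ceil(size / BLOCK_SIZE)  # an empty file has no blocks
        copies = min(replication, len(datanodes))
        block_ids = []
        for _ in range(num_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Round-robin placement of distinct replicas (NOT the real HDFS policy).
            self.block_locations[block_id] = [
                datanodes[(block_id + r) % len(datanodes)] for r in range(copies)
            ]
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids
```

For example, a 300 MiB file becomes three blocks (128 MiB, 128 MiB, and 44 MiB), each recorded against three different DataNodes. Because every DataNode ends up holding pieces of many files, CTE has to be present on every DataNode host, not only on the NameNode.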
Overview of CTE on HDFS
The following sections list the high-level steps for implementing CTE protection on your HDFS. The process requires the HDFS Administrator and the CipherTrust Manager Administrator to work in tandem, each completing separate tasks.
You can keep the HDFS cluster active during the process if you enable HDFS data replication and process the nodes one at a time. Following are the high-level steps:
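Replication is what keeps the cluster available while a node is offline: a block stays readable as long as at least one of its replicas lives on a node that is still up. The sketch below checks that condition before a DataNode is taken down. It assumes you already have a block-to-DataNode mapping (for instance, derived from the output of hdfs fsck with the -files -blocks -locations options); parsing that output is not shown, and the function name is hypothetical.

```python
def blocks_lost_if_offline(block_locations: dict[int, list[str]], offline: set[str]) -> list[int]:
    """Return the IDs of blocks that would have no live replica if the `offline` hosts were down."""
    return sorted(
        block_id
        for block_id, replicas in block_locations.items()
        # A block is lost only when every host holding a replica is offline.
        if all(host in offline for host in replicas)
    )


def safe_to_take_offline(block_locations: dict[int, list[str]], node: str) -> bool:
    """True if taking this single DataNode offline leaves every block readable."""
    return not blocks_lost_if_offline(block_locations, {node})
```

With a replication factor of 1, almost any node is unsafe to take offline; with a factor of 2 or more, a single node can usually be processed while the others keep serving data.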
HDFS Administrator
- Compile a list of directories specified by dfs.datanode.data.dir. If these directories do not already exist in the NameNode local file system, create them.
- Pass the directory list to the CipherTrust Manager Administrator.
- Ask the CipherTrust Manager Administrator to:
  - Add the NameNode to the HDFS host/client group.
  - Create a GuardPoint for the HDFS host/client group on each of these directories.
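The first HDFS Administrator step can be scripted. The sketch below reads the dfs.datanode.data.dir property from the contents of hdfs-site.xml, normalizes each entry to a local path, and creates any directory that is missing. Assumptions: the caller supplies the XML text (the file's location varies by distribution), entries are comma-separated, and an entry may carry an optional storage-type prefix such as [SSD] and may be written either as a file:// URI or as a bare path. The function names are illustrative.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def datanode_data_dirs(hdfs_site_xml: str) -> list[str]:
    """Return the local paths listed in dfs.datanode.data.dir, in order."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() != "dfs.datanode.data.dir":
            continue
        dirs = []
        for entry in prop.findtext("value", "").split(","):
            entry = entry.strip()
            if not entry:
                continue
            # Drop an optional storage-type prefix such as [DISK] or [SSD].
            if entry.startswith("[") and "]" in entry:
                entry = entry[entry.index("]") + 1:]
            # Accept both file:///path URIs and bare paths.
            dirs.append(urlparse(entry).path if entry.startswith("file:") else entry)
        return dirs
    return []  # property not set in this file


def create_missing(dirs: list[str]) -> list[str]:
    """Create each directory that does not exist yet; return the ones created."""
    created = []
    for d in dirs:
        if not os.path.isdir(d):
            os.makedirs(d, exist_ok=True)
            created.append(d)
    return created
```

The list returned by datanode_data_dirs is the directory list to hand to the CipherTrust Manager Administrator.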
CipherTrust Manager Administrator
- Create an HDFS host/client group to contain the HDFS nodes.
- Create a host/client group GuardPoint on each of the DataNode directories obtained from the previous step.
- Add the NameNode to the HDFS host/client group.
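Before creating GuardPoints, it can help to write the request down as a checklist and confirm that every directory from the HDFS Administrator's list is covered exactly once. This is a minimal sketch of such a checklist only: GuardPointPlan, plan_guardpoints, and uncovered are hypothetical names, the group and policy names are placeholders, and nothing here calls a CipherTrust Manager API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardPointPlan:
    """Hypothetical record of one GuardPoint to request (not a CipherTrust Manager object)."""
    group: str
    path: str
    policy: str


def _normalize(path: str) -> str:
    # Treat "/data/1/dfs/dn/" and "/data/1/dfs/dn" as the same directory.
    return path.rstrip("/") or "/"


def plan_guardpoints(group: str, policy: str, data_dirs: list[str]) -> list[GuardPointPlan]:
    """One host/client group GuardPoint per DataNode directory, deduplicated, in input order."""
    seen: set[str] = set()
    plan = []
    for d in data_dirs:
        path = _normalize(d)
        if path not in seen:
            seen.add(path)
            plan.append(GuardPointPlan(group, path, policy))
    return plan


def uncovered(data_dirs: list[str], plan: list[GuardPointPlan]) -> list[str]:
    """Directories that have no planned GuardPoint."""
    covered = {gp.path for gp in plan}
    return [d for d in data_dirs if _normalize(d) not in covered]
```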
HDFS Administrator
- For each DataNode, take the node offline and perform a data transformation.
- Ask the CipherTrust Manager Administrator to add the DataNode to the host/client group. (After the DataNode is added to the host/client group, it can be brought online.)
- Repeat this process until all of the nodes have been encrypted and added to the HDFS host/client group.
- Modify the host/client group policies to protect specific HDFS files and directories, as needed.
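The per-DataNode loop above is a rolling procedure: one node at a time goes offline, is transformed, joins the host/client group, and only then comes back online. The sketch below captures that ordering. The four callbacks are hypothetical hooks for your site-specific procedures (they are not CTE or Hadoop commands), and the function name is illustrative.

```python
from typing import Callable


def rolling_encrypt(
    datanodes: list[str],
    take_offline: Callable[[str], None],
    transform: Callable[[str], None],
    add_to_group: Callable[[str], None],
    bring_online: Callable[[str], None],
) -> list[str]:
    """Process DataNodes one at a time so the rest of the cluster stays active.

    Returns the nodes in the order they were completed. If any callback raises,
    the loop stops: the failing node stays offline and later nodes are untouched.
    """
    done = []
    for node in datanodes:
        take_offline(node)
        transform(node)     # data transformation while the node is offline
        add_to_group(node)  # done by the CipherTrust Manager Administrator
        bring_online(node)  # only after the node has joined the host/client group
        done.append(node)
    return done
```

Keeping the steps strictly sequential is the design choice that matters: with replication enabled, only one node's replicas are unavailable at any moment.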