Adding a New DataNode to a CTE-protected HDFS
Use the following procedure to add a new DataNode to a CTE-protected HDFS. If you do not follow this procedure, encrypted HDFS files could be exposed in cleartext.
If you already have CTE installed on the cluster nodes before Ambari installs the Hadoop software, see Install CTE on the Cluster Nodes before Ambari Installs Hadoop.
- Install the HDFS client on the host. This option is available in Ambari when adding a new DataNode to the cluster.
- Add the new node to the CipherTrust Manager database and make sure that the host/client settings of the new node are the same as those of the existing nodes in the cluster. See the CTE Installation for Hadoop chapter in the CTE Installation and Configuration Guide.
- Install CTE on the new node, register it with CipherTrust Manager, and run config-hadoop.sh to prepare the libraries. See the Configuring Hadoop to use CTE section in the CTE Installation and Configuration Guide.
- Make sure that the data directories (specified in the dfs.datanode.data.dir property) exist on the new node. They must have the same permissions and ownership as on the existing nodes in the cluster. If necessary, create them.
- Add the host/client to the HDFS Host Group that is guarding the cluster. This is important: Do not rely on the DataNode to create the data directories, because data replication can occur before the GuardPoints are in effect.
- Add the DataNode service to the new node. Again, this option is available through Ambari.
- If using Kerberos, check that the keytab files are created correctly.
- Start the DataNode service on the new node.
- Execute some hdfs dfs shell commands to ensure that encryption and decryption of data work correctly.
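The data directory step in the procedure above can be checked with a script before the DataNode service is added. The following is a minimal sketch, not part of CTE or Ambari: the function names parse_data_dirs and check_data_dirs are hypothetical, and the expected mode, user ID, and group ID are assumed to be read from an existing DataNode in the cluster. It assumes that the dfs.datanode.data.dir value is a comma-separated list whose entries may carry a storage-type tag such as [DISK] and may be written as file: URIs.

```python
import os
import stat
from urllib.parse import urlparse


def parse_data_dirs(value):
    """Split a dfs.datanode.data.dir value into local filesystem paths."""
    paths = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Drop an optional storage-type tag such as "[DISK]" (assumed format).
        if entry.startswith("[") and "]" in entry:
            entry = entry.split("]", 1)[1]
        # Accept both plain paths and file: URIs.
        if entry.startswith("file:"):
            entry = urlparse(entry).path
        paths.append(entry)
    return paths


def check_data_dirs(value, expected_mode, expected_uid, expected_gid):
    """Return a list of problems found; an empty list means every directory matches."""
    problems = []
    for path in parse_data_dirs(value):
        if not os.path.isdir(path):
            problems.append(f"{path}: missing or not a directory")
            continue
        st = os.stat(path)
        mode = stat.S_IMODE(st.st_mode)  # permission bits only
        if mode != expected_mode:
            problems.append(f"{path}: mode {mode:o}, expected {expected_mode:o}")
        if (st.st_uid, st.st_gid) != (expected_uid, expected_gid):
            problems.append(
                f"{path}: owner {st.st_uid}:{st.st_gid}, "
                f"expected {expected_uid}:{expected_gid}"
            )
    return problems
```

The script only reports mismatches and does not create or change anything, so the directories are still created by hand with the correct ownership rather than by the DataNode.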
Install CTE on the Cluster Nodes before Ambari Installs Hadoop
If CTE is already installed on the cluster nodes before Ambari installs the Hadoop software, Ambari can mistakenly pick up the .sec directory in configuration steps to store the HDFS data. Make sure the following properties do not contain the .sec directory:
- DataNode data directory
- NameNode data directory
- Secondary NameNode checkpoint directory
- ZooKeeper directory
- yarn.nodemanager.local-dirs
- yarn.nodemanager.log-dirs
- yarn.timeline-service.leveldb-timeline-store.path
- yarn.timeline-service.leveldb-state-store.path
This list is not exhaustive. Depending on the Hadoop ecosystem packages installed, there can be others.
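Because the list is open-ended, it can be easier to scan every configured path than to check properties one at a time. The following is a minimal sketch under stated assumptions: find_sec_paths is a hypothetical helper, the property values are assumed to have been collected into a dictionary by some other means (for example, copied from the Ambari configuration pages), and each value is assumed to be a comma-separated list of paths.

```python
def find_sec_paths(properties):
    """Map each property name to its configured paths that pass through a .sec directory."""
    offenders = {}
    for name, value in properties.items():
        bad = []
        for entry in value.split(","):
            entry = entry.strip()
            if not entry:
                continue
            # Match only a path component named exactly ".sec",
            # so a directory such as ".security" is not flagged.
            if ".sec" in entry.split("/"):
                bad.append(entry)
        if bad:
            offenders[name] = bad
    return offenders


# Hypothetical values for illustration only.
example = {
    "dfs.datanode.data.dir": "/hadoop/hdfs/data,/.sec/hadoop/hdfs/data",
    "yarn.nodemanager.local-dirs": "/hadoop/yarn/local",
}
print(find_sec_paths(example))
```

Any property reported by the scan should be corrected in Ambari before the configuration is deployed.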