HDFS Upgrade with CTE
To upgrade Hadoop, configure CTE to integrate with the new HDFS instance.
Upgrading one node at a time
Once CTE is installed and configured on the node:
-
Make sure that HDFS services are shut down on the node.
-
Upgrade Hadoop.
-
Type:
/opt/vormetric/DataSecurityExpert/agent/secfs/hadoop/bin/config-hadoop.sh -i -y
-
Start HDFS services on the node.
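The first step above (confirming that HDFS services are shut down) can be checked with a short script before running config-hadoop.sh. The following is a minimal sketch, not part of CTE: the function names are hypothetical, and it assumes that the HDFS daemons appear in `ps -ef` output with a main class from the standard Hadoop `org.apache.hadoop.hdfs.server.namenode` or `org.apache.hadoop.hdfs.server.datanode` packages.

```python
import subprocess

# Assumption: HDFS daemons show one of these Hadoop package prefixes
# on their java command line in `ps -ef` output.
HDFS_DAEMON_PACKAGES = (
    "org.apache.hadoop.hdfs.server.namenode.",
    "org.apache.hadoop.hdfs.server.datanode.",
)


def running_hdfs_daemons(ps_output):
    """Return the ps lines that look like NameNode or DataNode processes."""
    return [
        line
        for line in ps_output.splitlines()
        if any(pkg in line for pkg in HDFS_DAEMON_PACKAGES)
    ]


def hdfs_is_stopped():
    """Run `ps -ef` on this node and report whether no HDFS daemon is left."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return not running_hdfs_daemons(result.stdout)
```

Calling `hdfs_is_stopped()` on the node returns True only when no matching process is found. Treat a False result as a reason to stop and shut down the remaining services before upgrading Hadoop.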
Upgrade CTE with LDT in an HDFS Cluster
If you are using LDT with an HDFS cluster, follow these steps when upgrading CTE to maintain your LDT GuardPoints.
-
Suspend rekey on all data nodes.
-
Shut down your name nodes and data nodes.
-
Upgrade CTE on the name node first.
Note
Always upgrade name nodes before data nodes.
-
After the CTE upgrade succeeds, type:
config-hadoop.sh -i -y
-
On the Ambari admin console, start the name node.
-
Verify that the CipherTrust Java process launched successfully on the name node. (You should not see an error message.) Type:
ps -ef | grep java | grep vormetric
-
Check the CipherTrust Manager status. It should show the LDT rekeyed status.
-
Check the name node status. The output should display the GuardPoint status, which should match its state before the upgrade. Type:
secfsd -status guard
GuardPoint          Policy           Type   ConfigState  Status   Reason
----------          ------           ----   -----------  ------   ------
/hadoop/hdfs/data   LDT_HDFS_Sanity  local  guarded      guarded  N/A
-
Repeat the above steps for all of the data nodes in the HDFS cluster.
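Comparing the GuardPoint state before and after the upgrade is easier if the `secfsd -status guard` output is captured on each node and compared programmatically. The following is a minimal sketch, assuming the six-column layout shown above and GuardPoint paths without whitespace. The function names are hypothetical and are not part of CTE.

```python
def parse_guard_status(output):
    """Map each GuardPoint path to (policy, type, config_state, status, reason)."""
    guardpoints = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the column header, and the dashed separator row.
        if len(fields) < 6 or fields[0] == "GuardPoint" or set(fields[0]) == {"-"}:
            continue
        path, policy, gp_type, config_state, status = fields[:5]
        reason = " ".join(fields[5:])
        guardpoints[path] = (policy, gp_type, config_state, status, reason)
    return guardpoints


def changed_guardpoints(before, after):
    """Return the paths whose entry differs between two parsed snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Save the command output to a file before shutting down a node, parse it and the post-upgrade output with `parse_guard_status`, and pass both results to `changed_guardpoints`. An empty list means that every GuardPoint reports the same state as before the upgrade.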
Rolling Upgrades
Hortonworks Data Platform has introduced rolling upgrades to automate the Hadoop upgrade process (http://bit.ly/2pQrFo3). The upgrade process is controlled by the Upgrade Pack (http://bit.ly/2rkutvF) that is predefined and certified by Hortonworks.
To integrate CTE with the upgrade, you need to temporarily change the Ambari scripts before performing the rolling upgrades and then restore the scripts after the upgrades.
-
On the Ambari server machine, type:
cd /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts
-
Make a backup copy of the utils.py file. Type:
cp utils.py utils.py.org
-
Using a text editor, add the following commands to utils.py:
if action == "start":
  if name == "namenode" or name == "datanode":
    Execute(format("secfs/hadoop/bin/config-hadoop.sh -i -h {hadoop_bin}/../ -j <java home> -p hdp -d"), not_if=service_is_up, user=params.root_user)
    # For Redhat 6.x, uncomment the following command
    # Execute(format("/etc/init.d/secfs secfsd restart"), not_if=service_is_up, user=params.root_user)
    # For Redhat 7.x, uncomment the following command
    # Execute(format("/secfs restart"), not_if=service_is_up, user=params.root_user)
before:
Execute(daemon_cmd, not_if=service_is_up, environment=hadoop_env_exports)
Replace <java home> with the Java home of your HDFS instance. The edited section then reads:
if action == "start":
  if name == "namenode" or name == "datanode":
    Execute(format("secfs/hadoop/bin/config-hadoop.sh -i -h {hadoop_bin}/../ -j <java home> -p hdp -d"), not_if=service_is_up, user=params.root_user)
    # For Redhat 6.x, uncomment the following command
    # Execute(format("/etc/init.d/secfs secfsd restart"), not_if=service_is_up, user=params.root_user)
    # For Redhat 7.x, uncomment the following command
    # Execute(format("/secfs restart"), not_if=service_is_up, user=params.root_user)
  Execute(daemon_cmd, not_if=service_is_up, environment=hadoop_env_exports)
-
Type:
ambari-server restart
-
Perform rolling upgrades.
-
During the upgrade process, many of the intermediate service status checks can fail. Skip over them by clicking on Proceed to Upgrade.
-
Click Finalize to complete the upgrade. If the active NameNode fails to activate because of an incompatible HDFS layout version, manually start the NameNode with the '-upgrade' option to correct the layout version file:
/var/lib/ambari-server/ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode -upgrade'
-
If there are excessive under-replicated blocks, run the following commands to isolate the affected files and manually start the replication:
su - <$hdfs_user>
# hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
# for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
-
Restart the HDFS services. Wait for the replication to complete and the NameNodes to exit safe mode.
-
When HBase is restarted after the upgrade, it tries to rename:
/apps/hbase/data/.tmp/data/hbase/namespace
to:
/apps/hbase/data/data/hbase/namespace
This may cause a key conflict if the GuardPoint is set incorrectly (for example, /apps/hbase/data/data is guarded, but /apps/hbase/data/.tmp is not), which results in HBase shutting down. Before restarting HBase, make sure that the GuardPoint policies are set correctly to cover all HBase-related files. A broader GuardPoint (/apps/hbase/data instead of just /apps/hbase/data/data and other folders) could fix this issue.
-
Check the cluster upgrade by verifying the hadoop version.
-
Run a few MapReduce jobs and HBase commands to make sure that the entire Hadoop stack is working properly.
-
Rename utils.py.org to utils.py.
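The HBase rename issue described in the steps above comes down to whether the source and destination of the rename fall under the same GuardPoint. The following is a minimal sketch of that check, assuming you supply the list of GuardPoint paths yourself. The function names are hypothetical and the script does not query CTE.

```python
from pathlib import PurePosixPath


def covering_guardpoint(path, guardpoints):
    """Return the deepest GuardPoint that contains path, or None if unguarded."""
    target = PurePosixPath(path)
    matches = [
        gp
        for gp in guardpoints
        if target == PurePosixPath(gp) or PurePosixPath(gp) in target.parents
    ]
    return max(matches, key=lambda gp: len(PurePosixPath(gp).parts), default=None)


def both_under_same_guardpoint(src, dst, guardpoints):
    """True when src and dst are both guarded by the same GuardPoint."""
    src_gp = covering_guardpoint(src, guardpoints)
    return src_gp is not None and src_gp == covering_guardpoint(dst, guardpoints)


SRC = "/apps/hbase/data/.tmp/data/hbase/namespace"
DST = "/apps/hbase/data/data/hbase/namespace"

# Narrow GuardPoint: the destination is guarded but the source is not.
print(both_under_same_guardpoint(SRC, DST, ["/apps/hbase/data/data"]))  # False
# Broader GuardPoint: both paths are covered.
print(both_under_same_guardpoint(SRC, DST, ["/apps/hbase/data"]))  # True
```

A False result for the HBase rename paths suggests that the GuardPoint should be broadened before HBase is restarted.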