HDFS Upgrade with CTE

To upgrade Hadoop, configure CTE to integrate with the new HDFS instance.

Upgrading one node at a time

Once CTE is installed and configured on the node:

Make sure that HDFS services are shut down on the node.
Upgrade Hadoop.

Type:

/opt/vormetric/DataSecurityExpert/agent/secfs/hadoop/bin/config-hadoop.sh -i -y

Start HDFS services on the node.

Upgrade CTE with LDT in an HDFS Cluster

If you are using LDT with an HDFS cluster, follow these steps when upgrading CTE to maintain your LDT GuardPoints.

Suspend rekey on all data nodes.
Shut down your name nodes and data nodes.
Upgrade CTE on the name node first.

Note

Always upgrade name nodes before data nodes.
After the CTE upgrade succeeds, type:
```
 config-hadoop.sh -i -y
```
On the Ambari admin console, start the name node.
Verify that the CipherTrust Java process launched successfully on the name node. (You should not see an error message.) Type:
```
 ps -ef | grep java | grep vormetric
```
Check the CipherTrust Manager status. It should show LDT rekeyed status.

Check the name node status. It should display the GuardPoint status and match the state before upgrade. Type:

 secfsd -status guard
GuardPoint         Policy           Type   ConfigState  Status   Reason
----------         ------           ----   -----------  ------   ------
/hadoop/hdfs/data  LDT_HDFS_Sanity  local  guarded      guarded  N/A
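
The comparison against the pre-upgrade state can be scripted if you save the `secfsd -status guard` output before and after the upgrade. The following is a minimal sketch, not part of CTE: the function names are hypothetical, and it assumes the columns are whitespace-separated and that GuardPoint paths contain no spaces.

```python
def guard_states(output):
    """Map each GuardPoint path to its (ConfigState, Status) pair.

    Assumes the column layout shown above: GuardPoint, Policy, Type,
    ConfigState, Status, Reason, separated by whitespace.
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 5 or fields[0] == "GuardPoint" or set(fields[0]) == {"-"}:
            continue
        states[fields[0]] = (fields[3], fields[4])
    return states


def changed_guardpoints(before, after):
    """Return GuardPoints whose state differs, or that appear on one side only."""
    b, a = guard_states(before), guard_states(after)
    return sorted(gp for gp in set(b) | set(a) if b.get(gp) != a.get(gp))


# Output captured before the upgrade (copied from the example above).
before = """\
GuardPoint         Policy           Type   ConfigState  Status   Reason
----------         ------           ----   -----------  ------   ------
/hadoop/hdfs/data  LDT_HDFS_Sanity  local  guarded      guarded  N/A
"""

print(changed_guardpoints(before, before))  # identical captures: no differences
```

An empty list means every GuardPoint reports the same ConfigState and Status as before the upgrade.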

Repeat the above steps for all of the data nodes in the HDFS cluster.

Rolling Upgrades

Hortonworks Data Platform has introduced rolling upgrades to automate the Hadoop upgrade process (http://bit.ly/2pQrFo3). The upgrade process is controlled by the Upgrade Pack (http://bit.ly/2rkutvF) that is predefined and certified by Hortonworks.

To integrate CTE with the upgrade, you need to temporarily change the Ambari scripts before performing the rolling upgrades and then restore the scripts after the upgrades.

On the Ambari server machine, type:

 cd /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts

Back up the utils.py file. Type:
```
 cp utils.py utils.py.org
```

Using a text editor, add the following commands to utils.py, matching the indentation of the surrounding code:

if action == "start":
    if name == "namenode" or name == "datanode":
        Execute(format("secfs/hadoop/bin/config-hadoop.sh -i -h {hadoop_bin}/../ -j <java home> -p hdp -d"),
                not_if=service_is_up,
                user=params.root_user)
        # For Redhat 6.x, uncomment the following command
        # Execute(format("/etc/init.d/secfs secfsd restart"),
        #         not_if=service_is_up, user=params.root_user)
        # For Redhat 7.x, uncomment the following command
        # Execute(format("/secfs restart"),
        #         not_if=service_is_up, user=params.root_user)

Insert them before this line:

Execute(daemon_cmd, not_if=service_is_up, environment=hadoop_env_exports)

Replace <java home> with the Java home of your HDFS instance. After the edit, this part of utils.py reads:

if action == "start":
    if name == "namenode" or name == "datanode":
        Execute(format("secfs/hadoop/bin/config-hadoop.sh -i -h {hadoop_bin}/../ -j <java home> -p hdp -d"),
                not_if=service_is_up,
                user=params.root_user)
        # For Redhat 6.x, uncomment the following command
        # Execute(format("/etc/init.d/secfs secfsd restart"),
        #         not_if=service_is_up, user=params.root_user)
        # For Redhat 7.x, uncomment the following command
        # Execute(format("/secfs restart"),
        #         not_if=service_is_up, user=params.root_user)
Execute(daemon_cmd,
        not_if=service_is_up,
        environment=hadoop_env_exports)

Restart the Ambari server. Type:
```
ambari-server restart
```
Perform rolling upgrades.
During the upgrade process, many of the intermediate service status checks can fail. Skip over them by clicking on Proceed to Upgrade.

Click Finalize to complete the upgrade. If the active NameNode fails to activate because of an incompatible HDFS layout version, manually start the NameNode with the -upgrade option to correct the layout version file.

 /var/lib/ambari-server/ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode -upgrade'

If there are excessive under-replicated blocks, run the following commands to isolate the affected files and manually start the replication:

 su - <$hdfs_user>
 hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
 for hdfsfile in $(cat /tmp/under_replicated_files); do echo "Fixing $hdfsfile :"; hadoop fs -setrep 3 "$hdfsfile"; done
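
The same filtering can be sketched in Python if you prefer to review the file list before changing replication. This is an illustration only: the function names are hypothetical, and the sample fsck line is an assumed example of the output format, not captured from a real cluster. As in the awk command above, the path is taken as the text before the first colon on each matching line.

```python
def under_replicated_files(fsck_output):
    """Return file paths from fsck lines that mention 'Under replicated'."""
    paths = []
    for line in fsck_output.splitlines():
        if "Under replicated" in line:
            # Same rule as awk -F':' '{print $1}': text before the first colon.
            path = line.split(":", 1)[0].strip()
            # Unlike the shell pipeline, list each file only once.
            if path and path not in paths:
                paths.append(path)
    return paths


def setrep_commands(paths, replication=3):
    """Build one 'hadoop fs -setrep' argument list per file, without running it."""
    return [["hadoop", "fs", "-setrep", str(replication), p] for p in paths]


# Assumed sample of fsck output, for illustration only.
sample = (
    "/user/demo/part-0001:  Under replicated blk_0001. Target Replicas is 3 but found 1 replica(s).\n"
    "/user/demo/part-0002 1024 bytes, 1 block(s):  OK\n"
)

for cmd in setrep_commands(under_replicated_files(sample)):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` as the HDFS user once you have checked the list.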

Restart the HDFS services. Wait for the replication to complete and the NameNodes to exit safe mode.
When HBase is restarted after the upgrade, it tries to rename /apps/hbase/data/.tmp/data/hbase/namespace to /apps/hbase/data/data/hbase/namespace. This may cause a key conflict if the GuardPoint is set incorrectly (for example, /apps/hbase/data/data is guarded, but /apps/hbase/data/.tmp is not), which results in HBase shutting down.

Before restarting HBase, make sure that the GuardPoint policies cover all HBase-related files. A broader GuardPoint (/apps/hbase/data instead of just /apps/hbase/data/data and other folders) could fix this issue.
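
To make the coverage problem concrete, the sketch below checks whether the source and destination of the HBase rename fall under the same GuardPoint. It is a simplified model under stated assumptions: the function names are hypothetical, a path is treated as covered by the longest GuardPoint that is one of its parent directories, and a rename is treated as safe only when both ends resolve to the same GuardPoint (or to none).

```python
import posixpath


def covering_guardpoint(path, guardpoints):
    """Return the longest GuardPoint that contains path, or None."""
    path = posixpath.normpath(path)
    best = None
    for gp in guardpoints:
        gp = posixpath.normpath(gp)
        # A GuardPoint covers itself and everything beneath it.
        if path == gp or path.startswith(gp.rstrip("/") + "/"):
            if best is None or len(gp) > len(best):
                best = gp
    return best


def rename_is_safe(src, dst, guardpoints):
    """True when src and dst resolve to the same GuardPoint (or to none)."""
    return covering_guardpoint(src, guardpoints) == covering_guardpoint(dst, guardpoints)


src = "/apps/hbase/data/.tmp/data/hbase/namespace"
dst = "/apps/hbase/data/data/hbase/namespace"

# Narrow GuardPoint: only the destination is guarded.
print(rename_is_safe(src, dst, ["/apps/hbase/data/data"]))  # False
# Broader GuardPoint: both paths are under the same GuardPoint.
print(rename_is_safe(src, dst, ["/apps/hbase/data"]))       # True
```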
Verify the cluster upgrade by checking the Hadoop version.
Run a few MapReduce jobs and HBase commands to make sure that the entire Hadoop stack is working properly.
Restore the original script by renaming utils.py.org to utils.py.
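
The Hadoop version check in the steps above can be scripted by parsing the first line of `hadoop version`, which reports the version as `Hadoop <version>`. The following is a minimal sketch: the function name is hypothetical and the sample version string is an assumed example, not a value from this procedure.

```python
import re


def parse_hadoop_version(output):
    """Return the version from a 'Hadoop <version>' line, or None if absent."""
    for line in output.splitlines():
        match = re.match(r"Hadoop\s+(\S+)", line.strip())
        if match:
            return match.group(1)
    return None


# Assumed sample output, for illustration only.
sample = "Hadoop 2.7.3.2.6.0.3-8\nCompiled by example-user\n"

version = parse_hadoop_version(sample)
print(version)
```

On a cluster node, the same function can be applied to the text returned by `subprocess.run(["hadoop", "version"], capture_output=True, text=True).stdout` and compared with the version you expected to install.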
