Ambari Additional Configuration
Livy configuration
To execute the instructions below you will need to access Ambari; please refer to Accessing Ambari for further information.
Update Livy configuration for Spark2
In the Ambari toolbar on the left, expand Services, then click Spark2.
Select the CONFIGS tab, then below it click ADVANCED.
Expand the Custom spark2-defaults section, then click Add Property.... In the Add Property popup, click the "multiple tags" icon to enable the Bulk property add mode. Then enter the following text, replacing <zookeeper-node-hostname> with the ZooKeeper node hostname or IP address:
spark.yarn.appMasterEnv.ZK_URL_DDC = <zookeeper-node-hostname>:2181
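For example, assuming a hypothetical ZooKeeper node named zk01.example.com, the entry would read:
spark.yarn.appMasterEnv.ZK_URL_DDC = zk01.example.com:2181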
Expand the Custom livy2-conf section, then click Add Property.... In the Add Property popup, click the "multiple tags" icon to enable the Bulk property add mode. Enter the following text.
livy.server.session.state-retain.sec = 24h
Expand the Advanced livy2-conf section. Update the following entry:
livy.server.csrf_protection.enabled: false
Click SAVE and then restart Spark2.
A message at the top of the screen will indicate that a restart is required and show an orange RESTART button. Click that button and select Restart All Affected.
Update Livy configuration for Knox
In the Ambari toolbar on the left, expand Services, then click Knox.
Select the CONFIGS tab.
Expand the Advanced topology section.
For the Spark/Livy configuration, add the following entry immediately before the closing </topology> tag.
For a single-node Spark2 Server:
<service>
  <role>LIVYSERVER</role>
  <url>http://<Livy-node>:8999</url>
</service>
For multiple Spark2 Servers:
<service>
  <role>LIVYSERVER</role>
  <url>http://<Livy-node1>:8999</url>
  <url>http://<Livy-node2>:8999</url>
  <url>http://<Livy-node3>:8999</url>
  ...
</service>
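For reference, this minimal sketch shows where the entry sits in the topology file (the surrounding content is abbreviated; your topology will contain other service entries):
<topology>
  ...
  <service>
    <role>LIVYSERVER</role>
    <url>http://<Livy-node>:8999</url>
  </service>
</topology>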
Click SAVE then restart Knox.
A message at the top of the screen will indicate that a restart is required and show an orange RESTART button. Click that button and select Restart All Affected.
Update ZooKeeper configuration for HBase
In the Ambari toolbar on the left, expand Services, then click HBase.
Select the CONFIGS tab, then below it click ADVANCED.
Expand the Advanced hbase-site section. Update the following entry:
ZooKeeper Znode Parent: /hbase
Click SAVE then restart all affected components.
A message at the top of the screen will indicate that a restart is required and show an orange RESTART button. Click that button and select Restart All Affected.
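After the restart, you can optionally confirm that the /hbase znode exists by connecting with the ZooKeeper CLI. This is a minimal sketch; the zkCli.sh path and the ZooKeeper hostname placeholder are assumptions and may differ on your cluster:
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server <zookeeper-node-hostname>:2181
ls /hbase
quit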
Updating HDFS folder permissions
SSH to the TDP instance and log in as root.
Switch to the hdfs user, who has permissions to create and destroy folders:
su - hdfs
Assign the ownership of the /user folder to the hdfs user and ensure that no other user can create subfolders:
hdfs dfs -chown hdfs:hdfs /user
hdfs dfs -chmod 755 /user
Check if the /user/admin folder exists:
hdfs dfs -ls /user/admin
If the folder does not exist, create it:
hdfs dfs -mkdir /user/admin
Repair the folder permissions:
hdfs dfs -chmod 755 /user/admin
Assign the folder ownership to the admin user:
hdfs dfs -chown -R admin:hdfs /user/admin
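Optionally, verify the result; the listing should show /user/admin owned by admin:hdfs with drwxr-xr-x permissions:
hdfs dfs -ls /user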
When you have finished, enter exit to return to the root prompt.
Note
In the above steps, replace 'admin' with the username configured in the Kylo Connection Manager for accessing your TDP cluster.
You can optionally use the user interface to browse through HDFS directories. Please refer to Browsing HDFS via User Interface.
Updating HBase Site Configuration in Spark
To update the HBase site configuration, copy hbase-site.xml to Spark.
Run the following command on the HBase Master node:
cp /etc/hbase/3.1.5.0-316/0/hbase-site.xml /etc/spark2/3.1.5.1-316/0/
Also run the following command on all nodes that host the HBase Master server and RegionServers:
cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/
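If you prefer to run the copy from a single host, a minimal sketch using SSH (this assumes passwordless SSH as root and uses hypothetical hostnames; substitute the nodes in your cluster):
for node in hbase-master regionserver1 regionserver2; do
  ssh root@$node 'cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/'
done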
Spark Tuning
Spark can be configured by adjusting some properties via Ambari. The official Spark documentation describes many properties that control every aspect of Spark's behavior; this section covers only some of the most important ones.
Property Name | Default value | Purpose |
---|---|---|
spark.driver.cores | 1 | Number of cores to use for the driver process, only in cluster mode. |
spark.driver.memory | 1 GB | Amount of memory to use for the driver process. |
spark.executor.memory | 1 GB | Amount of memory to use per executor process, in MiB unless otherwise specified. |
spark.executor.cores | 1 in YARN mode; all available cores on the worker in standalone and Mesos coarse-grained modes | The number of cores to use on each executor. |
spark.task.cpus | 1 | Number of cores to allocate for each task. |
spark.executor.instances | 2 | The number of executors for static allocation. With spark.dynamicAllocation.enabled, the initial set of executors will be at least this large. |
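The same properties can also be set per job at submission time instead of cluster-wide. A minimal sketch with spark-submit (the class name and JAR are placeholders):
spark-submit \
  --class com.example.MyApp \
  --conf spark.driver.memory=2g \
  --conf spark.executor.memory=2g \
  --conf spark.executor.cores=2 \
  --conf spark.executor.instances=2 \
  my-app.jar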
Caution
The following instructions are recommended if you have at least 8 CPUs and 32 GB of RAM per cluster node.
To increase the resources dedicated to Spark jobs, you will need to access Ambari; please refer to Accessing Ambari for further information.
In the Ambari toolbar on the left, expand Services, then click Spark2.
Select the CONFIGS tab, then below it click ADVANCED.
Expand Custom spark2-defaults and then click Add Property....
Add the following properties in the text box:
spark.driver.cores=3
spark.driver.memory=3g
spark.executor.cores=3
spark.executor.memory=3g
spark.executor.instances=3
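As a rough sanity check, with these settings (and dynamic allocation disabled) a single job requests 3 executors × 3 cores = 9 executor cores and about 3 × 3 GB = 9 GB of executor memory, plus 3 cores and 3 GB for the driver and some YARN memory overhead, so each such job needs roughly 12 cores and a little over 12 GB from the cluster.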