Ambari Additional Configuration
Livy configuration
To execute the instructions below you will need to access Ambari; please refer to Accessing Ambari for further information.
Update Livy configuration for Spark2
In the Ambari toolbar on the left, expand Services, then click Spark2.
Select the CONFIGS tab, then below it click ADVANCED.
Expand the Custom spark2-defaults section, then click Add Property.... In the Add Property popup, click the "multiple tags" icon to enable the Bulk property add mode. Then enter the following text, replacing <zookeeper-node-hostname> with the ZooKeeper node hostname or IP address:
spark.yarn.appMasterEnv.ZK_URL_DDC = <zookeeper-node-hostname>:2181
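For example, assuming a hypothetical ZooKeeper node named zk01.example.com, the entry would read:
spark.yarn.appMasterEnv.ZK_URL_DDC = zk01.example.com:2181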
Expand the Custom livy2-conf section, then click Add Property.... In the Add Property popup, click the "multiple tags" icon to enable the Bulk property add mode. Enter the following text.
livy.server.session.state-retain.sec = 24h
Expand the Advanced livy2-conf section. Update the following entry:
livy.server.csrf_protection.enabled: false
Click SAVE and then restart Spark2.
A message at the top of the screen will indicate that a restart is required and show an orange RESTART button. Click that button and select Restart All Affected.
Update Livy configuration for Knox
In the Ambari toolbar on the left, expand Services, then click Knox.
Select the CONFIGS tab.
Expand the Advanced topology section.
For the Spark/Livy configuration, add the following entry immediately before the closing </topology> tag.
For a single-node Spark2 Server:
<service>
  <role>LIVYSERVER</role>
  <url>http://<Livy-node>:8999</url>
</service>
For multiple Spark2 Servers:
<service>
  <role>LIVYSERVER</role>
  <url>http://<Livy-node1>:8999</url>
  <url>http://<Livy-node2>:8999</url>
  <url>http://<Livy-node3>:8999</url>
  ...
</service>
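For reference, this minimal sketch shows where the entry sits in the topology file (the surrounding content is abbreviated; your topology will contain other service entries):
<topology>
  ...
  <service>
    <role>LIVYSERVER</role>
    <url>http://<Livy-node>:8999</url>
  </service>
</topology>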
Click SAVE then restart Knox.
A message at the top of the screen will indicate that a restart is required and show an orange RESTART button. Click that button and select Restart All Affected.
Update ZooKeeper configuration for HBase
In the Ambari toolbar on the left, expand Services, then click HBase.
Select the CONFIGS tab, then below it click ADVANCED.
Expand the Advanced hbase-site section. Update the following entry:
ZooKeeper Znode Parent: /hbase
Click SAVE then restart all affected components.
A message at the top of the screen will indicate that a restart is required and show an orange RESTART button. Click that button and select Restart All Affected.
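After the restart, you can optionally confirm that the /hbase znode exists by connecting with the ZooKeeper CLI. This is a minimal sketch; the zkCli.sh path and the ZooKeeper hostname placeholder are assumptions and may differ on your cluster:
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server <zookeeper-node-hostname>:2181
ls /hbase
quit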
Updating HDFS folder permissions
SSH to the TDP instance and log in as root.
Switch to the hdfs user, who has permissions to create and destroy folders:
su - hdfs
Assign the ownership of the /user folder to the hdfs user and ensure that no other user can create subfolders:
hdfs dfs -chown hdfs:hdfs /user
hdfs dfs -chmod 755 /user
Check if the /user/admin folder exists:
hdfs dfs -ls /user/admin
If the folder does not exist, create it:
hdfs dfs -mkdir /user/admin
Repair the folder permissions:
hdfs dfs -chmod 755 /user/admin
Assign the folder ownership to the admin user:
hdfs dfs -chown -R admin:hdfs /user/admin
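Optionally, verify the result; the listing should show /user/admin owned by admin:hdfs with drwxr-xr-x permissions:
hdfs dfs -ls /user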
When you have finished, enter exit to return to the root prompt.
Note
In the above steps, replace 'admin' with the username configured in the Kylo Connection Manager for accessing your TDP cluster.
You can optionally use the user interface to browse through HDFS directories. Please refer to Browsing HDFS via User Interface.
Updating HBase Site Configuration in Spark
To update the HBase site configuration, copy hbase-site.xml to Spark.
Run the following command on the HBase Master node:
cp /etc/hbase/3.1.5.0-316/0/hbase-site.xml /etc/spark2/3.1.5.1-316/0/
Also run the following command on all nodes that host the HBase Master server and RegionServers:
cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/
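If you prefer to run the copy from a single host, a minimal sketch using SSH (this assumes passwordless SSH as root and uses hypothetical hostnames; substitute the nodes in your cluster):
for node in hbase-master regionserver1 regionserver2; do
  ssh root@$node 'cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/'
done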
Spark Tuning
Spark can be configured by adjusting some properties via Ambari. The official Spark documentation describes many properties that control every aspect of Spark's behavior; this section covers only some of the most important ones.
Property Name | Default value | Purpose |
---|---|---|
spark.driver.cores | 1 | Number of cores to use for the driver process, only in cluster mode. |
spark.driver.memory | 1 GB | Amount of memory to use for the driver process. |
spark.executor.memory | 1 GB | Amount of memory to use per executor process, in MiB unless otherwise specified. |
spark.executor.cores | 1 in YARN mode; all available cores on the worker in standalone and Mesos coarse-grained modes | The number of cores to use on each executor. |
spark.task.cpus | 1 | Number of cores to allocate for each task. |
spark.executor.instances | 2 | The number of executors for static allocation. With spark.dynamicAllocation.enabled, the initial set of executors will be at least this large. |
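The same properties can also be set per job at submission time instead of cluster-wide. A minimal sketch with spark-submit (the class name and JAR are placeholders):
spark-submit \
  --class com.example.MyApp \
  --conf spark.driver.memory=2g \
  --conf spark.executor.memory=2g \
  --conf spark.executor.cores=2 \
  --conf spark.executor.instances=2 \
  my-app.jar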
Caution
The following instructions are recommended if you have at least 8 CPUs and 32 GB of RAM per cluster node.
To increase the resources dedicated to Spark jobs, you will need to access Ambari; please refer to Accessing Ambari for further information.
In the Ambari toolbar on the left, expand Services, then click Spark2.
Select the CONFIGS tab, then below it click ADVANCED.
Expand Custom spark2-defaults and then click Add Property....
Add the following properties in the text box:
spark.driver.cores=3
spark.driver.memory=3g
spark.executor.cores=3
spark.executor.memory=3g
spark.executor.instances=3
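As a rough sanity check, with these settings (and dynamic allocation disabled) a single job requests 3 executors × 3 cores = 9 executor cores and about 3 × 3 GB = 9 GB of executor memory, plus 3 cores and 3 GB for the driver and some YARN memory overhead, so each such job needs roughly 12 cores and a little over 12 GB from the cluster.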