Ambari Additional Configuration

Livy configuration

To execute the instructions below, you will need to access Ambari. For further information, refer to Accessing Ambari.

Update Livy configuration for Spark2

  1. In the Ambari toolbar on the left, expand Services, then click Spark2.

  2. Select the CONFIGS tab, then below it click ADVANCED.

  3. Expand the Custom spark2-defaults section, then click Add Property.... In the Add Property popup, click the "multiple tags" icon to enable the Bulk property add mode. Then enter the following text, replacing <zookeeper-node-hostname> with the ZooKeeper node hostname or IP address.

    spark.yarn.appMasterEnv.ZK_URL_DDC = <zookeeper-node-hostname>:2181

  4. Expand the Custom livy2-conf section, then click Add Property.... In the Add Property popup, click the "multiple tags" icon to enable the Bulk property add mode. Enter the following text.

    livy.server.session.state-retain.sec = 24h

  5. Expand the Advanced livy2-conf section. Update the following entry:

    • livy.server.csrf_protection.enabled: false
  6. Click SAVE and then restart Spark2.

    At the top of the screen, a message indicates that a restart is required and an orange RESTART button appears. Click the button and select Restart All Affected.
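
Optionally, before restarting, you can confirm that the ZooKeeper node referenced above is reachable on port 2181. The check below is a minimal sketch run from a cluster node; it assumes nc (netcat) is installed and that ZooKeeper's four-letter-word command ruok is permitted in your deployment.

    # Replace <zookeeper-node-hostname> with the same value used in the spark.yarn.appMasterEnv.ZK_URL_DDC property.
    echo ruok | nc <zookeeper-node-hostname> 2181
    # A healthy ZooKeeper server replies "imok".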

Update Livy configuration for Knox

  1. In the Ambari toolbar on the left, expand Services, then click Knox.

  2. Select the CONFIGS tab.

  3. Expand the Advanced topology section.

  4. For Spark/Livy configuration, add this entry one line before </topology>:

    • For single node Spark2 Server:

      <service>
          <role>LIVYSERVER</role>
          <url>http://<Livy-node>:8999</url>
      </service>

    • For multiple node Spark2 Servers:

      <service>
          <role>LIVYSERVER</role>
          <url>http://<Livy-node1>:8999</url>
          <url>http://<Livy-node2>:8999</url>
          <url>http://<Livy-node3>:8999</url>
          ...
      </service>

  5. Click SAVE, then restart Knox.

    At the top of the screen, a message indicates that a restart is required and an orange RESTART button appears. Click the button and select Restart All Affected.
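
Optionally, after the restart you can confirm that Livy responds on port 8999 and that Knox proxies it. The commands below are a sketch only; the Knox gateway port (8443), topology name (default), URL path, and credentials are assumptions that may differ in your deployment.

    # Query the Livy REST API directly on a Spark2/Livy node; an empty session list is a valid response.
    curl http://<Livy-node>:8999/sessions

    # Query Livy through the Knox gateway (adjust host, port, topology name, path, and credentials as needed).
    curl -k -u <knox-user>:<password> "https://<knox-host>:8443/gateway/default/livy/v1/sessions"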

Update ZooKeeper configuration for HBase

  1. In the Ambari toolbar on the left, expand Services, then click HBase.

  2. Select the CONFIGS tab, then below it click ADVANCED.

  3. Expand the Advanced hbase-site section. Update the following entry:

    • ZooKeeper Znode Parent: /hbase
  4. Click SAVE, then restart all affected components.

    At the top of the screen, a message indicates that a restart is required and an orange RESTART button appears. Click the button and select Restart All Affected.
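
Optionally, you can confirm that HBase is using the /hbase parent znode after the restart. The command below is a sketch that uses the HBase ZooKeeper CLI; it assumes it is run on a node with the HBase client installed.

    # List the HBase parent znode; it should exist and be populated.
    hbase zkcli ls /hbase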

Updating HDFS folder permissions

  1. SSH to the TDP instance and log in as root.

  2. Switch to the hdfs user, who has permissions to create and destroy folders:

    su - hdfs

  3. Assign the ownership of the /user folder to the hdfs user and ensure that no other user can create subfolders:

    hdfs dfs -chown hdfs:hdfs /user
    hdfs dfs -chmod 755 /user

  4. Check if the /user/admin folder exists:

    hdfs dfs -ls /user/admin

  5. If the folder does not exist, create it:

    hdfs dfs -mkdir /user/admin

  6. Repair the folder permissions:

    hdfs dfs -chmod 755 /user/admin

  7. Assign the folder ownership to the admin user:

    hdfs dfs -chown -R admin:hdfs /user/admin

  8. After entering the last command, enter exit to return to the root prompt.

In the above steps, replace 'admin' with the username configured in the Kylo Connection Manager for accessing your TDP cluster.

You can optionally use the user interface to browse through HDFS directories. Please refer to Browsing HDFS via User Interface.
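
You can verify the resulting ownership and permissions with a single command. The check below is a minimal sketch, run as any user with HDFS access; as above, 'admin' stands for the username configured in the Kylo Connection Manager.

    # Show ownership and permissions of the folders themselves, without listing their contents.
    hdfs dfs -ls -d /user /user/admin
    # Expected (sketch): /user owned by hdfs:hdfs with 755, /user/admin owned by admin:hdfs with 755.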

Updating HBase Site Configuration in Spark

To update the HBase site configuration, copy hbase-site.xml to the Spark2 configuration directory.

Run the following command on the HBase Master node:

cp /etc/hbase/3.1.5.0-316/0/hbase-site.xml /etc/spark2/3.1.5.1-316/0/

Also run the following command on every node that hosts the HBase Master server or a RegionServer:

cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/
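
To confirm that the copy succeeded on each node, you can compare the two files; the paths below match the second command above, and the comparison should produce no output when the files are identical.

    diff /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml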

Spark Tuning

Spark can be configured by adjusting properties via Ambari. The official Spark documentation describes properties that control every aspect of Spark's behavior; this section covers only some of the most important ones.

The property names, default values, and purposes are:

  • spark.driver.cores (default: 1): Number of cores to use for the driver process, only in cluster mode.
  • spark.driver.memory (default: 1 GB): Amount of memory to use for the driver process.
  • spark.executor.memory (default: 1 GB): Amount of memory to use per executor process, in MiB unless otherwise specified.
  • spark.executor.cores (default: 1 in YARN mode): The number of cores to use on each executor. The defaults for standalone and Mesos coarse-grained modes differ; see the official Spark documentation for details.
  • spark.task.cpus (default: 1): Number of cores to allocate for each task.
  • spark.executor.instances (default: 2): The number of executors for static allocation. With spark.dynamicAllocation.enabled, the initial set of executors will be at least this large.

The following settings are recommended if you have at least 8 CPUs and 32 GB of RAM per cluster node.

To increase the resources dedicated to Spark jobs, you will need to access Ambari. For further information, refer to Accessing Ambari.

  1. In the Ambari toolbar on the left, expand Services, then click Spark2.

  2. Select the CONFIGS tab, then below it click ADVANCED.

  3. Expand Custom spark2-defaults and then click Add Property....

  4. Add the following properties in the text box:

    spark.driver.cores=3
    spark.driver.memory=3g
    spark.executor.cores=3
    spark.executor.memory=3g
    spark.executor.instances=3
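
With these values, each Spark application requests three executors with 3 cores and 3 GB each plus a 3-core, 3 GB driver, so roughly 12 cores and 12 GB of memory across the cluster before YARN memory overhead. For a one-off job, the same values can also be supplied per submission instead of cluster-wide; the sketch below shows the equivalent spark-submit flags and is an illustration rather than part of the Ambari procedure (<your-application> is a placeholder).

    # --driver-cores takes effect in cluster deploy mode only.
    spark-submit \
      --driver-cores 3 \
      --driver-memory 3g \
      --executor-cores 3 \
      --executor-memory 3g \
      --num-executors 3 \
      <your-application>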