
Thales Data Platform Administration

Performance Tuning


This section contains information on tuning TDP performance. Keep in mind, however, that each case must be studied separately, as data volumes and time requirements differ between deployments.

How Do I Know What To Do To Improve TDP Performance?

Obtaining an optimal configuration requires some trial-and-error testing by adjusting the properties described below. However, as a general rule, consider increasing the default configuration if you observe any of the following signals:

  • You begin to notice out-of-memory exceptions during any DDC Spark job, especially ScanProcessorDDC.

  • On the Executors view, during an execution, you observe that a job uses all of the memory allocated to it, and the shuffle read/write grows beyond nominal values. In this case, you can add more nodes and increase the spark.executor.instances value to improve job distribution.

  • You observe that YARN can only run one job at a time. This can indicate that you allocated too many resources per job, so YARN cannot run more than one job at the same time. You can solve this problem by decreasing the memory/cores allocated to each job, or by adding more nodes so that YARN has more resources available for all jobs.

Keep in mind that, with the same number of nodes (no horizontal scaling), increasing the resources (memory and cores) for each job decreases the number of jobs that can run concurrently, so you need to test in order to find the right balance for your scenario.
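The trade-off above can be estimated before testing. The sketch below computes how many executors a cluster can run concurrently, limited by whichever resource (memory or cores) runs out first; the function name, the node figures, and the 10% overhead factor (approximating Spark's default executor memory overhead) are illustrative assumptions, not TDP values:

```python
def max_concurrent_executors(node_mem_gb, node_cores, nodes,
                             executor_mem_gb, executor_cores,
                             overhead_factor=1.1):
    """Rough estimate: executors per node are capped by whichever
    resource is exhausted first, then multiplied by the node count.

    overhead_factor approximates spark.executor.memoryOverhead
    (roughly 10% of executor memory by default).
    """
    per_node_by_mem = int(node_mem_gb // (executor_mem_gb * overhead_factor))
    per_node_by_cores = node_cores // executor_cores
    return nodes * min(per_node_by_mem, per_node_by_cores)

# Hypothetical example: 3 nodes with 32 GB / 8 cores each,
# running 3 GB / 3-core executors.
print(max_concurrent_executors(32, 8, 3, 3, 3))  # -> 6 (cores are the limit)
```

Doubling executor memory or cores here would halve (or worse) the number of concurrent executors, which is exactly the balance the paragraph above describes.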

Spark Tuning

Spark can be configured by adjusting properties via Ambari. The official Spark documentation describes properties covering every aspect of Spark's behavior; this section covers only some of the most important ones.

  • spark.driver.cores (default: 1). Number of cores to use for the driver process, in cluster mode only.

  • spark.driver.memory (default: 1g). Amount of memory to use for the driver process.

  • spark.executor.memory (default: 1g). Amount of memory to use per executor process, in MiB unless otherwise specified (e.g. 512m, 2g).

  • spark.executor.cores (default: 1 in YARN mode; all available cores on the worker in standalone and Mesos coarse-grained modes). The number of cores to use on each executor.

  • spark.task.cpus (default: 1). Number of cores to allocate for each task.

  • spark.executor.instances (default: 2). The number of executors for static allocation. With spark.dynamicAllocation.enabled, the initial set of executors will be at least this large.
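Memory properties such as spark.executor.memory accept size strings like 512m or 3g, with a bare number read as MiB and suffixes interpreted in binary units (1g = 1024 MiB). A small converter sketch, assuming those Spark conventions (the helper itself is hypothetical):

```python
# Multipliers to MiB for Spark-style size suffixes (binary units).
UNITS = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}

def to_mib(value):
    """Convert a Spark-style size string (e.g. '3g', '512m', '1024') to MiB."""
    value = value.strip().lower().rstrip("b")  # accept '3g' or '3gb'
    if value[-1] in UNITS:
        return float(value[:-1]) * UNITS[value[-1]]
    return float(value)  # a bare number is taken as MiB

print(to_mib("3g"))    # -> 3072.0
print(to_mib("512m"))  # -> 512.0
```

This makes it easy to check, for example, that the sum of executor memory requests fits within a node's YARN allocation.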

The following settings are recommended if each cluster node has at least 8 CPU cores and 32 GB of RAM.

To increase the resources dedicated to Spark jobs, you need to access Ambari; refer to Accessing Ambari for further information.

  1. In the Ambari toolbar on the left, expand Services, then click Spark2.

  2. Select the CONFIGS tab, then below it click ADVANCED.

  3. Expand Custom spark2-defaults and then click Add Property....

  4. Add the following properties in the text box:

    spark.driver.cores=3
    spark.driver.memory=3g
    spark.executor.cores=3
    spark.executor.memory=3g
    spark.executor.instances=3
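The entries above are plain key=value pairs, so a quick sanity check before saving them in Ambari is straightforward. A minimal parser sketch (a hypothetical helper, not part of TDP or Spark):

```python
def parse_spark_defaults(text):
    """Parse spark-defaults-style key=value lines into a dict,
    skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

snippet = """
spark.driver.cores=3
spark.driver.memory=3g
spark.executor.cores=3
spark.executor.memory=3g
spark.executor.instances=3
"""
print(parse_spark_defaults(snippet)["spark.executor.instances"])  # -> 3
```

Any key whose value comes back empty points to a malformed line that Ambari would otherwise silently accept.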