Setting environment or session variables
You can set environment variables or session-level variables in Databricks that are accessible to a UDF. Using these methods, you can ensure that the variables are set automatically for users and cannot be modified directly.
To manage this, you can use the following approaches:
Using Spark Configuration Properties
You can set key-value pairs in the Spark session configuration, which your UDF can then read; managed carefully (see the note below), this keeps users from modifying the values directly.
Set the variable in the notebook at the session level using `spark.conf.set`. For example:

```python
spark.conf.set("myapp.secretKey", "super_secret_value")
```
Access this configuration property through the `SparkSession`, for example via `spark.conf.get`.
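For instance, a minimal Python sketch of reading the value back in the same session (assuming the key was set as above):

```python
# Read the runtime configuration value from the active SparkSession
secret_key = spark.conf.get("myapp.secretKey")
```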
Example in a Java UDF:

```java
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.SparkSession;

public class MyUDF implements UDF1<String, String> {
    @Override
    public String call(String input) throws Exception {
        // Obtain the active session and read the configured value
        SparkSession spark = SparkSession.builder().getOrCreate();
        String secretKey = spark.conf().get("myapp.secretKey");
        // Use the secretKey in your logic
        return "Processed with secret: " + secretKey;
    }
}
```
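To call this UDF from a notebook, you could register it by class name. A minimal sketch, assuming the compiled `MyUDF` class is packaged in a JAR attached to the cluster (the SQL function name `my_udf` is illustrative):

```python
from pyspark.sql.types import StringType

# Set the value the UDF will read, then register the Java class as a SQL function
spark.conf.set("myapp.secretKey", "super_secret_value")
spark.udf.registerJavaFunction("my_udf", "MyUDF", StringType())

spark.sql("SELECT my_udf('some input') AS result").show()
```

Keep in mind that a `SparkSession` is not always reachable from code running on executors, so reading the value on the driver and passing it into the UDF (as in the secrets example below) is the more defensive pattern.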
- Users of the notebook cannot modify `spark.conf` settings defined at the cluster level unless they can edit the cluster configuration. If you manage cluster configurations and the session scope carefully, users won't have direct access to change `spark.conf` values.
Using Databricks Secrets
Databricks Secrets provide a secure way to store sensitive information such as API keys and database credentials; secret values are redacted in notebook output, so they are not visible to end users. You can configure environment-specific secrets and access them within UDFs without exposing the actual values.
In Databricks, you can store secrets in a secret scope using the UI or the CLI:

```bash
databricks secrets create-scope --scope myapp_secrets
databricks secrets put --scope myapp_secrets --key secretKey
```
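Note that newer releases of the Databricks CLI take the scope name as a positional argument instead (for example, `databricks secrets create-scope myapp_secrets`); run `databricks secrets --help` to confirm the syntax for your installed version.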
You can read the secret in your notebook via the `dbutils.secrets.get` method and make it available to your UDF.

Example in Python:

```python
secretKey = dbutils.secrets.get(scope="myapp_secrets", key="secretKey")
```

For a Java UDF, you could pass the value of the secret from Python as an argument, or configure it in `spark.conf` like this:

```python
spark.conf.set("myapp.secretKey", dbutils.secrets.get(scope="myapp_secrets", key="secretKey"))
```
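For a Python UDF, a common pattern is to read the secret on the driver and capture it in the UDF's closure, so executors receive the value along with the serialized function. A minimal sketch, assuming the scope and key above exist:

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Read the secret on the driver; dbutils is not available inside the UDF body
secret = dbutils.secrets.get(scope="myapp_secrets", key="secretKey")

@udf(returnType=StringType())
def process(value):
    # The captured `secret` is shipped to executors with the function closure
    return f"Processed with secret: {secret}"
```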
Using Global Variables in Notebooks (if controlled by admin)
You can define global or environment variables directly within the notebook by setting them at its beginning, so that users can reference them throughout without setting or changing them themselves.
Example:

```python
# Define global constants
SECRET_KEY = "super_secret_value"
ENDPOINT_URL = "https://api.example.com"

# Use them in the notebook without users being able to modify them
print(f"Using secret key {SECRET_KEY}")
```
However, this approach is less secure than using `spark.conf` or Databricks Secrets, because anyone who can view the notebook can read the values.
Control Cluster Settings
If you are managing a cluster that multiple users access, you can set environment variables or configuration settings at the cluster level via the Databricks cluster configuration UI. For instance, in the Advanced Options > Spark Config section, you can add key-value pairs, one per line, with the key and value separated by a space:

```
spark.myapp.secretKey super_secret_value
```
These configuration settings can be accessed by UDFs through `spark.conf`, and users cannot change them without permission to edit the cluster configuration.
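As a minimal sketch, reading that cluster-level value back in a notebook or UDF, with a fallback default in case the key is absent (assuming the key above was set in the cluster's Spark config):

```python
# Values from the cluster's Spark config are exposed through the runtime conf;
# the second argument is a default returned when the key is not set
secret_key = spark.conf.get("spark.myapp.secretKey", "fallback_value")
```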