Use Cases
In Databricks, the Unity Catalog Use Case and the Non-Unity Catalog Use Case refer to how data governance, access control, and cataloging of data assets are managed within the Databricks environment.
Non-Unity Catalog Use Case
Refers to legacy or non-governed setups where Unity Catalog is not enabled.
Define and Register the UDF in a Databricks Notebook
A convenient aspect of Databricks is that, although each CSP has its own unique way of creating functions and gateways, there is only one method to learn for Databricks. The entire process is summarized below.
Set up your environment using a Databricks compute cluster running Databricks Runtime 14.3 or higher.
Note
Databricks supports Java 1.8.
Create the Thales Function in Java.
Package the Java Code as a JAR.
Upload the JAR to Databricks. You can do this through the Databricks UI (Compute > your cluster > Libraries > Install new > Upload > JAR).
Register the UDF in Databricks using the following input values:
Key        Sample data    Description
datatype   char           Datatype of the column (char or nbr)
mode       protect        Mode of operation (protect or reveal)
data       345-23-3345    The data to be protected or revealed; can be sensitive information.
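For example, once the character UDF from the next section is registered, protecting the sample value above is a single call. This is a minimal sketch; the UDF itself is defined and registered in the register examples below, and "encrypt" corresponds to protect mode.
%scala
// Protect the sample value from the table above with the char UDF
// registered below ("encrypt" is the protect mode of operation).
spark.sql("SELECT ThalesencryptCharUDF('345-23-3345')").show()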
Register Examples in Databricks Notebook
Character Data
Encryption
%scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction

val ThalesencryptCharUDF: UserDefinedFunction = udf((data: String) => {
  try {
    example.ThalesDataBricksCADPFPE.thales_cadp_udf(data, "encrypt", "char")
  } catch {
    case e: Exception => null
  }
})

spark.udf.register("ThalesencryptCharUDF", ThalesencryptCharUDF)
// Not needed if using Unity Catalog:
// spark.sql("CREATE OR REPLACE FUNCTION ThalesencryptCharUDF AS 'example.ThalesencryptCharUDF'")
spark.sql("SELECT ThalesencryptCharUDF('thisisatest')").show()
Decryption
%scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction

val ThalesdecryptCharUDF: UserDefinedFunction = udf((data: String) => {
  try {
    example.ThalesDataBricksCADPFPE.thales_cadp_udf(data, "decrypt", "char")
  } catch {
    case e: Exception => null
  }
})

// Register the UDF
spark.udf.register("ThalesdecryptCharUDF", ThalesdecryptCharUDF)
// Not needed if using Unity Catalog:
// spark.sql("CREATE OR REPLACE FUNCTION ThalesdecryptCharUDF AS 'example.ThalesdecryptCharUDF'")
spark.sql("SELECT ThalesdecryptCharUDF('Ma9zhQvxnEX')").show()
Number Data
Encryption
%scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction

val ThalesencryptNbrUDF: UserDefinedFunction = udf((data: String) => {
  try {
    example.ThalesDataBricksCADPFPE.thales_cadp_udf(data, "encrypt", "nbr")
  } catch {
    case e: Exception => null
  }
})

spark.udf.register("ThalesencryptNbrUDF", ThalesencryptNbrUDF)
// Not needed if using Unity Catalog:
// spark.sql("CREATE OR REPLACE FUNCTION ThalesencryptNbrUDF AS 'example.ThalesencryptNbrUDF'")
spark.sql("SELECT ThalesencryptNbrUDF('-46')").show()
Decryption
%scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction

val ThalesdecryptNbrUDF: UserDefinedFunction = udf((data: String) => {
  try {
    example.ThalesDataBricksCADPFPE.thales_cadp_udf(data, "decrypt", "nbr")
  } catch {
    case e: Exception => null
  }
})

spark.udf.register("ThalesdecryptNbrUDF", ThalesdecryptNbrUDF)
// Not needed if using Unity Catalog:
// spark.sql("CREATE OR REPLACE FUNCTION ThalesdecryptNbrUDF AS 'example.ThalesdecryptNbrUDF'")
spark.sql("SELECT ThalesdecryptNbrUDF('914171854902')").show()
Unity Catalog Use Case
Unity Catalog is a unified governance solution for all data and assets in Databricks. It provides:
Centralized data governance across all workspaces
Fine-grained access control at the table, column, and row level
Native support for data lineage and auditing
Consistent naming conventions using a three-level namespace: catalog.schema.table
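The three-level namespace in the last point appears in every fully qualified table reference. A minimal sketch, using the samples catalog that ships with Databricks:
%scala
// catalog.schema.table: samples is the catalog, tpch the schema, customer the table.
spark.sql("SELECT c_name FROM samples.tpch.customer LIMIT 5").show()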
The process to deploy the JAR file is the same as that of the Non-Unity Catalog Use Case. The only difference is how the code registers and uses the UDFs.
Character Data in Unity Catalog
Here is a Scala example for character data.
%scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{udf, explode, array}
import scala.collection.JavaConverters._

// Initialize Spark Session
val spark = SparkSession.builder()
  .appName("Thales UDF Example")
  .getOrCreate()
import spark.implicits._ // required for the $"column" syntax below

// Define the bulk UDF: encrypts a batch of character values in one call.
val ThalesencryptCharBulkCADPUDF = udf((data: Seq[String]) => {
  try {
    val javaList = data.asJava
    val result = example.ThalesDatabricksCADPChunkBulkFPE.thales_cadp_bulk_udf(javaList, "encrypt", "char")
    result.asScala.toSeq.map(_.toString)
  } catch {
    case e: Exception => Seq.empty[String]
  }
})

// Register the UDF
spark.udf.register("ThalesencryptCharBulkCADPUDF", ThalesencryptCharBulkCADPUDF)

// Create a DataFrame with your reveal_data
val revealDF = spark.sql("SELECT c_name FROM samples.tpch.customer LIMIT 5")

// Apply the UDF and explode the array result into rows
val encryptedDF = revealDF
  .withColumn("encrypted_values", ThalesencryptCharBulkCADPUDF(array($"c_name")))
  .select($"c_name", explode($"encrypted_values").as("encrypted_value"))
encryptedDF.show(false)
c_name encrypted_value
Customer#000412445 eD1NaSHf#ZI32WJP9V
Customer#000412446 2XStFXRG#grlmNhPSg
Customer#000412447 krZegz8N#VWS14Wstf
Customer#000412448 phkK6D8d#JiemF4MS1
Customer#000412449 W47dqkzw#v2rWjLcyK
Here is a Python example:
%python
spark.sql("SELECT c_name, explode(ThalesencryptCharBulkCADPUDF(array(c_name))) AS encrypted_value FROM samples.tpch.customer limit 5").show()
+------------------+------------------+
|            c_name|   encrypted_value|
+------------------+------------------+
|Customer#000412445|eD1NaSHf#ZI32WJP9V|
|Customer#000412446|2XStFXRG#grlmNhPSg|
|Customer#000412447|krZegz8N#VWS14Wstf|
|Customer#000412448|phkK6D8d#JiemF4MS1|
|Customer#000412449|W47dqkzw#v2rWjLcyK|
+------------------+------------------+
Here is a SQL example.
%sql
SELECT c_name, explode(ThalesencryptCharBulkCADPUDF(array(c_name))) AS encrypted_value
FROM samples.tpch.customer LIMIT 5
c_name encrypted_value
Customer#000412445 eD1NaSHf#ZI32WJP9V
Customer#000412446 2XStFXRG#grlmNhPSg
Customer#000412447 krZegz8N#VWS14Wstf
Customer#000412448 phkK6D8d#JiemF4MS1
Customer#000412449 W47dqkzw#v2rWjLcyK
Number Data in Unity Catalog
Here is a Scala example for number data:
%scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import scala.collection.JavaConverters._

// Initialize Spark Session
val spark = SparkSession.builder()
  .appName("Thales CADP Decrypt Bulk Numbers UDF Example")
  .getOrCreate()

// Define the UDF for decrypting numbers in bulk
val ThalesdecryptNumberBulkCADPUDF = udf((data: Seq[Long]) => {
  try {
    val stringList = data.map(_.toString).asJava // Convert Longs to Strings
    val result = example.ThalesDatabricksCADPChunkBulkFPE.thales_cadp_bulk_udf(stringList, "decrypt", "nbr")
    result.asScala.toSeq.map(_.toString.toLong) // Convert Strings back to Longs
  } catch {
    case e: Exception => Seq.empty[Long]
  }
})

// Register the UDF
spark.udf.register("ThalesdecryptNumberBulkCADPUDF", ThalesdecryptNumberBulkCADPUDF)

// Define the UDF for encrypting numbers in bulk
val ThalesencryptNumberBulkCADPUDF = udf((data: Seq[Long]) => {
  try {
    val stringList = data.map(_.toString).asJava // Convert Longs to Strings
    val result = example.ThalesDatabricksCADPChunkBulkFPE.thales_cadp_bulk_udf(stringList, "encrypt", "nbr")
    result.asScala.toSeq.map(_.toString.toLong) // Convert Strings back to Longs
  } catch {
    case e: Exception => Seq.empty[Long]
  }
})

spark.udf.register("ThalesencryptNumberBulkCADPUDF", ThalesencryptNumberBulkCADPUDF)
%scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{udf, expr, collect_list}

// Query to get the values from the customer table
val customerData = spark.sql("SELECT c_custkey FROM my_catalog.my_schema.customer LIMIT 5")

// Collect the custkeys into an array
val customerDataArray = customerData.agg(collect_list("c_custkey").as("custkeys"))

// Encrypt the custkeys using the UDF
val encryptedCustKeysDF = customerDataArray.withColumn("encrypted_custkeys", expr("ThalesencryptNumberBulkCADPUDF(custkeys)"))

// Decrypt the encrypted custkeys using the UDF
val decryptedCustKeysDF = encryptedCustKeysDF.withColumn("decrypted_custkeys", expr("ThalesdecryptNumberBulkCADPUDF(encrypted_custkeys)"))

// Collect the data into arrays (correct data types)
val custKeys = customerData.collect().map(row => row.getLong(0)).toSeq
val encryptedCustKeys = encryptedCustKeysDF.collect().flatMap(row => row.getAs[Seq[Long]]("encrypted_custkeys")).toSeq
val decryptedCustKeys = decryptedCustKeysDF.collect().flatMap(row => row.getAs[Seq[Long]]("decrypted_custkeys")).toSeq

// Print the results
println("Original Custkeys:")
custKeys.foreach(println)
println("\nEncrypted Custkeys:")
encryptedCustKeys.foreach(println)
println("\nDecrypted Custkeys:")
decryptedCustKeys.foreach(println)
Output

Original Custkeys:
412445
412446
412447
412448
412449

Encrypted Custkeys:
595143
765354
36526
973910
680671

Decrypted Custkeys:
412445
412446
412447
412448
412449