Big Data Data Stores
DDC supports two types of Big Data data stores:
- Hadoop Cluster
- Teradata (Teradata 14.10.00.02 and above)
Hadoop Cluster Considerations
Nodes where data blocks distributed by HDFS are stored are called DataNodes. DataNodes are treated as “slaves” in a Hadoop cluster.
A node that maintains the index of directories and files and manages data blocks stored on DataNodes is called a NameNode. A NameNode is treated as “master” in a Hadoop cluster.
Teradata Considerations
Teradata data stores require Teradata Tools and Utilities 16.10.xx to be installed on the Agent. These utilities are also mandatory:
- ODBC Driver for Teradata
- FastExport
You may have to restart the Agent after the installation.
A scan of a Teradata data store may create temporary tables named erecon_fexp_<YYYYMMDDHHMMSS><PID><RANDOM>. Do not remove these tables while the scan is in progress. They are automatically removed when a scan completes. If a scan fails or is interrupted by an error, the temporary tables may remain in the database. In this case, it is safe to delete the temporary tables.
Scanning of binary data is now supported for Teradata. For a list of binary file types supported for Teradata scans, refer to the table in Binary Large Objects.
Adding Big Data Stores
Use the Add Data Store wizard to add a big data type data store. Adding a Big Data data store involves the following steps:
1. Select Store Type
In the Select Store Type screen of the wizard select Big Data in the Select Data Store Category.
From the Select Database Type drop-down list select Hadoop Cluster or Teradata.
Click Next to go on to the Configure Connection screen.
2. Configure Connection
In the Configure Connection screen of the wizard, provide the following configuration details for your data store:
HADOOP CLUSTER
Hostname/IP - Specify Hostname/IP of the Hadoop cluster's active NameNode. Specify a valid hostname, IP address, or Uniform Resource Identifier (URI). The hostname must be longer than two characters. This is a mandatory field.
Port - Default 8020. This is a mandatory field.
TERADATA
Hostname - Specify a valid Hostname of the Teradata server. The hostname must be longer than two characters. This is a mandatory field.
Port - Default 1025. This is a mandatory field.
User - The name of the Teradata user.
Due to known Teradata limitations DDC cannot use the following internal Teradata users to scan:
Warning
DBC, tdwm, LockLogShredder, External_AP, TDPUSER, SysAdmin, SystemFe, TDMaps, Crashdumps, Sys_Calendar, viewpoint, console.
Password - The password of the Teradata user.
Scroll down to the Agent Selection section, and in the Add Label: field, add an agent label, by entering a label or removing and existing label. Agent labels represent the agent capabilities.
Click Next to go to the General Info screen.
3. General Info
Configure the General Info part per the information in General Info.
Click Next to go to the Add Tags & Access Control screen.
4. Add Tags & Access Control
Configure the Tags & Access Control par per the information in Tags & Access Control.
Click Save. The newly created data store appears on the Data Stores page. By default, data stores are displayed in alphabetic order by name. Depending on the number of entries per page, you might need to navigate to other pages to view the newly created data store.