Big Data Data Stores
DDC supports two types of Big Data data stores:
- Hadoop Cluster - Apache Hadoop provides a software framework for distributed storage and processing of big data by means of MapReduce.
- Teradata (Teradata 14.10.00.02 and above)
Hadoop Cluster Considerations and Requirements
Nodes that store the data blocks distributed by the Hadoop Distributed File System (HDFS) are called DataNodes. DataNodes are treated as “slaves” in a Hadoop cluster.
A node that maintains the index of directories and files and manages the data blocks stored on DataNodes is called a NameNode. A NameNode is treated as the “master” in a Hadoop cluster.
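For orientation, the standard Hadoop administration command below prints the NameNode's view of the cluster, including its live DataNodes. It is shown only as an illustrative check (it typically requires HDFS administrator privileges) and is not a DDC requirement:
# Summarize HDFS capacity and list live DataNodes (run as an HDFS admin)
hdfs dfsadmin -report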
To be able to scan a Hadoop cluster with HDFS, you must have:
- A Target NameNode running Apache Hadoop 2.7.3, Cloudera Distribution for Hadoop (CDH), or similar.
- A Proxy host running the Linux 3 Agent with database runtime components for Linux systems.
- A valid Kerberos ticket if Kerberos authentication is enabled. Refer to Generating Kerberos Authentication Ticket.
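To confirm which Hadoop release the target NameNode is running, you can use the standard Hadoop client command below on that host; this is an illustrative check, not a DDC requirement:
# Print the installed Hadoop version
hadoop version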
Teradata Considerations
Teradata data stores require Teradata Tools and Utilities 16.10.xx to be installed on the Agent. The following utilities are also mandatory:
- ODBC Driver for Teradata
- FastExport
You may have to restart the Agent after the installation.
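You can verify that the required components are visible to the Agent host with a quick check such as the sketch below; the odbcinst step assumes unixODBC is the driver manager in use:
# FastExport should be on the PATH of the user running the Agent
command -v fexp || echo "FastExport (fexp) not found"
# List registered ODBC drivers and confirm the Teradata driver is among them
odbcinst -q -d | grep -i teradata || echo "Teradata ODBC driver not registered"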
A scan of a Teradata data store may create temporary tables named erecon_fexp_<YYYYMMDDHHMMSS><PID><RANDOM>. Do not remove these tables while the scan is in progress. They are automatically removed when a scan completes. If a scan fails or is interrupted by an error, the temporary tables may remain in the database. In this case, it is safe to delete the temporary tables.
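If you need to identify leftover temporary tables after a failed scan, the sketch below assumes BTEQ (another Teradata Tools and Utilities component) is installed on the Agent host; the logon values are placeholders. Once you have confirmed that no scan is running, each listed table can be removed with a standard DROP TABLE statement.
bteq <<'EOF'
.LOGON teradata-server-name/dbadmin,password
SELECT DatabaseName, TableName
FROM DBC.TablesV
WHERE TableName LIKE 'erecon_fexp_%';
.LOGOFF
.QUIT
EOF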
Scanning of large binary objects is now supported for Teradata. For a list of binary file types supported for Teradata scans, refer to the table in Binary Large Objects.
Adding Big Data Stores
Use the Add Data Store wizard to add a Big Data data store. The process involves the following steps:
1. Select Store Type
In the Select Store Type screen of the wizard, select Big Data under Select Data Store Category.
From the Select Database Type drop-down list, select Hadoop Cluster or Teradata.
Click Next to go to the Configure Connection screen.
2. Configure Connection
In the Configure Connection screen of the wizard, provide the following configuration details for your data store:
HADOOP CLUSTER
Hostname/IP - Specify the Hostname/IP of the Hadoop cluster's active NameNode. Specify a valid hostname, IP address, or Uniform Resource Identifier (URI). The hostname must be longer than two characters. For example, if your HDFS share path is hdfs://hadoop-server-name/share-name, the hostname of the NameNode is hadoop-server-name. This is a mandatory field.
Port - The port on which the NameNode is accessed. Default is 8020. This is a mandatory field.
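Before saving, you can optionally confirm from the Proxy host that the NameNode answers on the configured port; the hostname below is an example, and the hdfs command requires the Hadoop client to be installed:
# Check that the NameNode RPC port is reachable
nc -vz hadoop-server-name 8020
# List the HDFS root through the NameNode URI
hdfs dfs -ls hdfs://hadoop-server-name:8020/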
TERADATA
Hostname - Specify a valid Hostname of the Teradata server. The hostname must be longer than two characters. This is a mandatory field.
Port - Default 1025. This is a mandatory field.
User - The name of the Teradata user.
Warning
Due to known Teradata limitations, DDC cannot use the following internal Teradata users to run scans:
DBC, tdwm, LockLogShredder, External_AP, TDPUSER, SysAdmin, SystemFe, TDMaps, Crashdumps, Sys_Calendar, viewpoint, console.
Password - The password of the Teradata user.
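Optionally, confirm from the Agent host that the Teradata server is reachable on the configured port before continuing; the hostname below is an example and 1025 is the default port:
# Check that the Teradata listener port is reachable
nc -vz teradata-server-name 1025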
Scroll down to the Agent Selection section and, in the Add Label field, enter an agent label or remove an existing label. Agent labels represent the agent capabilities.
Click Next to go to the General Info screen.
3. General Info
Specify the following details:
Name: Name for the data store.
Description (Optional): Description for the data store.
Location: Location of the data store. Refer to Managing Branch Locations for details.
Sensitivity Level (Optional): Sensitivity level for the data store. Refer to Sensitivity Levels for details.
Enable Data Store: Whether to enable the newly added data store. Select the check box to enable the data store.
Click Next.
4. Add Tags & Access Control
(Optional) Grant the All groups (default) access for reports. Alternatively, select a group.
Click Save.
The data store is added to the Data Stores page. If the Ready to Scan column shows Ready, the data store is properly configured.
For more information on tags and access control, see Tags and Access Control below.
Tags and Access Control
The Add Tags & Access Control screen in the Add Data Store wizard allows you to grant access rights to your data store and add tags. More details below:
ACCESS - select user groups that can access the data store. Access to a data store provides the ability to see reports that include scans of that data store. The available options are:
All groups: All groups of users can access the data store through reports. This is the default setting.
Selected group/s: Specified user defined groups can access the data store through reports. When this option is selected, select a group from the drop-down list. This list shows existing user defined groups. The user defined groups must already exist on CipherTrust Manager. If no user defined groups exist, ask the administrator to create a group. If needed, you can select multiple groups. Start typing the name of the desired group and select from the suggested groups.
TAGS - select a tag from the Add Tag drop-down list. Please check the list of prebuilt tags in Predefined Tags.
Tip
New tags can also be added. Start typing a new tag, and click the New: <new_tag> link that appears below the drop-down list.
Add as many tags as needed.
To remove a tag, click the close icon in the tag name.
Click Save to create the data store. At any time during the configuration you can click Back to go to any of the previous wizard screens to update the configuration. The newly created data store appears on the Data Stores page. By default, data stores are displayed in alphabetic order by name. Depending on the number of entries per page, you might need to navigate to other pages to view the newly created data store.
Generating Kerberos Authentication Ticket
To generate a Kerberos authentication ticket for your HDFS cluster, run these commands in a terminal on the designated Proxy Agent host.
To check whether a valid Kerberos ticket has been issued for the principal user, run:
klist
To generate a Kerberos ticket as a principal user:
kinit <username>@<domain>
For example:
kinit DDCuser@example.com
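If a keytab is available for the scan principal on the Proxy Agent host, the ticket can also be obtained non-interactively; the keytab path and principal below are examples:
kinit -kt /etc/security/keytabs/ddcuser.keytab DDCuser@EXAMPLE.COM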