Cloud Data Stores
DDC supports these types of cloud storage as data stores:
AWS S3 - Amazon Simple Storage Service (S3) is the object storage service of AWS (Amazon Web Services), an on-demand cloud computing platform and API.
Azure Blobs - Microsoft Azure Blobs (used to store unstructured text and binary data).
Azure Table - lets programs store structured text in partitioned collections of entities that are accessed by partition key and primary key.
Office 365 SharePoint Online - SharePoint Online is a document management and storage system delivered as part of the Microsoft Online Services suite.
Office 365 Exchange Online - Exchange Online is Exchange Server delivered as a cloud service hosted by Microsoft.
Office 365 OneDrive for Business - OneDrive for Business is a managed cloud storage for business users that replaces SharePoint Workspace.
G-Suite (G-Mail and G-Drive)
Salesforce - Marketing Cloud offers marketing automation and analytics software for email, mobile, social and online marketing.
Note
Before adding any Cloud data store, make sure that you have the required user credentials handy.
Adding Cloud Data Stores
Use the Add Data Store wizard to add a Cloud data store. Adding a Cloud data store involves the following steps:
1. Select Store Type
In the Select Store Type screen of the wizard select Cloud in the Select Data Store Category.
From the Select Database Type drop-down list select:
AWS S3
Azure Blobs
Azure Table
Office 365: Sharepoint Online
Office 365: Exchange Online
Office 365: OneDrive for Business
G-Suite
Salesforce
Click Next to go on to the Configure Connection screen.
2. Configure Connection
In the Configure Connection screen of the wizard, provide the following configuration details for your data store:
AWS S3
Provide the user security credentials, which consist of an Access Key ID and a Secret Access Key.
Access Key ID: Enter the Access Key ID that you obtained from your storage account administrator. For example:
AKIAABCDEFGHIEXAMPLE
Secret Access Key: Enter the Secret Access Key as obtained from your storage account administrator. For example:
aBcDeFGHiJKLM/A1NOPQR/wxYzdcbAEXAMPLEKEYd
Select the Show Secret Access Key checkbox if you want to view the secret access key.
To set up an Amazon S3 as a target for a scan, use the following format:
Whole Bucket - <BucketName>
Specific folder in a Bucket - <BucketName/folder>
Specific file in a Bucket - <BucketName[/folder]/file.txt>
Note
Each Amazon S3 Bucket included in a scan consumes one Amazon S3 Bucket license. Make sure to use credentials that have access to all Amazon S3 Buckets that are selected for a scan to avoid consuming licenses for inaccessible Buckets.
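The target formats above, and the one-license-per-bucket rule in the note, can be sketched in a few lines of Python. This is only an illustration: the helper names s3_target and count_bucket_licenses are hypothetical and are not part of DDC.

```python
from typing import Iterable, Optional


def s3_target(bucket: str, folder: Optional[str] = None,
              file_name: Optional[str] = None) -> str:
    """Build a scan target as <BucketName>[/folder][/file] (hypothetical helper)."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without '/'")
    parts = [bucket]
    if folder:
        parts.append(folder.strip("/"))
    if file_name:
        parts.append(file_name.strip("/"))
    return "/".join(parts)


def count_bucket_licenses(targets: Iterable[str]) -> int:
    """Count distinct buckets, since each bucket in a scan consumes one license."""
    return len({target.split("/", 1)[0] for target in targets})


targets = [
    s3_target("my-bucket"),                          # whole bucket
    s3_target("my-bucket", "reports"),               # folder in a bucket
    s3_target("other-bucket", "reports", "file.txt"),  # file in a bucket
]
print(targets, count_bucket_licenses(targets))
```

Counting distinct bucket names before you start a scan gives a rough idea of how many Amazon S3 Bucket licenses the scan will consume.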
AZURE BLOBS
In the Configure Connection step, provide the following information:
Account Name: The name of your Azure Storage account.
User: The name of your Azure Storage account.
Active Access Key: Enter key1 or key2, which is your primary or secondary Azure account access key. If you do not know what they are, follow the steps in Obtaining the Azure Account Access Keys.
Tip
You should ask your Azure Storage account administrator which access key is currently active, since only one access key can be active at a time.
AZURE TABLE
Account Name: Enter your Azure account name.
User: Enter your Azure Storage account name.
Password: Your Azure password.
OFFICE 365: SHAREPOINT ONLINE
Domain: Enter your SharePoint Online organization name. For example, if you access SharePoint Online at https://mycompany.sharepoint.com, enter mycompany.
Client ID - Enter the Client ID for the registered SharePoint Add-in. Example:
1234abcd-56ef-78gh-90ij-1234clientid
You can generate the Client ID and Client Secret Key when you register the SharePoint Add-in. You need to note down these values after you generate the SharePoint Add-in.
Client Secret Key - Enter the Client Secret key for the registered SharePoint Add-in. Example:
abcdefghij0123456789klmnopqrst0clientsecret
Tenant ID - Enter the Tenant ID key for the registered SharePoint Add-in. Example:
12345678-abcd-9012-efgh-ijkltenantid
You can get it when you go to the tenant administration site (for example, at https://mycompany-admin.sharepoint.com/_layouts/15/appinv.aspx) using your administrator account to grant permissions to the registered SharePoint Add-in. The Tenant ID can be obtained from the App Identifier value, which has the following format:
i:0i.t|ms.sp.ext|<client ID>@<tenant ID>
In this example, i:0i.t|ms.sp.ext|1234abcd-56ef-78gh-90ij-1234clientid@12345678-abcd-9012-efgh-ijkltenantid, the Tenant ID is 12345678-abcd-9012-efgh-ijkltenantid.
For more information on these configuration parameters, refer to the SharePoint Online documentation.
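As a rough illustration of the Domain and Tenant ID fields described above, the following Python sketch derives the organization name from a SharePoint Online URL and splits the Tenant ID out of an App Identifier value. The function names are hypothetical, and the sketch assumes the URL and identifier formats exactly as shown in this section.

```python
from urllib.parse import urlparse

# Prefix of the App Identifier value, as shown in the format above.
APP_ID_PREFIX = "i:0i.t|ms.sp.ext|"


def sharepoint_domain(url: str) -> str:
    """Return the organization name from https://<name>.sharepoint.com."""
    host = urlparse(url).hostname or ""
    suffix = ".sharepoint.com"
    if not host.endswith(suffix):
        raise ValueError("not a SharePoint Online URL: " + url)
    return host[: -len(suffix)]


def tenant_id_from_app_identifier(app_identifier: str) -> str:
    """Return the part after '@' in i:0i.t|ms.sp.ext|<client ID>@<tenant ID>."""
    if not app_identifier.startswith(APP_ID_PREFIX):
        raise ValueError("unexpected App Identifier prefix")
    client_id, sep, tenant_id = app_identifier[len(APP_ID_PREFIX):].partition("@")
    if not sep or not client_id or not tenant_id:
        raise ValueError("expected <client ID>@<tenant ID>")
    return tenant_id


print(sharepoint_domain("https://mycompany.sharepoint.com"))
print(tenant_id_from_app_identifier(
    "i:0i.t|ms.sp.ext|1234abcd-56ef-78gh-90ij-1234clientid"
    "@12345678-abcd-9012-efgh-ijkltenantid"))
```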
OFFICE 365: EXCHANGE ONLINE
Exchange Online Domain: Enter a domain to scan mailboxes that reside on that domain. This is usually the domain component of the email address, or the Windows Domain.
Client ID: Enter your Exchange Online client ID (application ID).
Client Secret Key: Enter your Exchange Online client secret key. Select the Show Client Secret Key check-box to view the key.
Tenant ID: Enter your Office 365: Exchange Online tenant ID. Your Microsoft 365 tenant ID is a globally unique identifier (GUID) that is different than your organization name or domain.
For full details of the configuration of the Office 365: Exchange Online data store, refer to Office 365: Exchange Online.
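Because the Tenant ID is a GUID rather than an organization name or domain, a quick format check can catch a value pasted into the wrong field. The sketch below uses only the Python standard library; is_guid is a hypothetical helper, and the GUID in the example is made up.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in 8-4-4-4-12 hexadecimal form."""
    try:
        # uuid.UUID also accepts braces and unhyphenated input, so compare
        # against the canonical form to require the hyphenated layout.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


print(is_guid("12345678-abcd-9012-ef01-234567890abc"))  # a well-formed GUID
print(is_guid("example.onmicrosoft.com"))               # a domain, not a GUID
```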
OFFICE 365: ONEDRIVE FOR BUSINESS
OneDrive for Business Domain - Enter the Microsoft 365 domain. Example:
example.onmicrosoft.com
Warning
An Office 365: OneDrive for Business data store is created successfully even if the domain is wrong. This is a known issue.
Client ID - Enter the Client ID. Example:
clientid-1234-5678-abcd-6d05bf28c2bf
You generate the Client ID and Tenant ID in the Azure app registration portal. After you register your application you can view the Client ID and Tenant ID Key. You need to note down these values.
Client Secret Key - Enter the Client Secret key. Example:
client~secret.key-CHvV1B5YQfr~6zDjEyv
It is the Client Secret that you set in the Azure app registration portal in the Certificates & Secrets page. Make sure that you save your Client Secret key in a secure location as you will not be able to retrieve it later.
Tenant ID - Enter the Tenant ID. Example:
tenantid-1234-abcd-5678-02011df316f4
For more information on these configuration parameters, refer to the OneDrive for Business Online documentation.
G-SUITE
Domain: Enter the G-Suite domain that you want to scan. For example, if your G-Suite administrator email is admin@example.com, your G-Suite domain is example.com.
Admin User: The G-Suite administrator account email address. Use the same administrator account used to Enable APIs and Set up Domain-Wide Delegation.
Service Account: Your Service account ID, for example, ddc-service-account@vertical-tuner-322508.iam.gserviceaccount.com.
P12 Key: Upload the P12 key associated with your Service account ID.
For details, refer to the information in Configuring a G-Suite Account.
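The relationship between the administrator email and the Domain field can be sketched as follows. The helpers are hypothetical; the service account check only mirrors the shape of the example ID shown above and is an assumption, not a DDC validation rule.

```python
def gsuite_domain(admin_email: str) -> str:
    """Return the domain part of the G-Suite administrator email address."""
    local, sep, domain = admin_email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected an address such as admin@example.com")
    return domain.lower()


def looks_like_service_account_id(value: str) -> bool:
    """Assumption: IDs follow <name>@<project>.iam.gserviceaccount.com."""
    name, sep, host = value.partition("@")
    return bool(name and sep and host.endswith(".iam.gserviceaccount.com"))


print(gsuite_domain("admin@example.com"))
print(looks_like_service_account_id(
    "ddc-service-account@vertical-tuner-322508.iam.gserviceaccount.com"))
```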
SALESFORCE
Account Name: Enter the Salesforce account. Use the syntax that matches the type of your Salesforce site:
Production
Syntax: <email_address>
Example: admin@example.com
Sandbox
Syntax: sandbox:<email_address>
Example: sandbox:admin@example.com
Consumer Key: Enter the Consumer Key obtained in Creating Connected App. For example:
9tzQREbH3MVG_SvurP17mjK2py_jS6lfqit1_ss50PkRmNIZnd7yM92zOBnU3IQPvSyu5PQIV2dsqyQiw0T5
Select the Show Consumer Key check-box to view the key.
Private Key: Use the Browse Private Key button to upload the private key file obtained from Generating Certificate and Private Key. For example, ddc-salesforce.key.
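The Account Name syntax for Production and Sandbox sites described above can be expressed as a small Python sketch. The function name salesforce_account_name is hypothetical and is used here only for illustration.

```python
def salesforce_account_name(email: str, sandbox: bool = False) -> str:
    """Return the Account Name value: <email> or sandbox:<email>."""
    email = email.strip()
    if "@" not in email:
        raise ValueError("expected an email address such as admin@example.com")
    return "sandbox:" + email if sandbox else email


print(salesforce_account_name("admin@example.com"))
print(salesforce_account_name("admin@example.com", sandbox=True))
```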
The Agent Selection section allows you to specify the minimum and maximum number of proxy agents when adding a datastore. Employing a group of agents instead of a single agent to run the scan should improve the scan execution time.
Note
The multiple agent functionality is not supported for the Office 365 Sharepoint Online datastore.
In the Select Number of Agents menu set the number of agents for the datastore:
Minimum: - Set the minimum number of agents to use to scan the datastore. At least that number of proxy agents must be able to connect to the datastore.
Maximum: - Set the maximum number of agents to use to scan the datastore.
Warning
• As there is no limit on the minimum and maximum number of agents that you can set, exercise caution so that you do not degrade system performance by using too many resources for a single scan.
• You will not be able to add a datastore if the minimum number of agents cannot be assigned.
• A scan will fail if the assigned agent is unavailable after adding the datastore.
• The minimum number of agents must be less than or equal to the maximum number of agents.
In the Add Label: field, add an agent label by entering a label, or remove an existing label. Agent labels represent the agent capabilities.
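The agent-count rules listed in the warning above can be checked before you open the wizard. The following Python sketch is illustrative only: check_agent_range is a hypothetical helper, and the number of agents that can connect is something you would have to determine yourself.

```python
from typing import List


def check_agent_range(minimum: int, maximum: int, connectable: int) -> List[str]:
    """Return a list of problems with the chosen agent range (empty if none)."""
    problems = []
    if minimum > maximum:
        problems.append("minimum must be less than or equal to maximum")
    if connectable < minimum:
        problems.append(
            "only %d agent(s) can connect, but the minimum is %d"
            % (connectable, minimum))
    return problems


print(check_agent_range(minimum=2, maximum=4, connectable=3))
print(check_agent_range(minimum=5, maximum=4, connectable=10))
```

An empty list means the range satisfies both rules; it says nothing about whether the chosen maximum is sensible for your system's performance.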
Click Next to move on to the General Info step of the wizard.
3. General Info
In the General Info screen of the wizard, specify the name, description, branch location, and sensitivity level for your data store. See "Configuring a Data Store - General Information" for details.
Configure the General Info part per the information in General Info.
Click Next to go to the Add Tags & Access Control screen.
4. Add Tags & Access Control
In the Add Tags & Access Control screen of the wizard, grant access rights to your data store and add metadata. See "Configuring a Data Store – Tags and Access Control" for details.
Configure the Tags & Access Control part per the information in Tags & Access Control.
Click Save. The newly created data store appears on the Data Stores page. By default, data stores are displayed in alphabetic order by name. Depending on the number of entries per page, you might need to navigate to other pages to view the newly created data store.
At any time during the configuration you can click Back to go to any of the previous wizard screens to update the configuration.
Recommended Least Privilege User Approach:
Note
To reduce the risk of data loss or privileged account abuse, the Target credentials provided for the intended Target should only be granted read-only access to the exact resources and data that require scanning. Never grant full user access privileges or unrestricted data access to any application if it is not required.
Obtaining the Azure Account Access Keys
If you need to find out what your Azure account access keys are:
Log into your Azure account.
Navigate to All resources > [Storage account].
Click Access keys under Settings.
Note down the key1 (primary) and key2 (secondary).
The primary and secondary access keys are used to make rolling key changes. Only one access key can be active at a time. Ask your Azure Storage account administrator which access key is currently active, and use that key to connect DDC to your Azure Storage account.
Configuring a G-Suite Account
Because the Google API imposes certain restrictions on software attempting to access data on Google services, setting up a G-Suite account as a data store requires more work than other cloud services. You perform the procedure via the Google Cloud Platform console.
Before you can add a G-Suite product as a data store, you must have a G-Suite administrator account for the target G-Suite domain. It must be a G-Suite account; personal Google accounts are not supported.
To configure your G-Suite account for scanning you need to:
Select a project.
Enable APIs.
Create a service account.
Set up domain-wide delegation.
Selecting a Project
Log into the Google Developers Console.
Click the Select a project menu to expand it. The Select dialog box opens and displays a list of existing projects.
In the Select dialog box, you can:
Select an existing project.
Create a new project (recommended).
To select an existing project:
Click a project.
Click OPEN.
To create a new project:
Click the NEW PROJECT button.
In the New Project page, enter your Project name and click CREATE.
After creating or selecting a project you are taken to its APIs & Services page. In it you can click the +ENABLE APIS AND SERVICES button (at the top of the screen).
Enabling APIs
To enable G-Suite APIs:
Select a Project.
In the project Dashboard, click +ENABLE APIS AND SERVICES. This displays the API Library.
Enable the Admin SDK API.
a. Under G-Suite APIs, click Admin SDK.
b. Click the ENABLE button when the Admin SDK API is displayed. The Admin SDK API statistics screen is displayed.
Repeat the above steps to enable also these APIs:
Gmail API (for Google Mail)
Google Drive API (for Google Drive)
Tip
If you are lost after the previous step, and are not sure where to enable these, click the toolbar icon on the top left side of the Dashboard (the three horizontal dashes), then Home, then Go to APIs overview in the APIs panel (the link at the bottom of it).
Creating a Service Account
To create a service account in the Google Developers Console for use with DDC scans:
Click the menu on the upper-left corner of the Google Developers Console (the three horizontal dashes on the top left side of the Dashboard).
Select IAM & Admin -> Service Accounts.
Click the +CREATE SERVICE ACCOUNT button.
In the Create service account dialog box, enter the following:
a. Service Account Details:
Service account name - Display name for this service account. For example, ddc-service-account.
Service account ID - It is autocompleted with your Service account name (above) and your previously created postfix. You will need this Service account ID later for configuring DDC. An example service account ID: ddc-service-account@vertical-tuner-322508.iam.gserviceaccount.com. After the account is created, you can see it in the Service accounts for project "such-and-such" screen.
Service account description (optional).
b. Click the CREATE AND CONTINUE button.
c. Grant this service account access to project:
- Select a role - click into the text box to open a search dialog (type 'owner' to filter, then select 'Owner', Full access to all resources).
d. Click the CONTINUE button.
e. Grant user access to this service account (optional). In this section just click DONE.
To complete the procedure, you need to add a private key. For that:
a. Click the three vertical dots in the Actions column in the Service account for project screen.
b. Then select Manage keys. You will be taken to the Keys screen.
c. Click the ADD KEY pull down menu, and then select the Create new key option.
d. In the Key type, select the P12 radio button.
e. Click CREATE. The Service account and key created dialog box displays, and you can save the P12 key to your computer. Keep this key in a safe location; you upload it when you configure the G-Suite data store in DDC.
Tip
The dialog box displays the private key's password: notasecret - you can disregard this password.
f. Click CLOSE.
Write down the newly created service account’s Service account ID and Key ID.
Setting up Domain-Wide Delegation
To be able to access your G-Suite domain with the Service Account, you must set up and enable domain-wide delegation for it. To set up domain-wide delegation for an existing service account:
Click on the service account name that you created in Creating a Service Account step. To use the same example: ddc-service-account@vertical-tuner-322508.iam.gserviceaccount.com.
On the account DETAILS tab, click the SHOW DOMAIN WIDE DELEGATION link at the bottom to expand that section.
Select the Enable Google Workspace Domain-Wide Delegation checkbox. Additional configuration options will be displayed.
Click inside the Product name for the consent screen field and type in the product name. For example: "DDC-service-account".
Click SAVE. The message "Account 'such-and-such' has been updated" is displayed, and a new Client ID field appears with a client identifier. Note down the Client ID as you will need it later.
Go to the G-Suite Admin Console. For that, you need to open a new tab or window in your browser.
a. Log in to your G-Suite Admin Console account. It has to be an administrator account.
b. In the navigation panel on the left, click Security -> API controls. Select Security to manage security features in the G Suite admin console.
c. In the API controls screen that is displayed, scroll down to the Domain wide delegation section (at the bottom of the page).
d. In the Domain wide delegation section, click the MANAGE DOMAIN WIDE DELEGATION link. This will take you to a screen displaying a list of all the service accounts that you already have created (the breadcrumb of this screen: Security -> API Controls -> Domain-wide Delegation).
e. Click the Add new link to create a new service account. A pop-up dialog Add a new client ID opens. You need to provide:
Client ID - This is the client ID that you were supposed to note down in the previous step (Step 5).
OAuth Scopes - Type in or copy and paste these URIs (that is, API scopes). You can enter them on one line, separated by commas, or use a separate field for each scope:
- Required for all: https://www.googleapis.com/auth/admin.directory.user.readonly
- Google Mail: https://mail.google.com/
- Google Drive: https://www.googleapis.com/auth/drive.readonly
f. Click AUTHORIZE to complete the account configuration. A new service account with its scopes is displayed in the list (as "DDC-service-account", to use the same example as in Creating a Service Account).
Navigate back to the API controls screen (click Security -> API controls in the toolbar on the left).
In the App access control panel, click the MANAGE THIRD-PARTY APP ACCESS link to go to a list of Connected apps (breadcrumb: Security -> API controls -> App Access Control).
Click the Configure new app menu and select the option OAuth App Name Or Client ID.
In the Configure an OAuth app screen that opens, search for your client ID. This is the client ID that you were supposed to note down in Step 5. Type it in or paste in the search field and click SEARCH. The search should return your service account name, that is, to use the same example as before, "DDC-service-account".
Click this account in the App name panel and then click the Select button on the right. You will be taken to the app access configuration screen for your client ID.
In the access configuration screen for your client ID:
a. Click to select the checkbox for your client ID.
b. Click SELECT to continue.
c. In the App access screen, you must choose which access type will be applied to your client ID. Select the Trusted radio button.
d. Click CONFIGURE to continue.
Your service account now shows "Trusted" in the Access column in the list of Connected apps, which completes the domain-wide delegation setup.
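The OAuth scopes entered in the Add a new client ID dialog can be assembled into the single comma-separated line that the procedure mentions. The Python sketch below uses the three scope URIs listed in that step; the product keys "gmail" and "drive" and the function name are hypothetical labels chosen for this example.

```python
from typing import Iterable

# Scope required for all G-Suite scans, as listed in the delegation step.
REQUIRED_SCOPE = "https://www.googleapis.com/auth/admin.directory.user.readonly"

# Per-product scopes; the dictionary keys are illustrative labels only.
PRODUCT_SCOPES = {
    "gmail": "https://mail.google.com/",
    "drive": "https://www.googleapis.com/auth/drive.readonly",
}


def oauth_scopes_line(products: Iterable[str]) -> str:
    """Return the scopes for the given products as one comma-separated line."""
    scopes = [REQUIRED_SCOPE]
    for product in products:
        scope = PRODUCT_SCOPES[product]  # raises KeyError for unknown products
        if scope not in scopes:
            scopes.append(scope)
    return ",".join(scopes)


print(oauth_scopes_line(["gmail", "drive"]))
```

Listing only the products you intend to scan keeps the delegated access as narrow as possible, in line with the least privilege recommendation earlier in this document.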
Configuring a Salesforce Account
To be able to use Salesforce Targets as a data store you will need to generate a certificate and a private key and create a connected app.
Note
The instructions provided in this section are specific to the Salesforce Lightning interface. If you are using Salesforce Classic, you will notice a different interface, in which case you should refer to the Salesforce Classic interface documentation to complete the prerequisites.
Generating Certificate and Private Key
To generate the digital certificate and private key:
Using the Terminal or Windows Command Prompt, install the OpenSSL package and run the following command:
# Syntax: openssl req -x509 -sha256 -nodes -newkey rsa:2048 -days <number of days> -keyout <*.key private key file> -out <*.crt certificate file>
openssl req -x509 -sha256 -nodes -newkey rsa:2048 -days 365 -keyout ddc-salesforce.key -out ddc-salesforce.crt
where:
days (optional) - Number of days to certify the certificate for. The default is 30 days.
keyout - Output filename to write the private key to. For example, ddc-salesforce.key.
out - Output filename to write the digital certificate to. For example, ddc-salesforce.crt.
When prompted by openssl, provide the following information:
Country Name (2 letter code) [AU]: Your country's two letter country code (ISO 3166-1 alpha-2).
State or Province Name (full name) [Some-State]: State or province name.
Locality Name (e.g., city) []: City name or name of region.
Organization Name (e.g., company) [Internet Widgits Pty Ltd]: Name of organization.
Organizational Unit Name (e.g., section) []: Name of organizational department.
Common Name (e.g. server FQDN or YOUR name) []: Fully qualified domain name of the Master Server.
Email Address []: Email address of organization's contact person.
The openssl command generates the digital certificate (for example, ddc-salesforce.crt) required to create a connected app for DDC, and the private key (for example, ddc-salesforce.key) required to set up and scan a Salesforce data store.
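If you prefer to script this step, the following Python sketch builds the same openssl argument list and, optionally, passes the certificate details through the openssl req -subj option instead of answering the interactive prompts. Using -subj is an assumption about your workflow rather than a DDC requirement, the function names are hypothetical, and the subject values are placeholders.

```python
import shlex
from typing import Dict, List

# Subject fields in the order of the interactive openssl prompts.
SUBJECT_FIELDS = ["C", "ST", "L", "O", "OU", "CN", "emailAddress"]


def subject_string(values: Dict[str, str]) -> str:
    """Build a -subj value such as /C=AU/O=Example Corp/CN=ddc.example.com."""
    parts = []
    for field in SUBJECT_FIELDS:
        value = values.get(field)
        if value:
            if "/" in value:
                raise ValueError("'/' in a subject value needs escaping")
            parts.append("/%s=%s" % (field, value))
    return "".join(parts)


def openssl_req_args(key_file: str, crt_file: str, days: int = 365,
                     subject: str = "") -> List[str]:
    """Return the argument list for the openssl req command shown above."""
    args = ["openssl", "req", "-x509", "-sha256", "-nodes",
            "-newkey", "rsa:2048", "-days", str(days),
            "-keyout", key_file, "-out", crt_file]
    if subject:
        args += ["-subj", subject]
    return args


args = openssl_req_args("ddc-salesforce.key", "ddc-salesforce.crt")
print(shlex.join(args))  # the command line to review before running it
```

The argument list can be passed to subprocess.run(args, check=True) once you have reviewed it; the sketch itself only prints the command and does not create any files.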
Creating Connected App
As the administrator, log in to your organization's Salesforce site and go to Setup.
In the Setup > Home tab, enter "App Manager" in the Quick Find box, and select App Manager.
In the Lightning Experience App Manager page, click on New Connected App.
In the Basic Information section, fill in the following fields:
Connected App Name - Enter a descriptive display name for DDC. For example, Data_Discovery_and_Classification.
API Name - Enter a unique identifier to use when referring to the app programmatically. For example, DDC.
Contact Email - Enter an email address that Salesforce can use if they need to contact you about the connected app.
In the API (Enable OAuth Settings) section, select the Enable OAuth Settings checkbox.
In the Callback URL field, enter the URL to redirect to after successful authorization of the connected app. For example, https://example.com/callback-ddc.
Note
The Callback URL is mandatory when setting up a connected app, but is not required for scanning Salesforce Targets as a data store.
Select the Use digital signatures checkbox and click Choose File to upload a digital certificate. For example, ddc-salesforce.crt.
Under Select OAuth Scopes, select and add the following permissions for the "DDC" connected app:
Access the identity URL service (id, profile, email, address, phone)
Manage user data via APIs (api)
Perform requests at any time (refresh_token, offline_access)
Required for probing, scanning and remediating Salesforce Targets.
Click Save and Continue.
In the Manage Connected Apps page, go to API (Enable OAuth Settings) > Consumer Key and click Copy.
The consumer key will be required when you Set Up and Scan a Salesforce Target.
Tip
A consumer key is generated automatically when you create a connected app in Salesforce. Make sure that you store it when you set it, as it is unique and once set, you will not be able to edit or overwrite it.
Click Manage > Edit Policies.
Under OAuth Policies > Permitted Users, select Admin approved users are pre-authorized.
Click Save.
Back in the App Manager page, go to the Profiles section and click Manage Profiles.
In the Application Profile Assignment page, select the profile(s) (e.g. "System Administrator") that you want to allow to access the "DDC" connected app.
Note
The Salesforce account that is specified when you Set Up and Scan a Salesforce Target must be assigned to at least one of the profiles that has:
• Access to the DDC connected app (e.g. "Data_Discovery_and_Classification"), and
• Minimum "Read" permissions for the Salesforce Objects to be scanned.
See Salesforce Help - Object Permissions for more information.
Click Save.
In the Setup > Home tab, enter "Profiles" in the Quick Find box, and select Profiles.
Go to the profile(s) selected in Step 15 (e.g. "System Administrator") and click Edit.
In the Administrative Permissions section, select the following checkboxes:
- API Enabled
- Query All Files
Note
Enabling the Query All Files permission is an optional step that allows the Salesforce account that is specified when you set up and scan a Salesforce data source to scan all files in your organization's Salesforce site, including those owned / managed by other user accounts.
Without the Query All Files permission, DDC will only be able to scan the files that are owned by / shared to the specified Salesforce account.
For more information about Salesforce behavior when "Query All Files" is enabled, please refer to https://developer.salesforce.com/docs/atlas.en-us.234.0.object_reference.meta/object_reference/sforce_api_objects_contentversion.htm.
Click Save.