Version: Deploy 23.3

Database Anonymizer Tool

This topic describes Data Anonymization, which is the process of protecting private or sensitive information, such as passwords, by deleting or encrypting personally identifiable information.

Because organizations tend to store user information on local or cloud servers for various business requirements, data anonymization is vital for maintaining data integrity and preventing security breaches.

The Database Anonymizer tool anonymizes sensitive information while exporting data from the database, and lets you configure which tables, columns, or values to exclude from the exported data. By default, all the Users and Passwords fields are excluded.

Note:

This tool is mainly intended to hide passwords and dictionary values in the Digital.ai Deploy database. However, you can customize it based on your requirements.

Database Anonymizer Configuration File

The Database Anonymizer configuration file (central-config/xld-db-anonymize.yaml) specifies which data is exported from the database. The configuration file contains three sections that define the rules for exporting.

1. Tables to not export: This section defines the tables that are not exported. For example, the XL_USERS table can contain sensitive information, so it is not exported by default.

deploy.db-anonymizer:
  tables-to-not-export:
    - XL_USERS
  tables-to-anonymize:
    - table: XLD_DICT_ENTRIES
      column: value
      value: placeholder
    - table: XLD_DICT_ENC_ENTRIES
      column: value
      value: enc-placeholder
    - table: XLD_DB_ARTIFACTS
      column: data
      value: file
  content-to-anonymize: []
  encrypted-fields-to-ignore:
    - password-regex: "\\{aes:v0\\}.*"
      table: XLD_CI_PROPERTIES
      column: string_value
      value: password
2. Tables to anonymize: This section lists the specific columns, within specific tables, whose content is anonymized. The original content of each listed column is replaced with the content defined in the value field.
tables-to-anonymize:
  - table: XLD_DICT_ENTRIES
    column: value
    value: placeholder
  - table: XLD_DICT_ENC_ENTRIES
    column: value
    value: enc-placeholder
  - table: XLD_DB_ARTIFACTS
    column: data
    value: file
3. Content to anonymize: This section defines the columns in which specific text content is replaced with an updated value.
content-to-anonymize: []
encrypted-fields-to-ignore:
  - password-regex: "\\{aes:v0\\}.*"
    table: XLD_CI_PROPERTIES
    column: string_value
    value: password

Caution:

  • Anonymizing content that is the same as the dictionary title changes both the key and the dictionary title.
  • Anonymizing content that is the same as the dictionary type corrupts the dictionary.
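
As an illustration of the first two rule types described above (tables to not export and tables to anonymize), the following Python sketch models their effect on rows that are about to be exported. It is not the tool's implementation: the function name anonymize_table, the key column, and the sample rows are hypothetical, and only the table names, column names, and replacement values are taken from the default configuration.

```python
# Rules copied from the default configuration shown in this topic.
TABLES_TO_NOT_EXPORT = {"XL_USERS"}
TABLES_TO_ANONYMIZE = [
    {"table": "XLD_DICT_ENTRIES", "column": "value", "value": "placeholder"},
    {"table": "XLD_DICT_ENC_ENTRIES", "column": "value", "value": "enc-placeholder"},
    {"table": "XLD_DB_ARTIFACTS", "column": "data", "value": "file"},
]


def anonymize_table(table, rows):
    """Hypothetical helper: return the rows to export, or None if the table is skipped."""
    if table in TABLES_TO_NOT_EXPORT:
        return None  # the whole table is left out of the export
    # Map each column that has a rule for this table to its replacement value.
    replacements = {
        rule["column"]: rule["value"]
        for rule in TABLES_TO_ANONYMIZE
        if rule["table"] == table
    }
    # Overwrite matching columns in every row; other columns are kept as-is.
    return [
        {col: replacements.get(col, val) for col, val in row.items()}
        for row in rows
    ]


# Made-up sample rows; the "key" column name is an assumption for illustration.
print(anonymize_table("XLD_DICT_ENTRIES", [{"key": "db.password", "value": "s3cret"}]))
# prints [{'key': 'db.password', 'value': 'placeholder'}]
print(anonymize_table("XL_USERS", [{"username": "admin"}]))
# prints None
```

Every row of a listed column receives the same replacement value, which is why the exported dump no longer reveals the original dictionary values.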

To anonymize the encrypted CI password with the local key store, edit the centralConfiguration/db-anonymizer.yaml file with the following configuration:

"encrypted-fields-to-ignore": [
  {
    "passwordRegex": "\\{aes:v0\\}.*",
    "table": "XLD_CI_PROPERTIES",
    "column": "string_value",
    "value": "password"
  }
]
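
To see which values the pattern in this configuration selects, the following Python sketch applies the same regular expression to a few sample strings. After the doubled backslashes are unescaped, the pattern reads \{aes:v0\}.*, which matches any value that starts with the literal prefix {aes:v0}. The sample values and the function name looks_encrypted are made up for illustration, and full-match semantics are an assumption; how the tool itself applies the pattern is not shown in this topic.

```python
import re

# The pattern as it reads once "\\{aes:v0\\}.*" has been unescaped.
PASSWORD_REGEX = re.compile(r"\{aes:v0\}.*")


def looks_encrypted(value):
    """Return True if the whole value matches the pattern (assumed full-match semantics)."""
    return PASSWORD_REGEX.fullmatch(value) is not None


# Made-up sample values for illustration only.
for sample in ["{aes:v0}kJ8s0Zp1Q==", "plain-text-secret", "prefix {aes:v0}abc"]:
    print(sample, looks_encrypted(sample))
```

Only the first sample matches, because it is the only one that begins with the {aes:v0} prefix.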

Export Anonymizing Database

To export anonymized data, run the following command:

./bin/db-anonymizer.sh

When you run the command, the data is dumped to the server home directory in a file named xl-deploy-repository-dump.xml, together with its corresponding validation file, xl-deploy-repository-dump.dtd.

Important:

If you are using two databases (repository and reporting), run the command with the -reports flag to export the reporting database to the xl-deploy-reporting-dump.xml file.

Import Anonymizing Database

To import anonymized data, run the following command:

./bin/db-anonymizer.sh -import

Command-specific Flag Options

The following table describes the command-specific flag options when importing data:

Flag        Description
-import     Imports data into an empty database. Note: If no file is specified, the system tries to import the file named xl-deploy-repository-dump.xml from the server home directory. To import a specific file from a different location, use the -import -f <absolute-path-of-file> command. Ensure that the xl-deploy-repository-dump.dtd file is available alongside xl-deploy-repository-dump.xml in the absolute path.
-f          Imports a specified data file.
-refresh    Refreshes data in the database. Note: Every record is verified before it is inserted, so the import takes longer.
-batchSize  Specifies the maximum number of commands in a batch. Note: The optimal batch size differs for each case and DBMS. However, the default value of 100 provides good results in most cases. To disable batch processing, set the value to 0.
-reports    Performs the import on the reporting database.
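
Because an import with -f expects the xl-deploy-repository-dump.dtd file to sit next to the dump file, it can help to check for both files before running the import. The following Python sketch shows one way to do that. The function name missing_dump_files is hypothetical, and the check is a convenience that is not part of the Database Anonymizer tool.

```python
from pathlib import Path

# Name of the validation file that the import expects next to the dump file.
DTD_NAME = "xl-deploy-repository-dump.dtd"


def missing_dump_files(dump_xml):
    """Return the paths (as strings) of required files that do not exist."""
    xml_path = Path(dump_xml)
    dtd_path = xml_path.parent / DTD_NAME
    return [str(p) for p in (xml_path, dtd_path) if not p.is_file()]


# Example: report what is missing for a dump in the current directory.
missing = missing_dump_files("xl-deploy-repository-dump.xml")
if missing:
    print("Missing before import:", ", ".join(missing))
else:
    print("Both the dump file and its DTD are present.")
```

An empty result means that both files are in place and the path can be passed to -import -f.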