Database Anonymizer Tool
This topic describes data anonymization, the process of protecting private or sensitive information, such as passwords, by deleting or encrypting personally identifiable information.
As organizations tend to store user information on local or cloud servers for various business requirements, data anonymization becomes vital for maintaining data integrity and preventing security breaches.
The Database Anonymizer tool anonymizes sensitive information when exporting data from the database, and lets you configure which tables, columns, or values to exclude from the export. By default, all Users and Passwords fields are excluded.
This tool is mainly intended to hide passwords and dictionary values in the Digital.ai Deploy database. However, you can customize it based on your requirements.
Database Anonymizer Configuration File
The Database Anonymizer configuration file (central-config/xld-db-anonymize.yaml) defines which data is exported from the database. The configuration file contains three sections that define the export rules.
- Tables to not export: This section defines the tables that are not exported. For example, the users table (XL_USERS) can contain sensitive information, so it is not exported by default.
```yaml
deploy.db-anonymizer:
  tables-to-not-export:
    - XL_USERS
  tables-to-anonymize:
    - table: XLD_DICT_ENTRIES
      column: value
      value: placeholder
    - table: XLD_DICT_ENC_ENTRIES
      column: value
      value: enc-placeholder
    - table: XLD_DB_ARTIFACTS
      column: data
      value: file
  content-to-anonymize: []
  encrypted-fields-to-ignore:
    - password-regex: "\\{aes:v0\\}.*"
      table: XLD_CI_PROPERTIES
      column: string_value
      value: password
```
- Tables to anonymize: This section specifies a column within a given table whose content is anonymized. The original content is replaced with the content defined in the value field.
```yaml
tables-to-anonymize:
  - table: XLD_DICT_ENTRIES
    column: value
    value: placeholder
  - table: XLD_DICT_ENC_ENTRIES
    column: value
    value: enc-placeholder
  - table: XLD_DB_ARTIFACTS
    column: data
    value: file
```
- Content to anonymize: This section defines specific text content within a column that is replaced with an updated value. By default, this list is empty.
```yaml
content-to-anonymize: []
encrypted-fields-to-ignore:
  - password-regex: "\\{aes:v0\\}.*"
    table: XLD_CI_PROPERTIES
    column: string_value
    value: password
```
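The following Python sketch illustrates how the tables-to-not-export and tables-to-anonymize rules could be applied to exported rows. It is a simplified model, not the tool's implementation: DEFAULT_CONFIG, tables_to_export, and anonymize_rows are hypothetical names, table names are assumed to be compared exactly as written, and the key column in the sample rows is made up for the example.

```python
# Hypothetical in-memory mirror of the default rules in xld-db-anonymize.yaml.
DEFAULT_CONFIG = {
    "tables-to-not-export": ["XL_USERS"],
    "tables-to-anonymize": [
        {"table": "XLD_DICT_ENTRIES", "column": "value", "value": "placeholder"},
        {"table": "XLD_DICT_ENC_ENTRIES", "column": "value", "value": "enc-placeholder"},
        {"table": "XLD_DB_ARTIFACTS", "column": "data", "value": "file"},
    ],
}


def tables_to_export(all_tables, config):
    """Drop every table listed under tables-to-not-export (exact name match)."""
    excluded = set(config["tables-to-not-export"])
    return [table for table in all_tables if table not in excluded]


def anonymize_rows(table, rows, config):
    """Return copies of rows with each configured column overwritten by its placeholder."""
    # Column -> replacement pairs that apply to this particular table.
    replacements = {
        rule["column"]: rule["value"]
        for rule in config["tables-to-anonymize"]
        if rule["table"] == table
    }
    anonymized = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        for column, value in replacements.items():
            if column in new_row:
                new_row[column] = value
        anonymized.append(new_row)
    return anonymized


print(tables_to_export(["XL_USERS", "XLD_DICT_ENTRIES", "XLD_CI_PROPERTIES"], DEFAULT_CONFIG))
# → ['XLD_DICT_ENTRIES', 'XLD_CI_PROPERTIES']

rows = [{"key": "db.password", "value": "s3cret"}, {"key": "db.user", "value": "admin"}]
print(anonymize_rows("XLD_DICT_ENTRIES", rows, DEFAULT_CONFIG))
# → [{'key': 'db.password', 'value': 'placeholder'}, {'key': 'db.user', 'value': 'placeholder'}]
```

In this sketch, rows from a table with no matching rule, such as XLD_CI_PROPERTIES, pass through unchanged.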
Caution:
- If the content to anonymize is the same as a dictionary title, anonymizing it changes both the key and the dictionary title.
- If the content to anonymize is the same as the dictionary type, anonymizing it corrupts the dictionary.
To anonymize encrypted CI passwords that use the local key store, edit the centralConfiguration/db-anonymizer.yaml file with the following configuration:
```yaml
"encrypted-fields-to-ignore": [
  {
    "passwordRegex": "\\{aes:v0\\}.*",
    "table": "XLD_CI_PROPERTIES",
    "column": "string_value",
    "value": "password"
  }
]
```
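The doubled backslashes in passwordRegex are string escapes, so the actual regular expression is \{aes:v0\}.*, which matches values that begin with the literal prefix {aes:v0}. The Python sketch below parses the snippet (wrapped in braces so it is a standalone JSON object) and shows which values the rule would catch. It assumes the tool replaces a whole matching value with the rule's value field; anonymize_value and the sample ciphertext are hypothetical.

```python
import json
import re

# The configuration snippet above, wrapped in braces so it parses as a JSON object.
SNIPPET = r'''
{
  "encrypted-fields-to-ignore": [
    {
      "passwordRegex": "\\{aes:v0\\}.*",
      "table": "XLD_CI_PROPERTIES",
      "column": "string_value",
      "value": "password"
    }
  ]
}
'''

rule = json.loads(SNIPPET)["encrypted-fields-to-ignore"][0]
# After JSON decoding, the pattern is \{aes:v0\}.* (literal braces around aes:v0).
pattern = re.compile(rule["passwordRegex"])


def anonymize_value(value):
    """Hypothetical: replace a value that looks like an {aes:v0} ciphertext with the placeholder."""
    return rule["value"] if pattern.fullmatch(value) else value


print(anonymize_value("{aes:v0}c29tZS1jaXBoZXJ0ZXh0"))  # → password
print(anonymize_value("not-encrypted"))  # → not-encrypted
```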
Export Anonymized Data
To export anonymized data, run the following command:
```shell
./bin/db-anonymizer.sh
```
When you run the command, the data is dumped to the server home directory in a file named xl-deploy-repository-dump.xml, along with its corresponding validation file, xl-deploy-repository-dump.dtd.
If you are using two databases (repository and reporting), run the command with the -reports flag to export the reporting database data file, xl-deploy-reporting-dump.xml.
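For scripted exports, a Python sketch like the following can run the tool and confirm that the documented dump files were written. It assumes the script is invoked from the server home directory as shown above and that a failed export exits with a non-zero status; export_command, expected_dump_files, and run_export are hypothetical helpers.

```python
import subprocess
from pathlib import Path

SCRIPT = "./bin/db-anonymizer.sh"


def export_command(reports=False):
    """Build the export command line; -reports targets the reporting database."""
    return [SCRIPT, "-reports"] if reports else [SCRIPT]


def expected_dump_files(server_home, reports=False):
    """File names documented for each export, resolved against the server home directory."""
    home = Path(server_home)
    if reports:
        return [home / "xl-deploy-reporting-dump.xml"]
    return [home / "xl-deploy-repository-dump.xml", home / "xl-deploy-repository-dump.dtd"]


def run_export(server_home, reports=False):
    """Run the export from server_home and fail if a documented dump file is missing."""
    subprocess.run(export_command(reports), cwd=server_home, check=True)
    files = expected_dump_files(server_home, reports)
    missing = [path for path in files if not path.exists()]
    if missing:
        raise FileNotFoundError(f"export finished but these files are missing: {missing}")
    return files
```

For example, run_export("/opt/xl-deploy-server", reports=True) would export the reporting database; the path is only a placeholder for your own server home directory.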
Import Anonymized Data
To import anonymized data, run the following command:
```shell
./bin/db-anonymizer.sh -import
```
Command-specific Flag Options
The following table describes the command-specific flag options when importing data:
Flag | Description |
---|---|
-import | Imports data into an empty database. Note: If no file is specified, the tool tries to import the file named xl-deploy-repository-dump.xml from the server home directory. To import a specific file from a different location, use the -import -f <absolute-path-of-file> command. Ensure that the xl-deploy-repository-dump.dtd file is available alongside xl-deploy-repository-dump.xml at that absolute path. |
-f | Imports the specified data file. |
-refresh | Refreshes data in the database. Note: Every record is verified before it is inserted, which increases the import time. |
-batchSize | Specifies the maximum number of commands in a batch. Note: The optimal batch size differs for each case and DBMS; however, the default value of 100 gives good results in most cases. To disable batch processing, set the value to 0. |
-reports | Performs the import on the reporting database. |
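The flags above can be combined in a small Python helper that assembles the import command and performs the checks described in the table before anything touches the database. This is a sketch under assumptions: import_command is a hypothetical helper, -f and -batchSize are assumed to take their value as the following argument, and the flag order shown is not known to matter.

```python
from pathlib import Path

SCRIPT = "./bin/db-anonymizer.sh"
DTD_NAME = "xl-deploy-repository-dump.dtd"


def import_command(dump_file=None, refresh=False, batch_size=None, reports=False):
    """Assemble an import command line from the documented flags (hypothetical helper)."""
    cmd = [SCRIPT, "-import"]
    if dump_file is not None:
        path = Path(dump_file)
        # -f expects an absolute path to the data file.
        if not path.is_absolute():
            raise ValueError("-f expects an absolute path")
        # The .dtd validation file must be available next to the .xml dump.
        dtd = path.parent / DTD_NAME
        if not dtd.exists():
            raise FileNotFoundError(f"missing validation file {dtd}")
        cmd += ["-f", str(path)]
    if refresh:
        cmd.append("-refresh")  # verifies every record, so imports take longer
    if batch_size is not None:
        if batch_size < 0:
            raise ValueError("batch size must be >= 0 (0 disables batching)")
        cmd += ["-batchSize", str(batch_size)]
    if reports:
        cmd.append("-reports")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, cwd=server_home, check=True), where server_home is your server home directory.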