Base and Advanced Datasets
A dataset is a collection of data that is organized and stored together. It presents structured data in tabular form for a particular domain or business process. Each dataset contains data attributes and metrics: each row represents a record, and each column represents a data attribute or field. Datasets are broadly classified into two categories: base datasets and advanced datasets.
What is a Base Dataset?
A base dataset is the raw or initial collection of data. It is unprocessed or minimally processed and is used for initial analysis. It contains raw numeric or text data extracted directly from the source with little or no transformation, and it is stored in the Master Data Store (MDS).
What is an Advanced Dataset?
An advanced dataset contains data that has undergone transformation or additional processing to provide deeper insights or to make it more useful for specific tasks or reporting. Advanced datasets are stored in the Transformed Data Store (XDS).
Blending Base and Advanced Datasets
An advanced dataset, which contains transformed data and derived metrics, is integrated with a base dataset, which contains core information, by using common identifiers known as keys. For example, if you have a base dataset with employee information (employee IDs, job titles, departments) and an advanced dataset with performance metrics (monthly evaluations, project outcomes), you would use employee IDs as keys to combine them. The result is a dataset that includes both employee details and performance data, giving a comprehensive view for decisions about promotions, training, or workload management.
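The employee example can be sketched as a key-based join. This is a minimal illustration in plain Python, not the product's blending engine; the row values and the field names `employee_id`, `job_title`, `department`, `monthly_evaluation`, and `projects_completed` are hypothetical, and the `blend` helper assumes each employee ID appears at most once in the advanced dataset.

```python
# Hypothetical sample rows: a base dataset (core employee details) and an
# advanced dataset (derived performance metrics). Field names are illustrative.
base_employees = [
    {"employee_id": 101, "job_title": "Engineer", "department": "R&D"},
    {"employee_id": 102, "job_title": "Analyst", "department": "Finance"},
]
advanced_performance = [
    {"employee_id": 101, "monthly_evaluation": 4.5, "projects_completed": 3},
    {"employee_id": 102, "monthly_evaluation": 3.8, "projects_completed": 2},
]


def blend(base, advanced, key):
    """Inner-join two lists of row dicts on a single key column.

    Assumes the key is unique within `advanced` (one metrics row per employee).
    """
    # Index the advanced rows by key so each base row is matched in O(1).
    index = {row[key]: row for row in advanced}
    blended = []
    for row in base:
        match = index.get(row[key])
        if match is not None:
            # Merge both records; the key value is identical in each.
            blended.append({**row, **match})
    return blended


for record in blend(base_employees, advanced_performance, "employee_id"):
    print(record)
```

Base rows without a matching employee ID are dropped here (an inner join); a reporting tool might instead keep them with empty metrics.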
When blending datasets, use only combinations that their content logically supports. Merging datasets that share the same keys keeps the merged dataset coherent and meaningful.
When merging base and advanced datasets, the process matches records on primary attributes. A primary attribute is a main feature or variable in a dataset that is used to identify, organize, or analyze the data, and to link or blend different datasets. It can be a key (such as a unique ID), but it does not have to be unique. For example, in an employee database, "Employee ID" is a primary key, while attributes such as "Job Title" or "Department" could be considered primary attributes that are used for analysis but are not unique.
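The difference between a unique primary key and a non-unique primary attribute can be checked directly on the data. The sketch below uses hypothetical employee rows and two illustrative helpers, `is_unique` and `group_by`; neither is part of the product.

```python
from collections import Counter

# Hypothetical employee rows; two employees share a job title and department.
employees = [
    {"employee_id": 101, "job_title": "Engineer", "department": "R&D"},
    {"employee_id": 102, "job_title": "Engineer", "department": "R&D"},
    {"employee_id": 103, "job_title": "Analyst", "department": "Finance"},
]


def is_unique(rows, attribute):
    """Return True if no two rows share the same value for `attribute`."""
    counts = Counter(row[attribute] for row in rows)
    return all(n == 1 for n in counts.values())


def group_by(rows, attribute):
    """Group rows by a (possibly non-unique) primary attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups


print(is_unique(employees, "employee_id"))  # True: usable as a primary key
print(is_unique(employees, "department"))   # False: a non-unique primary attribute
print({dept: len(rows) for dept, rows in group_by(employees, "department").items()})
```

Blending on a non-unique attribute such as "Department" matches each row to every row that shares the value, so the result can contain more rows than either input.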
Blending datasets on a primary attribute integrates information from multiple tables around a common key. This enriches the dataset, keeps related records consistent, enables multi-dimensional analysis, and simplifies reporting.
Examples of Blending Analytics Datasets
In each of the following examples, the blending attribute is chosen to match the focus of the analysis or report.
- Release Base and Release Advanced datasets: Blend the release base and release advanced datasets on the Release attribute (the primary attribute) to analyze data for a software release. Records for each release are merged across the datasets, so all release-related data is consolidated.
- Release Phase Base and Release Phase Advanced datasets: Blend the release phase base and release phase advanced datasets on the Phase attribute (the primary attribute) to analyze release phase data. Releases and their phases are merged across the datasets, which supports phase-by-phase analysis.
- Release Task Base and Release Task Advanced datasets: Blend the release task base and release task advanced datasets on the Task attribute (the primary attribute) to analyze release task data. Releases and their tasks are merged across the datasets, which lets you track task status, such as in progress, failed, or completed.
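Where blending uses more than one attribute, a record matches only when every attribute matches. The blending table in this section lists `release_id` and `__sys_source_id` for `release_base` with `release_advanced`; the sketch below joins on that composite key. The sample rows and the `title` and `duration_days` columns are hypothetical, and `blend_on` is an illustrative helper, not a product API.

```python
from collections import defaultdict

# Hypothetical rows. release_id and __sys_source_id are the blending attributes
# listed in the table for release_base and release_advanced; the other columns
# are illustrative only.
release_base = [
    {"release_id": "R1", "__sys_source_id": "src-a", "title": "Spring release"},
    {"release_id": "R2", "__sys_source_id": "src-a", "title": "Hotfix"},
]
release_advanced = [
    {"release_id": "R1", "__sys_source_id": "src-a", "duration_days": 14},
    # Same release_id as a base row but a different source, so it will not match.
    {"release_id": "R2", "__sys_source_id": "src-b", "duration_days": 2},
]

BLEND_KEYS = ("release_id", "__sys_source_id")


def blend_on(left, right, keys):
    """Inner-join two row lists on a composite key, keeping every matching pair."""
    # Index right-hand rows by the tuple of key values; a list handles duplicates.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(tuple(l_row[k] for k in keys), [])
    ]


blended = blend_on(release_base, release_advanced, BLEND_KEYS)
for row in blended:
    print(row)
```

Because matching rows are collected in a list, the same helper also covers one-to-many blends, such as one release row against several task rows.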
The following table lists the data tables that can be blended. The row headers and column headers name the data tables, and the cell where a row and a column intersect lists the attributes used to blend those two tables. A cell containing NA indicates that no blending attributes are listed for that pair.
Datasets | release_base | release_advanced | release_phase_base | release_phase_advanced | release_task_base | release_task_advanced | release_task_type_deployments_base | release_team_member |
---|---|---|---|---|---|---|---|---|
release_base | NA | release_id, __sys_source_id | release_id, __sys_source_id | NA | release_id, __sys_source_id | NA | NA | NA |
release_advanced | NA | NA | release_id, __sys_source_id | release_id, __sys_source_id | release_id, __sys_source_id | release_id, __sys_source_id | NA | NA |
release_phase_advanced | NA | NA | NA | NA | release_id, __sys_source_id | release_id, __sys_source_id | NA | NA |
release_tag_base | release_id, __sys_source_id | release_id, __sys_source_id | NA | NA | NA | NA | NA | NA |
release_task_tag_base | release_id, __sys_source_id | release_id, __sys_source_id | NA | NA | release_id, release_task_id, __sys_source_id | release_id, release_task_id, __sys_source_id | NA | NA |
release_environment_label_base | NA | NA | NA | NA | NA | NA | environment_id, __sys_source_id | NA |
release_team_advance | release_id, __sys_source_id | release_id, __sys_source_id | NA | NA | NA | NA | NA | team_id, __sys_source_id |
release_dependent_advanced | release_folder_id, release_id, __sys_source_id | release_folder_id, release_id, __sys_source_id | release_folder_id, release_id, release_phase_id, __sys_source_id | release_folder_id, release_id, release_phase_id, __sys_source_id | release_folder_id, release_id, release_task_id, __sys_source_id | release_folder_id, release_id, release_task_id, __sys_source_id | NA | NA |
release_dependency_advanced | release_folder_id, release_id, __sys_source_id | release_folder_id, release_id, __sys_source_id | release_folder_id, release_id, release_phase_id, __sys_source_id | release_folder_id, release_id, release_phase_id, __sys_source_id | release_folder_id, release_id, release_task_id, __sys_source_id | release_folder_id, release_id, release_task_id, __sys_source_id | NA | NA |
release_multi_level_dependency_advanced | release_folder_id, release_id, __sys_source_id | release_folder_id, release_id, __sys_source_id | release_folder_id, release_id, release_phase_id, __sys_source_id | release_folder_id, release_id, release_phase_id, __sys_source_id | release_folder_id, release_id, release_task_id, __sys_source_id | release_folder_id, release_id, release_task_id, __sys_source_id | NA | NA |
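One way to apply the table programmatically is to transcribe its non-NA cells into a lookup keyed by (row header, column header). The sketch below copies a few entries verbatim from the table; `BLEND_MATRIX` and `blend_keys` are hypothetical names. The lookup follows the table's row/column layout exactly and does not assume a pair can be used in the reverse order.

```python
# A few entries transcribed from the blending table: (row, column) -> attributes.
# Cells marked NA are simply absent, so looking them up returns None.
BLEND_MATRIX = {
    ("release_base", "release_advanced"): ("release_id", "__sys_source_id"),
    ("release_base", "release_phase_base"): ("release_id", "__sys_source_id"),
    ("release_task_tag_base", "release_task_base"): (
        "release_id", "release_task_id", "__sys_source_id",
    ),
    ("release_environment_label_base", "release_task_type_deployments_base"): (
        "environment_id", "__sys_source_id",
    ),
    ("release_team_advance", "release_team_member"): ("team_id", "__sys_source_id"),
}


def blend_keys(row_table, column_table):
    """Return the blending attributes for a table pair, or None if the cell is NA."""
    return BLEND_MATRIX.get((row_table, column_table))


for pair in [("release_base", "release_advanced"), ("release_base", "release_task_advanced")]:
    keys = blend_keys(*pair)
    print(pair, "->", keys if keys is not None else "NA: no blending attributes listed")
```

Checking the lookup before joining is a simple way to keep to combinations the table supports.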
Standalone Datasets
- release_daily_snapshot
- release_weekly_snapshot
- release_task_daily_snapshot
- release_task_weekly_snapshot
- release_phase_daily_snapshot
- release_phase_weekly_snapshot
- release_task_type_deployments_base