⭐ Data Quality

In the Data Quality section, you can create and run data quality tests on various entities of the model. This can help ensure that the data in your model is accurate and reliable.

Create a Data Quality Test

To create a data quality test, click the "Create Test" button. This opens a form that asks for mandatory and optional attributes, divided into four steps.

Data Quality Source

Name: MANDATORY - name of the data quality test. The name must be at least 2 characters and no more than 25 characters long. It must start with a letter or underscore, and can be followed by letters, digits, and underscores.
Source System: MANDATORY - source system on which the data quality test will be performed.
Data Quality on a Hub or a Link: MANDATORY - choose whether the entity to be tested is a Hub or a Link.
Select Hub (or Link): MANDATORY - selection of the related hub or link.
Select Snapshot: MANDATORY - select the snapshot to be used. If no snapshot exists yet, you will be directed to the next step, "Create Snapshot".
Description: OPTIONAL - provide a description of the data quality test.
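
The naming rule for the Name field can be checked before filling in the form. The snippet below is a minimal sketch of that rule as a regular expression; the function name is hypothetical, and it assumes that "letter" means an ASCII letter, which the rule above does not state explicitly.

```python
import re

# First character: a letter or underscore.
# Remaining characters: 1 to 24 letters, digits, or underscores,
# which gives a total length of 2 to 25 characters.
# Assumption: only ASCII letters are accepted.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,24}")


def is_valid_test_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule for a data quality test."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For instance, `is_valid_test_name("dq_customer_email")` passes the check, while a name that starts with a digit or contains a hyphen does not.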

Create Snapshot (optional)

If no snapshot has been created yet, you will need to create one by selecting "Create new snapshot" in the dropdown list. To create a snapshot, refer to the “Snapshots” sub-section in the Modelization section, under the Entities folder.

Descriptive Information

In this step, you will need to provide descriptive information about the data quality test.

Descriptive Information: MANDATORY - using the two dropdown lists, select a satellite related to the hub or link previously selected.
Quality Level: MANDATORY - select a quality level from 1 to 4. The quality levels describe the different aspects that will be checked in the database(s) to control data quality. See the Annexe below for more information.
Criticality: MANDATORY - select the criticality level associated with this DQ control (Notification, Warning, or Error). The criticality helps those responsible for data quality assess the issues they may face. See the Annexe below for more information.
Data Responsible: MANDATORY - defines who will be in charge of this data quality test. Three options are currently available:
- Hardcoded: a new text field “Responsible Value” will appear below.
- Source System: the responsible will be the same as the one defined in the Source System.
- Query: a new dropdown list “IM Query” will appear below, in which you select an existing query that designates the responsible.

The Query option lets you select an SQL script from the “Information Mart” section that returns the responsible for a given object.
For this to work, the information mart’s output should have two text columns:

hk: the hash key of the hub or the link
name: the name of the responsible for the object
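
To illustrate the expected output shape, here is a small sketch that runs a query against an in-memory SQLite database and exposes the two columns hk and name. The table `customer_owner`, its columns, and the sample rows are hypothetical, and SQLite is only a stand-in: the SQL dialect of your information mart may differ.

```python
import sqlite3

# Hypothetical source table standing in for the data the information mart reads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_owner (customer_hk TEXT, owner_name TEXT)")
conn.executemany(
    "INSERT INTO customer_owner VALUES (?, ?)",
    [("a1b2c3", "Alice"), ("d4e5f6", "Bob")],
)

# The query aliases its output to the two expected text columns: hk and name.
RESPONSIBLE_QUERY = """
SELECT customer_hk AS hk,
       owner_name  AS name
FROM customer_owner
ORDER BY hk
"""

cursor = conn.execute(RESPONSIBLE_QUERY)
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```

Whatever the underlying tables look like, the point is that the final SELECT aliases its output to hk and name.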

Specify DQ Test

In the final step, you specify the query to run during the DQ test.

Satellite: MANDATORY - select one or more Satellites to which the data quality test query will be applied.
Column: MANDATORY - select one or more Columns to which the data quality test query will be applied.
Data Quality Condition Query: MANDATORY - a query editor in which you write the test query.
Resolution: MANDATORY - message that will be displayed in the resolution table in the table containing the results of the data quality test.
Alias: OPTIONAL - a shorter name you can use in the query in place of a long column name, to avoid typing it in full. If no alias is specified, it defaults to the column name.

Annexe:

Quality Levels

Quality Level	Data Quality Check Scope	Data Quality Description
Level 1	Form	Checks the form of ONE parameter, e.g. isdate(X)
Level 2	Intra-source	Checks the quality across more than one parameter within a single source, e.g. Value(X) < Value(Y)
Level 3	Inter-source	Checks the quality across more than one parameter in more than one source, e.g. Value(source1.X) < Value(source2.Y)
Level 4	Business plausibility	All other checks, including those that rely on analysts' experience to judge the value of a record, e.g. Value(X) < 10^6 or ABS(Value(source1.X) - Value(source2.Y)) < 30
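
The four levels in the table can be read as four kinds of predicate. The sketch below restates the table's examples as plain Python functions to make the difference in scope concrete. It is not how the tool evaluates tests (those are written as queries); the function names, the dictionary keys "x" and "y", and the date format are all assumptions made for illustration.

```python
from datetime import datetime


def level1_is_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Level 1 (form): does ONE value parse as a date in the assumed format?"""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def level2_intra_source(record: dict) -> bool:
    """Level 2 (intra-source): compare two parameters of the same source."""
    return record["x"] < record["y"]


def level3_inter_source(source1: dict, source2: dict) -> bool:
    """Level 3 (inter-source): compare parameters from two different sources."""
    return source1["x"] < source2["y"]


def level4_below_threshold(record: dict, limit: int = 10**6) -> bool:
    """Level 4 (business plausibility): value stays under an expert-chosen limit."""
    return record["x"] < limit


def level4_close_values(source1: dict, source2: dict, tolerance: int = 30) -> bool:
    """Level 4 (business plausibility): two sources agree within a tolerance."""
    return abs(source1["x"] - source2["y"]) < tolerance
```

Levels 1 to 3 differ in how many parameters and sources they touch. Level 4 differs in where the threshold comes from: business knowledge rather than the data's structure.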

Criticality

To help those responsible for data quality assess the issues they may face, each DQ control has an associated criticality level.

Error (high criticality): for DQ controls that highlight system errors and integrity issues. These errors require immediate corrective action because they make the data impossible to interpret. This level covers elements that are “essential” for producing the reporting and for a first reading of the decision tables. High criticality helps prevent "blind spots" caused by missing information needed to link the data to the display structure.
- Ex: a value linked to a benchmark that serves as a filter for the reporting, or an indicator necessary for decision-making.
Warning (medium criticality): for DQ controls that highlight conformity issues. These warnings need to be corrected as soon as possible to avoid misinterpretation and bias in the data, which would otherwise fail to provide a single version of the truth. This level covers “support” elements for decision-making and for understanding the data. They give context to the essential elements; without them, the contextualization is defective.
- Ex: an indicator whose data provide context or perspective only.
Notification (low criticality): for DQ controls that highlight minor data quality issues. These notifications serve as information to be analyzed in order to identify improvement actions in the source system. This level covers “complementary” elements that are not essential to the production of reporting (filters, etc.) and that do not impact the decision tables.
- Ex: a comment field that describes successes, when these are not needed for decisions.
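
Because the three levels are ordered from low to high, they lend themselves to triage. The following sketch shows one way a team might sort failed controls so that the most critical ones are handled first. The enum, the function, and the control names are hypothetical and are not part of the tool.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """The three criticality levels, ordered from low to high."""
    NOTIFICATION = 1  # low: minor issues, analyzed for improvement actions
    WARNING = 2       # medium: conformity issues, fix as soon as possible
    ERROR = 3         # high: integrity issues, fix immediately


def most_urgent_first(failed_controls):
    """Sort (control_name, Criticality) pairs from highest to lowest criticality."""
    return sorted(failed_controls, key=lambda item: item[1], reverse=True)


# Hypothetical failed controls, in arbitrary order.
failed = [
    ("comment_filled", Criticality.NOTIFICATION),
    ("benchmark_link", Criticality.ERROR),
    ("context_indicator", Criticality.WARNING),
]
ordered_names = [name for name, _ in most_urgent_first(failed)]
```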