Reference Domains, often referred to as Classifications, are essentially the sets of reference codes and descriptions that are used to categorize or classify entities throughout the EDW data model. They can be divided into two main groups:
Reference Domains, often referred to as Classifications, are essentially the sets of reference codes and descriptions that are used to categorize or classify entities throughout the EDW data model. They can be divided into two main groups:
- Populated Externally by the Source System
- Populated Internally by values determined within the EDW
External
By external I mean that the domains of values are contained within the content of the source data, which is outside of the EDW. It may be that the status for a given bank account is held as a coded value on the Bank Account table in the source; while the master list of the codes and descriptions of what those codes mean is held in a different table. In some cases, the list of codes and descriptions will not exist in a source table at all, and must be found in the source documentation. It is also unfortunately sometimes the case that no documented descriptions exist for coded values. It may be necessary to determine descriptions from the context.
Internal
Some reference domains will not come from the source data at all; their determination will be internal to the data model. For example, you may have a party (i.e., a person or a company) with multiple telephone numbers: a home number and a work number, each of which occupies a separate column in the source table. Assume that these telephone numbers are being modelled in a normalized target warehouse; so, the phone numbers are contained in a phone number table, the party is in a party table, and the relationship between the two is in an associative table. The Party to Telephone Number Relationship Type is a classification, the value for which will not be populated directly from the source. The nature of the relationship in the denormalized source table is actually contained in either the name of the column or the column’s description.
The classification is internal because the need for a reference domain of the type of relationship is driven by the design of the EDW data model.
Structural vs. Descriptive
Another way to group classifications is structural vs. descriptive. Structural values are related to the EDW’s target data structures. For example, the Party Type is structural, acting as a determinant of whether the party is a person or a company. By contrast, the Marital Status Type of a person would be descriptive, being an attribute of the person. Either of those values could come directly from the source tables; or it’s equally possible that Party Type may be derived from the nature of the source (e.g., a list of employees being entered into Party).
Reference domains can be hierarchical, such as Standard Industry Codes. Some classifications seem to extend beyond the conventional pairing of code and description. Frequently, these are masking additional reference domains with related but separate domains of values. It is important to tease out the distinct sets of classifications during the analysis phase of the process.
In the following articles we will look at documenting the collection of classifications, modelling them, and ways of using them as part of initial quality assurance testing and governance.