Even then, Dr Codd advised about data integrity. He wrote about:
- Entity integrity – every table must have a primary key and the column or columns chosen to be the primary key should be unique and not null.
- Referential integrity – consistency between related tables. A value that points to a row in another table must match a row that actually exists there. If customer records carry a ZIP code that is looked up in a ZIP code table, for example, every ZIP code used should appear in that table and always refer to the same town.
- Domain integrity – defining the set of valid values for each column in a database, including data type and length. So if the domain is a telephone number, the value shouldn’t be an address.
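The three kinds of integrity above can be sketched with an in-memory SQLite database from Python's standard library. This is only an illustration, not a recommended schema: the table names, column names, the five-character ZIP rule, the ten-digit phone rule, and all sample values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema, invented for this sketch.
conn.executescript("""
CREATE TABLE zip_codes (
    -- Entity integrity: a primary key that is unique and not null.
    zip  TEXT PRIMARY KEY NOT NULL CHECK (length(zip) = 5),
    town TEXT NOT NULL
);
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    -- Domain integrity: a phone number is exactly ten digits, nothing else.
    phone TEXT NOT NULL
        CHECK (length(phone) = 10 AND phone NOT GLOB '*[^0-9]*'),
    -- Referential integrity: the ZIP code must exist in zip_codes.
    zip TEXT NOT NULL REFERENCES zip_codes(zip)
);
""")

def try_insert(sql, params):
    """Run one insert and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

ZIP_SQL = "INSERT INTO zip_codes (zip, town) VALUES (?, ?)"
CUSTOMER_SQL = "INSERT INTO customers (phone, zip) VALUES (?, ?)"

results = {
    "valid zip": try_insert(ZIP_SQL, ("00001", "Exampletown")),
    "duplicate key": try_insert(ZIP_SQL, ("00001", "Othertown")),
    "valid customer": try_insert(CUSTOMER_SQL, ("5555550100", "00001")),
    "unknown zip": try_insert(CUSTOMER_SQL, ("5555550100", "99999")),
    "address as phone": try_insert(CUSTOMER_SQL, ("12 Main Street", "00001")),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

The two valid inserts succeed. The duplicate key, the unknown ZIP code, and the address stored in the phone column are each rejected by the database itself rather than by application code.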
He put everything else into something he called “business rules”, which define standards specific to your company. Take a company that stores part numbers. The part number field has a certain length and data shape (domain integrity), but certain character combinations within it also designate the category and type of part (business rules).
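To make the difference concrete, here is a minimal sketch that checks a part number in two stages. The format (two capital letters, a hyphen, four digits) and the category codes are hypothetical, chosen only for illustration; a real company would have its own.

```python
import re

# Domain integrity (assumed format): two capital letters, a hyphen, four digits.
PART_SHAPE = re.compile(r"[A-Z]{2}-[0-9]{4}")

# Business rule (assumed codes): the two-letter prefix names a known category.
KNOWN_CATEGORIES = {"EL": "electrical", "ME": "mechanical"}

def check_part_number(value: str) -> str:
    """Classify a part number as valid or name the kind of rule it breaks."""
    if not PART_SHAPE.fullmatch(value):
        return "domain violation"
    if value[:2] not in KNOWN_CATEGORIES:
        return "business rule violation"
    return "valid"

for candidate in ("EL-1234", "ZZ-1234", "12 Main Street"):
    print(candidate, "->", check_part_number(candidate))
```

A value such as `ZZ-1234` has the right shape, so it passes the domain check, but it fails the business rule because `ZZ` is not a category the company recognizes.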
The point is that information quality is nothing new. The database pioneers understood it, at least in theory, as far back as the 1970s. In the old days, when systems were inflexible, you may have been forced to break these rules.
For example, a programmer who worked for you in the past may have used 99/99/9999 in a date field to designate an inactive account. That works fine as long as the data is used within a single application. However, shortcuts like this cause huge headaches for the data governance team as it tries to consolidate siloed data and move it into enterprise-wide systems.
To solve these legacy issues, you have to:
- Profile the data to discover that some dates contain all 9s – one of the advantages of using data profiling tools at the beginning of the process.
- Figure out what the 9s mean by collaborating with members of the business community.
- Plan how to migrate that data to a data model that makes more sense, such as one with an active/inactive account table.
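The three steps above can be sketched in a few lines of Python. This is a toy illustration, not a replacement for a data profiling tool: the sample rows, the field names (`account_id`, `last_activity`, `status`), and the mapping of the sentinel to "inactive" are all assumptions made for the example. In practice, that mapping is exactly what the conversation with the business community would have to confirm.

```python
from collections import Counter
from datetime import datetime

# Made-up legacy rows; the field names are hypothetical.
legacy_rows = [
    {"account_id": 1, "last_activity": "03/15/2011"},
    {"account_id": 2, "last_activity": "99/99/9999"},
    {"account_id": 3, "last_activity": "11/02/2012"},
    {"account_id": 4, "last_activity": "99/99/9999"},
]

def parse_date(text):
    """Return a date for real MM/DD/YYYY values, or None for anything else."""
    try:
        return datetime.strptime(text, "%m/%d/%Y").date()
    except ValueError:
        return None

# Step 1: profile the column and count values that are not real dates.
invalid_values = Counter(
    row["last_activity"]
    for row in legacy_rows
    if parse_date(row["last_activity"]) is None
)
print("Values that are not dates:", dict(invalid_values))

# Step 2: record what the business says each sentinel means (assumed here).
SENTINEL_MEANINGS = {"99/99/9999": "inactive"}

# Step 3: migrate to a model that stores the status explicitly.
def migrate(row):
    raw = row["last_activity"]
    parsed = parse_date(raw)
    if parsed is not None:
        return {"account_id": row["account_id"],
                "last_activity": parsed, "status": "active"}
    if raw in SENTINEL_MEANINGS:
        return {"account_id": row["account_id"],
                "last_activity": None, "status": SENTINEL_MEANINGS[raw]}
    # Refuse to guess: an unexplained value goes back to the business.
    raise ValueError(f"unexplained date value: {raw!r}")

migrated = [migrate(row) for row in legacy_rows]
for row in migrated:
    print(row)
```

The migration raises an error on any non-date value that nobody has explained yet, so a new sentinel cannot slip silently into the target system.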
If you take that one example and multiply it across thousands of tables in your company, you’ll begin to understand one of the many challenges that data stewards face as they work on migrating legacy data into MDM and data governance programs.