Data Profiling is all about identifying and quantifying the accuracy of the data in a database. How complete is it? How accurate is it? These are must haves in any and all data profiling activities.
However, for any data profiling project to be complete we must look at the metadata. Not only do we need to know how many widgets were sold by analyzing the data, but we need to know when the load job runs, who the business owner is and a few other items that would make your data profiling project a masterpiece.
1. Table Details: Remember each table will have a purpose for existing, what is that purpose. While you’re checking the data within a table don’t forget the metadata on the table. Table physical and logical names and descriptions are essential elements to track. Performing a data profile on data within a large table is great, but if the table was last accessed 4 years ago. Your efforts may yield little ROI. So you will need to note this last statement and go back to the business and ask them – do they still need this data?
Some table details to consider:
Table Name
Table Description
Table Relationships
Table Usage Details (last user, frequency used)
Touch Points
Table Metadata (attribut…
Data Profiling is all about identifying and quantifying the accuracy of the data in a database. How complete is it? How accurate is it? These are must haves in any and all data profiling activities.
However, for any data profiling project to be complete we must look at the metadata. Not only do we need to know how many widgets were sold by analyzing the data, but we need to know when the load job runs, who the business owner is and a few other items that would make your data profiling project a masterpiece.
1. Table Details: Remember each table will have a purpose for existing, what is that purpose. While you’re checking the data within a table don’t forget the metadata on the table. Table physical and logical names and descriptions are essential elements to track. Performing a data profile on data within a large table is great, but if the table was last accessed 4 years ago. Your efforts may yield little ROI. So you will need to note this last statement and go back to the business and ask them – do they still need this data?
Some table details to consider:
Table Name
Table Description
Table Relationships
Table Usage Details (last user, frequency used)
Touch Points
Table Metadata (attribute definitions)
2. Load Details: Whether your data is being loaded through Datastage, mainframe jobs, or other data loader tools, you will need to record every job that touches a table, and how the data is distributed within that same table. For large tables that hold data from multiple business units this can be a monumental task.
Some load details to consider:
Job Name/Number
Job Frequency
Last Load
File Used
File Layouts
Known Failures
Known Corrective Actions
3. Report Details: Reports may not be on everyone’s list for data profiling projects, and not everyone uses reports if you perform ad-hoc analysis. However, they are an excellent way to determine the value of your data. You can analyze who is using the data; how often it is being used; the type of user; the type of decisions being made; how much data is being used and more. Data usage through reporting will be able to identify some ROI.
Some report details to consider:
Report Name
Report Purpose
Report Owner
Report–Table Touch Points
Report (last user, frequency used)
4. Owner Details: Some would argue that ownership has nothing to do with a data profile. I would say you are probably right. However, I would turn around and say, if you are a support analyst and you are asked, “Why the data is wrong or missing?”, and you don’t have the documentation or other support material. Knowing who the owner of the data is and having that in your data profile means you have just discovered speed dialing. You now have someone to discuss the issue with and someone that is responsible and accountable for the data. This information is virtually priceless to second-line support.
Some owner details to consider:
Data Owner
Process Owner
Contact Information
5. Lineage Details: Lineage details offer a more unique perspective about the data. In large organizations, who have legacy systems and silo environments this activity will become very cumbersome. In some situations the data may even pass through in individual PC for modifications, (a cold chill just went down my spine), before it reaches you the data analyst. It would be very common to see in such organizations that a single piece of data when first entered, runs through not 1, not 2, not 3, but 4 legacy systems and subsystems before coming to rest in an Enterprise Data Warehouse. Having this information in your pocket allows you to better communicate and understand different support teams, front-line users, business partners and more. Please note I use the term database below, but it can refer to any decision points in the data process flow/workflow.
Some Lineage Details:
Database/data warehouse name
Database Owner
Database Owner Contact Information
Database table names
Database attribute names (in many cases the attribute your looking at will not have been called the same in it’s lineage)
Attribute details (size, type, definitions)
First Entry Points
Once you have completed your accuracy and completeness verifications and gathered all of this intimate metadata that is often overlooked, then you can say your data profiling project is a 5-star effort. Remember, you will need to ensure that your gathered details contain accurate information.
Having all this information will not only help you the data quality analyst, but business analysts, and even support analysts. Your organization will be the better for it in the long-run.