In many of our engagements with new clients, the old Donald Rumsfeld phrase “We don’t know what we don’t know” is very applicable as these organizations commence their journey into database analytics. In these situations, there is no clearly definable objective or goal when undertaking these projects. In fact, these companies look for outside consultation in the creation of a roadmap which represents both strategy and tactics on what database analytics projects they should undertake. These types of exercises don’t yield an immediate return on investment (ROI), which can be a barrier for many organizations that fail to appreciate the long-term benefits that a data discovery can yield. It is often very difficult to convince these organizations of those benefits, as they are focused on achieving short-term gains to resolve an immediate business need. In many ways, this is the true challenge of a data discovery: striking a balance between longer-term analytics needs and establishing an immediate ROI.
 
In undertaking these types of projects, the only common feature is the open-ended nature of the assignment, as an end solution to a specific business problem is not necessarily our goal. Instead, exploration and discovery are the intent of the project, with the goal being to build an analytical roadmap. Yet even open-ended projects require some process to provide the guidelines and steps that are necessary for their success. Typically, this process involves four steps:
 
·        Preparation
·        Data Audit
·        Preliminary Analysis
·        Recommendations

Preparation
 
The first stage of the project is where the analytics practitioner attempts to increase their knowledge of the client’s current business and results. In data mining and analytics, it is widely accepted that analytics projects require both domain knowledge and data mining expertise in order to really optimize a given solution. Domain knowledge is specific knowledge which pertains to that business. It includes knowledge that is unique to the client’s industry sector (finance, retail, etc.) as well as knowledge that is unique to the mechanics of how that client’s business runs. The preparation stage of the project allows the practitioner to increase their domain knowledge of this business. Of course, the domain knowledge of the practitioner will never be as exhaustive as the client’s, but the objective here is to obtain an adequate level of this knowledge in order to conduct an effective discovery exercise.
 
The initial tasks here are to conduct extensive interviews with key business stakeholders from marketing, IT, analytics (if such an area exists), finance, and the executive, depending on availability. During these meetings, key business issues and challenges are identified, and an understanding is gained of what data is available. Business reports, analyses, and any other documents that provide results and meaningful information about the business are shared with the practitioner. At the end of this stage, a data extract is requested which consists of all the files and fields that will be required for the remainder of the project.
 
Data Audit
 
Data audits have been discussed in the past and are a core prerequisite in any data discovery exercise. At this stage of the process, the practitioner attempts to become “intimate” with the data, which describes a much stronger relationship with data than the standard phrase “data knowledge”.
 
Once the data extract is received, the practitioner loads it into their system and produces standardized reports that essentially provide the following results:
 
·        Data completeness or coverage, as indicated by the number of missing values in a variable
·        How values or outcomes are distributed within a given variable
·        Data inconsistencies and data gaps: do values change over time, and are there groups of records where certain data anomalies exist?
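
The three checks above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the reporting system described in this article: the field names (income, region, signup_year) and the sample records are hypothetical, and a real audit would run over every field of the extract.

```python
from collections import Counter, defaultdict

# Hypothetical extract: one dict per record; None marks a missing value.
records = [
    {"customer_id": 1, "income": 52000, "region": "ON", "signup_year": 2019},
    {"customer_id": 2, "income": None,  "region": "ON", "signup_year": 2019},
    {"customer_id": 3, "income": 61000, "region": "QC", "signup_year": 2020},
    {"customer_id": 4, "income": None,  "region": None, "signup_year": 2021},
    {"customer_id": 5, "income": None,  "region": "BC", "signup_year": 2021},
]

def coverage(rows, field):
    """Share of records with a non-missing value for `field`."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)

def distribution(rows, field):
    """Frequency of each value of `field`, with missing values counted under None."""
    return Counter(r.get(field) for r in rows)

def coverage_by_group(rows, field, group_field):
    """Coverage of `field` within each value of `group_field`.

    Grouping by a time field shows whether a gap appears or disappears over time.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[r.get(group_field)].append(r)
    return {g: coverage(members, field) for g, members in groups.items()}

income_coverage = coverage(records, "income")
region_counts = distribution(records, "region")
income_by_year = coverage_by_group(records, "income", "signup_year")
```

In this made-up extract, income is populated for the 2020 signups but missing for all of the 2021 signups, which is the kind of group-level anomaly the third check is meant to surface.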

From these results, the quality of the data can be assessed in order to determine what information to use in a future analytics exercise. The results also indicate how files would be linked together, since one objective in this phase is to create an analytical file with one record per customer. Once these links are determined, the variable creation exercise can commence. This is the most labor-intensive and arguably the most important component of the entire discovery process, as it is here that meaningful variables are created which can be used in any future analytics exercise.
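
As a rough sketch of what building a one-record-per-customer analytical file can look like, the Python below rolls a transaction file up to the customer level and derives a few variables. The file layouts, field names, and derived variables (total_spend, num_purchases, days_since_last) are assumptions chosen for illustration; the actual links and variables depend on what the data audit reveals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction file: many rows per customer.
transactions = [
    {"customer_id": "A", "date": date(2023, 1, 15), "amount": 120.0},
    {"customer_id": "A", "date": date(2023, 6, 2), "amount": 80.0},
    {"customer_id": "B", "date": date(2022, 11, 30), "amount": 45.0},
]

# Hypothetical customer master file: one entry per customer.
customers = {"A": {"region": "ON"}, "B": {"region": "QC"}, "C": {"region": "BC"}}

def build_analytical_file(customers, transactions, as_of):
    """Link the two files on customer_id and return one record per customer."""
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t["customer_id"]].append(t)

    rows = []
    for cid, attrs in customers.items():
        txns = by_customer.get(cid, [])
        last = max((t["date"] for t in txns), default=None)
        rows.append({
            "customer_id": cid,
            **attrs,
            # Derived variables created from the transaction history.
            "total_spend": sum(t["amount"] for t in txns),
            "num_purchases": len(txns),
            "days_since_last": (as_of - last).days if last else None,
        })
    return rows

analytical_file = build_analytical_file(customers, transactions, as_of=date(2023, 12, 31))
```

Customers with no transactions (customer C here) are kept with zero spend and no recency value rather than dropped, so that the file still has exactly one record for every customer.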
 
Besides the exhaustive reports from the data audit, a summary-level report is produced which indicates the major findings from the audit. This report identifies the current gaps within the data environment and how it might be improved. Some of these gaps could, to some extent, be filled by data overlays. Good examples are Statistics Canada data for business-to-consumer analytics and perhaps Dun and Bradstreet or Info-Canada data for business-to-business analytics. A good example of how data overlays might fill a gap is a case where income is a key component of an analysis but is simply unavailable within the current data environment. Using overlay income at the postal area level, as opposed to the individual level, would be a secondary option for deriving insights based on income. Once the data audit exercise is completed, some preliminary analysis is conducted, which essentially commences our journey into analytics.

Preliminary Analysis
 
Note how we use the word “preliminary” here, as our objective is to acquire a basic level of knowledge or understanding of the client’s current business using our data analytics expertise complemented by our domain knowledge. Obviously, as time goes on and the analytics discipline becomes more entrenched, the level and sophistication of knowledge will increase.

Our first exercise is to understand at a very basic level how customers are different. A value segmentation or RFM segmentation is a tool we use to stratify customers into deciles, with decile 1 representing the highest-performing and decile 10 representing the lowest-performing customers. From this stratification, we can then arrive at value-based or RFM-based segments. Under each scheme, though, we can determine how relevant the 80/20 rule is (i.e., do 20% of my customers account for 80% of my business?).
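
A minimal sketch of the value-based version of this stratification is shown below. It ranks customers on a single value measure (annual spend) and does not attempt a full RFM score; the customer IDs and spend figures are hypothetical.

```python
def assign_deciles(spend_by_customer):
    """Rank customers by spend; decile 1 holds the top 10%, decile 10 the bottom 10%."""
    ranked = sorted(spend_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    return {cid: (rank * 10) // n + 1 for rank, (cid, _) in enumerate(ranked)}

def share_of_top(spend_by_customer, top_fraction=0.2):
    """Share of total spend contributed by the top `top_fraction` of customers."""
    ranked = sorted(spend_by_customer.values(), reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)

# Hypothetical annual spend for ten customers.
spend = {
    "c01": 5000, "c02": 3000, "c03": 500, "c04": 400, "c05": 300,
    "c06": 250, "c07": 200, "c08": 150, "c09": 120, "c10": 80,
}

deciles = assign_deciles(spend)
top_share = share_of_top(spend)  # share of spend from the top 20% of customers
```

In this made-up example the top two customers account for 8,000 of the 10,000 in total spend, so the 80/20 rule holds exactly; with real data the share is simply whatever the calculation returns.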
 
This analysis also helps to point out the migration and retention opportunities. For example, a sensitivity analysis using results from this segmentation analysis and a series of different migration (conversion) rates would determine the dollar opportunity in migrating customers from a lower-performing segment to a higher-performing segment. Retention analysis would use much the same approach by looking at a high-performing segment and then determining the dollars lost under different potential defection rates. Of course, any sensitivity analysis should look at a reasonable range of migration or defection rates. The reasonableness of these numbers would have been confirmed through results and comments from key stakeholders that were gleaned during the upfront preparation stage.
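
The arithmetic behind such a sensitivity analysis can be sketched as follows. The formulas are a simplifying assumption made for illustration: each migrated customer is taken to gain the difference between the two segments' average values, and each defecting customer is taken to lose the full average value of the high segment. The segment sizes, average values, and rates are hypothetical.

```python
def migration_opportunity(n_low, avg_low, avg_high, rates):
    """Dollar gain if a given share of the lower segment moves up to the higher segment."""
    return {rate: n_low * rate * (avg_high - avg_low) for rate in rates}

def retention_loss(n_high, avg_high, rates):
    """Dollar loss if a given share of the high-value segment defects."""
    return {rate: n_high * rate * avg_high for rate in rates}

# Hypothetical segment sizes and average annual values.
opportunity = migration_opportunity(
    n_low=10_000, avg_low=200.0, avg_high=1_000.0, rates=[0.01, 0.02, 0.05]
)
loss = retention_loss(n_high=1_000, avg_high=1_000.0, rates=[0.05, 0.10])

for rate, dollars in opportunity.items():
    print(f"migration rate {rate:.0%}: ${dollars:,.0f} opportunity")
for rate, dollars in loss.items():
    print(f"defection rate {rate:.0%}: ${dollars:,.0f} at risk")
```

Running the same calculation across a range of rates, rather than a single point estimate, is what lets stakeholders judge which scenarios they consider realistic.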
 
Once these segments are determined, profiling each segment helps to better understand the key customer characteristics that define it. Profiles can be used in two ways. Strategically, the information can be used to develop communication and channel strategies which are more engaging to the mindset of that particular group. For instance, the profile of a high-value customer may comprise the following characteristics:
 
·        Older
·        Wealthier
·        Tend to engage more in online media
 
A heavier online strategy speaking to the concerns of wealthier retirees or people nearing retirement would certainly be very appropriate here.
 
The second way is tactically where the information is used to create lists of customers that most resemble the profiled segment. Here, we target those low to medium value customers that tend to be older, wealthier, and more engaged online since these customers exhibit the characteristics of a high value customer.
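
A very simple version of this list creation is sketched below as a rule-based filter. The field names and the age and income cut-offs are hypothetical stand-ins for whatever the profile actually shows; in practice the cut-offs would come from the profiling results rather than being chosen by hand.

```python
def lookalike_targets(customers, min_age=55, min_income=100_000):
    """Return IDs of low- and medium-value customers who match the high-value profile."""
    return [
        c["customer_id"]
        for c in customers
        if c["value_segment"] in ("low", "medium")
        and c["age"] >= min_age
        and c["income"] >= min_income
        and c["online_engaged"]
    ]

# Hypothetical analytical file with one record per customer.
customers = [
    {"customer_id": "A", "value_segment": "high", "age": 62, "income": 150_000, "online_engaged": True},
    {"customer_id": "B", "value_segment": "low", "age": 60, "income": 120_000, "online_engaged": True},
    {"customer_id": "C", "value_segment": "medium", "age": 35, "income": 130_000, "online_engaged": True},
    {"customer_id": "D", "value_segment": "medium", "age": 58, "income": 110_000, "online_engaged": False},
]

targets = lookalike_targets(customers)
```

Only customer B is selected here: A is already high value, C is too young to fit the profile, and D is not engaged online.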
 
Cohort analysis represents another unique type of analysis, as its purpose is to explore new customer behaviour over a number of years. Typically, this kind of analysis commences with customers that were new five years ago and then tracks their spending behaviour in subsequent years. The same analysis is then conducted for customers who were new in each subsequent year. Obviously, as we move forward in time, we have less tracking history for new customers. Through this kind of analysis, we can begin to discern retention, migration, and upsell patterns amongst these new customers. At the same time, we may also find that there are unique behavioural patterns that are associated with a certain cohort group.
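
The tracking described above can be sketched as a small cohort table. In this illustration a customer's cohort is assumed to be the year of their first purchase, and retention is measured as the share of the cohort that buys again in each later year; the purchase rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase history: (customer_id, year, spend).
purchases = [
    ("A", 2019, 100.0), ("A", 2020, 150.0), ("A", 2021, 200.0),
    ("B", 2019, 80.0), ("B", 2020, 60.0),
    ("C", 2020, 50.0), ("C", 2021, 70.0),
    ("D", 2020, 40.0),
]

def cohort_table(purchases):
    """Group activity by cohort (first purchase year) and years since joining."""
    first_year = {}
    for cid, year, _ in purchases:
        first_year[cid] = min(year, first_year.get(cid, year))

    table = defaultdict(lambda: defaultdict(lambda: {"customers": set(), "spend": 0.0}))
    for cid, year, amount in purchases:
        cell = table[first_year[cid]][year - first_year[cid]]
        cell["customers"].add(cid)
        cell["spend"] += amount
    return table

def retention_rates(table):
    """Share of each cohort that is still purchasing in each year after joining."""
    rates = {}
    for cohort, by_offset in table.items():
        base = len(by_offset[0]["customers"])
        rates[cohort] = {
            offset: len(cell["customers"]) / base
            for offset, cell in sorted(by_offset.items())
        }
    return rates

table = cohort_table(purchases)
rates = retention_rates(table)
```

The 2020 cohort has one fewer year of history than the 2019 cohort, which reflects the point above about having less tracking history for more recent cohorts. The spend totals held in each cell can be used in the same way to look at migration and upsell.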
 
Depending on the nature of the industry, basket analysis is another form of analysis used to better understand the “event” behaviour of the customer’s purchase. The analysis can reveal how the mix of products purchased varies between customers. For example, the product mix will in most cases vary depending on whether they are a high-, medium-, or low-value customer. Migration efforts can use the insights generated from this analysis by promoting products that are, first, relevant for the lower-value segment and, second, likely to migrate those customers to the higher-value segments.
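
One simple way to compare product mix across value segments is sketched below. It computes each segment's share of units by product category, which is a basic form of basket analysis rather than a full association-rule approach; the categories, segments, and line items are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mapping of customers to value segments.
segment_of = {"A": "high", "B": "low", "C": "low"}

# Hypothetical purchase line items: (customer_id, product_category, units).
line_items = [
    ("A", "premium", 3), ("A", "basic", 1),
    ("B", "basic", 3),
    ("C", "basic", 1), ("C", "premium", 1),
]

def product_mix(line_items, segment_of):
    """Share of purchased units by product category within each value segment."""
    counts = defaultdict(Counter)
    for cid, category, units in line_items:
        counts[segment_of[cid]][category] += units
    return {
        segment: {cat: n / sum(c.values()) for cat, n in c.items()}
        for segment, c in counts.items()
    }

mix = product_mix(line_items, segment_of)
```

Categories that are over-represented in the high-value segment but still purchased by some lower-value customers, such as the "premium" category in this toy data, are natural candidates for migration-focused promotion.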
 
New customers will also exhibit different product mix patterns when compared to older customers. As indicated above, the same type of migration analysis is conducted here by identifying the relevant products which should be promoted so that new customers are on a path towards becoming higher-value customers.
 
Recommendations
 
With the completion of the preliminary analysis, the actual required work for this discovery exercise is now finished. It is now time to consolidate all this work into a comprehensive document. This document contains all the detailed results which are used to support the many findings and insights generated from this exercise. But despite its comprehensiveness, the report is of no use if it does not clearly lay out actionable next steps. These actionable next steps are focused on two areas. The first area consists of the data strategy and what the organization needs to do to improve its information environment in order to more fully optimize the analytics discipline. The data strategy needs to be proactive in addressing not only the existing analytical needs but also any potential analytics needs arising from the changing nature of analytics.
 
 
The second area focuses on what specific analytics activities should be undertaken in the first year, and in what priority, so that the company can effectively commence its analytics journey. The analytics strategy should be short-term in nature so as to provide an initial benchmark from which to commence this discipline. Keeping a shorter-term perspective on analytics simply recognizes the underlying fact that things can change very quickly within organizations. Obviously, this short-term focus will identify activities that will yield the quickest wins in terms of ROI, but also activities that will generate much learning. It is this second component of learning which ensures that the longer-term ROI is not being overlooked.
 
By having both a data strategy that optimizes the use of analytics and a complementary set of analytics activities that begin the organization’s analytics journey, the organization has a roadmap or path along which to begin this journey. With this kind of mindset, investing in a data discovery exercise is an easy decision as long as the process is undertaken with the due diligence and discipline outlined in this article.