To continue the theme of designing for an ever-changing set of requirements, I wanted to talk about the second item we all know will happen. Exactly how and when it will happen is open for debate, but it will happen, unless you prevent it. The second known is that users will want the data more frequently. For now I am referring to the same data, as opposed to increasing the granularity of the data (that comes a bit later). The best way to illustrate this is with a true story from my travels.
I was in South America visiting a retail account. I asked about the frequency of the data they have to analyze. The business lead in the meeting responded that they get data every month. The IT lead responded that they get data every day. That is quite a disparity. After a bit of back and forth, the IT lead sighed heavily and said, "we load daily data every month". I had to shake my head as I pointed out that if that was the case, they only had monthly data. Once again we have to take the user perspective, and getting May 5th data in June does not help you make decisions on May 7th. So what was the problem in getting data that supported the end users' needs?
Editor’s note: Rob Armstrong is an employee of Teradata. Teradata is a sponsor of The Smart Data Collective.
In this case it was a matter of the historical processes put in place by the IT organization. In order to load the data, the tables needed to be offline. Since the total monthly load took over 48 hours to complete (including the summary table and index maintenance), they could not run the job more frequently. Much of this was a legacy problem, but no one ever bothered to optimize the processes using newer techniques, or to calculate that running the job with a much smaller daily volume would not require such a long outage. Again, it was simply a case of "this is how we do it and it is too hard to change."
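To make that last point concrete, here is a rough back-of-envelope sketch of the calculation nobody bothered to run. It assumes load time scales roughly linearly with data volume, which is rarely exact in practice, but it is enough to frame the conversation; the numbers are the ones from the story above.

```python
# Back-of-envelope: if the monthly batch takes 48 hours, what might a single
# daily slice take?  Assumes load time scales roughly with row volume, which
# is an oversimplification but good enough to start the conversation.

monthly_load_hours = 48      # observed monthly batch window from the story
days_per_month = 30

# Naive linear estimate for one day's worth of the same data
daily_load_hours = monthly_load_hours / days_per_month
print(f"Estimated daily load: {daily_load_hours:.1f} hours "
      f"({daily_load_hours * 60:.0f} minutes)")
# => roughly 1.6 hours of work per day instead of a 48-hour outage once a month
```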
So we can reasonably expect that users will ask for their data to become fresher until critical business data is required in "real time". I actually do not like the term, as it means something completely different to different parts of an organization. I like to say that the frequency of data loading should be the half life of a user's ability to respond to it. Therefore, if a user can get a piece of data and take a direct action on it within 2 minutes, then the data should be updated every minute. This is not a hard and fast rule, but it at least establishes a starting point for the conversation.
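As a minimal sketch of that rule of thumb, the helper below simply halves a user's time-to-action; the function name and the second scenario are my own illustrations, not a Teradata formula.

```python
# "Half life" rule of thumb: load the data at least twice as often as the
# user can act on it.  Purely illustrative starting point, nothing more.

def suggested_load_interval(response_time_minutes: float) -> float:
    """Starting-point load interval: half the user's time-to-action."""
    return response_time_minutes / 2

# A user who can see a data point and act on it within 2 minutes
print(suggested_load_interval(2))    # -> 1.0 (load every minute)

# A replenishment team that reacts within a business day (480 minutes)
print(suggested_load_interval(480))  # -> 240.0 (load every 4 hours)
```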
Keeping our "exterior" perspective, it also must be noted that loading is not the end point. The data has to be readily available to the end user, so any ancillary maintenance required to take it from loaded detail to query-accessible must also be included.
This gets to another one of those interesting paradoxes. Things like indexing, summary tables, cubes, and whatnot are supposed to improve performance for the end users. Unfortunately, as the data frequency trends toward "real time", those very same "performance optimizations" become a barrier and will actually increase the latency from data available to data accessible.
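Here is a small sketch of why that lag adds up: the "accessible" moment trails the "available" moment by every maintenance step that has to finish first. The stage names and durations below are purely illustrative assumptions, not measurements from any particular system.

```python
# Illustrative only: each post-load maintenance step pushes back the point
# at which users can actually query the fresh data.

maintenance_minutes = {
    "load detail rows":        5,
    "rebuild indexes":        20,
    "refresh summary tables": 30,
    "rebuild cube":           45,
}

total_lag = sum(maintenance_minutes.values())
print(f"Data loaded in {maintenance_minutes['load detail rows']} minutes, "
      f"but not query-ready for {total_lag} minutes.")
# If the business wants hourly data, a 100-minute maintenance tail means the
# users are always at least one full cycle behind.
```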
So, as in the last installment, I'll ask a question. If you are in IT, how ready are you for the community to ask that the data become "more frequent"? If you load weekly, are you ready for daily? If you load daily, are you ready for hourly?
If you are in the business arena, what could you do to increase the value of your processes if you got the data "more frequently"? Better inventory management, better response to customer problems, decreasing the lag enough that you can bring analytics into the call center? Finally, if you are in an executive or leadership role, what would these improved processes mean in terms of increased revenue or decreased costs?
OK, there is one more readily known requirement to design for, and that will be the next blog topic.
Finally, thank you to all those who provided feedback and comments on my original efforts. I trust that over the next few postings I’ll find my voice and give you all a place to pick up some useful ideas and insights.