Bad Data, Bad Decisions
Many of my blog articles have focused on data – from big data to security – and for good reason. Without data, there are no analytics.
But worse than having no data is having bad or incomplete data. Bad or incomplete data gives people the confidence to make bold decisions, only for them to discover later that those decisions were wrong.
The challenge is that data in its raw form is often unusable. We’ve all heard the adage “garbage in, garbage out.” If you’ve ever tried to build analytics on raw data, you know that even the most structured ERP systems allow a fair amount of “garbage in!” Even when pulling from a data warehouse, some companies assume the data is already clean, which is often not the case; and even when it is clean, it frequently needs to be combined with dirty data from other sources.
Therefore, a critical step in data integration is what people in our business call “transformation.” In the Extract/Transform/Load (ETL) process, the “transform” step is where you add the smarts to recognize that ‘Atlnta’ and ‘Atnalta’ both mean ‘Atlanta.’ You may also want to enrich the data, adding latitude and longitude for ‘Atlanta’ so it can be plotted on a map. Now apply that logic to multiple fields in the same table, or across multiple tables, and you have scripted potentially hundreds of instances in multiple transformation scripts. Then, after seeing the analytics, a user finds someone writing ‘Atl’ and realizes that spelling should be included as well. Imagine the nightmare of revisiting each of those hundreds of instances every time someone finds a new way to misspell ‘Atlanta.’ In addition, depending on the size of the data and the need for frequent refreshes, you may have to use incremental loading, which makes the logic even more complex to maintain.
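To make the maintenance problem concrete, here is a minimal Python sketch (not Qlik script; the field names, misspellings, and coordinates are illustrative) of transform logic hardcoded into a load step. Every script that touches city data would repeat these mappings, so each new misspelling means editing all of them:

```python
# Hardcoded cleanup rules: every load script that handles city data
# repeats a table like this, so adding 'Atl' means editing them all.
MISSPELLINGS = {"Atlnta": "Atlanta", "Atnalta": "Atlanta"}

# Enrichment lookup: canonical city name -> (latitude, longitude).
CITY_COORDS = {"Atlanta": (33.749, -84.388)}

def transform_row(row):
    """Normalize the city field, then enrich it with coordinates."""
    city = MISSPELLINGS.get(row["city"], row["city"])
    lat, lon = CITY_COORDS.get(city, (None, None))
    return {**row, "city": city, "lat": lat, "lon": lon}

rows = [{"id": 1, "city": "Atlnta"}, {"id": 2, "city": "Atlanta"}]
cleaned = [transform_row(r) for r in rows]
```

Multiply this pattern across hundreds of fields and scripts, and the cost of each new rule becomes clear.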
In talking to Jeff Robbins and Larry Aaron, Qlik Consultants who have developed hundreds of ETL scripts for Qlik customers, I learned this is a pretty common scenario. Jeff, like many of our consultants, grew tired of developing scripts field-by-field and table-by-table, so he built his own utility to aid in future ETL work. He calls it the “Structured ETL Engine for Qlik,” or SEEQ for short. Using a rules-based approach, he sets up the transform logic in an externalized table, then reuses that same rules table everywhere the logic is needed. So, in my example above, instead of spending hours updating hundreds of instances, he can make one small change in the rules table, which ensures every instance is correctly transformed on the next data refresh. No scripting, no fuss!
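SEEQ itself is a consulting utility and its internals aren’t public, but the rules-based idea can be sketched in Python. The transform rules live in one external table (a CSV here for simplicity; the field names and rule format are assumptions), and every load reads that same table, so adding a single row fixes every instance at once:

```python
import csv
import io

# One externalized rules table shared by every load script.
# Adding the 'Atl' row here fixes all loads on the next refresh.
RULES_CSV = """field,raw_value,clean_value
city,Atlnta,Atlanta
city,Atnalta,Atlanta
city,Atl,Atlanta
"""

def load_rules(csv_text):
    """Build a {field: {raw_value: clean_value}} lookup from the table."""
    rules = {}
    for r in csv.DictReader(io.StringIO(csv_text)):
        rules.setdefault(r["field"], {})[r["raw_value"]] = r["clean_value"]
    return rules

def apply_rules(row, rules):
    """Rewrite any field with a matching rule; other values pass through."""
    return {field: rules.get(field, {}).get(value, value)
            for field, value in row.items()}

rules = load_rules(RULES_CSV)
row = apply_rules({"city": "Atl", "state": "GA"}, rules)
```

The point of the design is that the rules are data, not code: the same generic `apply_rules` step runs in every load, and maintenance happens in one table rather than in hundreds of scripts.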
So, how long does it take to set up SEEQ initially? Larry was working with a customer who wanted to load approximately 300 tables for each of their 200 clients, resulting in load scripts for almost 60,000 tables. After spending a few weeks building scripts and completing only a handful, the customer contacted Qlik and asked for help. Larry came in, and in about a week had the SEEQ logic in place and tested successfully on a subset of the clients. The customer took over and tested the remaining clients, but suffice it to say, using SEEQ got the transformation step fully scripted in less than 20% of the time it would have taken to develop it from scratch.
More importantly, the customer had the right infrastructure in place to help them extend and maintain their ETL process with minimal effort going forward.
If you want to learn more about SEEQ or how Qlik Consulting can help you build your ETL efficiently, visit qlik.com/consulting.