Data... is it really that bad?
We complain about bad data, but there is so much we can do with the existing good data! I try to dig a bit further.
Abhishek Dwivedi
6/6/2021, 3 min read


I am sure all of us compliance professionals (and others in banking) have come across the statement “Garbage in, garbage out”. There is a kind of paranoia around the impact of “bad data”, so much so that before a new implementation even starts, a BIG question mark hangs over the data problem. I understand the concerns and the reasoning behind that question. Having implemented several Transaction Monitoring (TM) systems across different clients, sourcing data from 40+ back-office systems, I think I have seen a fair number of flavors. Surprisingly enough, I still believe the problem is not the data itself, but the way we consume and interpret it.
The core issue
In the compliance domain in particular, you have to be extra careful when sourcing data. If you base decisions on wrong data, you can end up paying heavy penalties for decisions that were sound in themselves but rested on the wrong underlying data. I remember one instance where, for a long time, a team of investigators expected a certain kind of alert on their customer but never got it. After almost 6 months they escalated the issue. On the face of it, everything seemed normal: both the sending and receiving parties confirmed that everything worked and was mapped as expected. After a deep dive I found the root cause: an extra “-”!! Debit amounts were supposed to be sourced as absolute values (e.g. 1000, 2000) with a separate Credit/Debit indicator. The back-office system, however, sent them as negative values (e.g. -1000, -2200), because for them that is the right way to denote debits. All the intermediate systems let the data through, as nothing checked that the amounts were “whole” (unsigned) numbers. After this finding, a huge “look back” was performed to catch any missed signals.
My example may sound too simplistic, but it captures the core of most data problems. All parties, in their defense, map the data as best they can according to their own understanding. However, the more intermediate systems the data passes through, the more problems and variables are introduced, and the more the end results are skewed.
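The missing check from this anecdote is small enough to sketch. The snippet below is only an illustration, not the validation any real TM product ships with: the `Transaction` class, the `validate_amount` function and the "C"/"D" indicator codes are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record layout, assumed for this sketch only."""
    amount: Decimal   # agreed convention: absolute value, never negative
    indicator: str    # agreed convention: "C" for credit, "D" for debit


def validate_amount(txn: Transaction) -> list[str]:
    """Return the problems found; an empty list means the record follows the convention."""
    problems = []
    if txn.amount < 0:
        # The sign should live in the indicator, not in the amount itself.
        problems.append(f"negative amount {txn.amount}: sign belongs in the indicator")
    if txn.indicator not in ("C", "D"):
        problems.append(f"unknown indicator {txn.indicator!r}")
    return problems
```

Run at any of the intermediate hops, a check like this would have flagged the -1000 debit on day one instead of letting it travel silently for six months.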
Data as an issue, really?
This is where I want to challenge the perception around data. In my experience, the data itself is not the issue. The core issue is misalignment: experts from different systems do not understand how the data will ultimately be used. They generally get a data spec sheet (with format, field details, etc.), are expected to export their data in a specific format, and then their job is over. Moreover, the data passes through so many layers before it reaches the end consuming system (e.g. TM) that its true meaning gets transformed along the way. The data may be imported without technical errors while the context is lost completely, a bit like comparing apples and oranges. And this is one of the situations where we proudly quote “Garbage in, garbage out”!
What is the solution?
It’s time we took control of the situation. Let me take you back to my consulting days when, before starting any data mapping exercise, I would get all parties together and show them the end goal. From there, I aligned everyone on the significance of every data element expected by my TM system. This alignment mattered most because it gave the expert from the very first back-office system the bigger picture: why we want something and what we intend to do with it.
I would like to summarize some of the tips which may help you:
Try figuring out the real problem - is it the data or the interpretation of data?
Align your experts and show them the bigger picture. Yes, they may just be from the back-office team, but unless they understand the significant impact of wrong data, they may treat your compliance/TM system as just another recipient of their data.
Try to minimize the intermediate layers as much as possible. You cannot afford repeated cycles of data fixing and the time lost with them. I still remember a brainstorming session where it turned out that, after the back office, the data passed through at least 4 intermediate systems before reaching the TM system! And then everyone wondered why a simple field such as the customer’s “First Name” went missing (just as an example).
Last but not least, be bold and decide to reject garbage. Let’s change the phrase to “No garbage in, just value out”. I know this is ambitious, but why not!
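To make the last tip a little more concrete, here is a minimal sketch of what “rejecting garbage” at the door of the TM system could look like. It assumes records arrive as plain dictionaries; the field names in `REQUIRED_FIELDS` and the helpers `is_blank` and `split_batch` are made up for illustration and do not come from any specific product.

```python
# Hypothetical list of fields the consuming system cannot work without.
REQUIRED_FIELDS = ("customer_id", "first_name", "amount")


def is_blank(value) -> bool:
    """Treat None and empty or whitespace-only text as missing; 0 is a valid value."""
    return value is None or str(value).strip() == ""


def split_batch(records):
    """Split a batch into accepted records and (record, missing_fields) rejections."""
    accepted, rejected = [], []
    for record in records:
        missing = [name for name in REQUIRED_FIELDS if is_blank(record.get(name))]
        if missing:
            # Keep the reason next to the record so the source team can fix it.
            rejected.append((record, missing))
        else:
            accepted.append(record)
    return accepted, rejected
```

The design choice worth noting is that rejected records are returned together with the reason rather than silently dropped, so the feedback can travel back to the very first back-office system.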
