Replacing qualitative guesswork with hard-and-fast numbers makes observations about reference data quality more accurate
If your firm has an enterprise data management (EDM) system in place, you may question the need for a data quality metrics capability on top of that EDM system.
EDM systems, however, provide only a generalized picture of the quality of the data your firm receives. To see trends in data quality, investment firms on both the buy and sell sides need data quality metrics. The test of good data quality metrics is whether they accurately show which direction your data quality is heading: improving, deteriorating, or holding steady.
What is data quality?
Data quality is the practical application of methods and controls to ensure that data is fit for the processes and data consumers that support business operations.
What are data quality metrics?
Data quality metrics are the units of measure that capture the degree to which that data is fit for purpose and consumption, and the change in levels of quality over time.
How do data quality metrics operate?
Data quality metrics give data managers on the front lines a means to quantify and communicate data quality, creating greater transparency and, in turn, greater accountability within the firm. When upper-level managers decide to onboard new data vendors or deploy different data management strategies, data operations managers and data stewards serve as the first line of defense to ensure the highest levels of data quality and accuracy throughout the process. The burden falls on them to establish clear and actionable metrics around data quality. Without such metrics, data accuracy can be shockingly poor.
The metrics must inherently show quality over time and which way that quality is trending. They must also provide information that can flow from the instrument or entity level up into a security master analysis, where the quality of a firm’s fixed-income data can be compared with that of its equities data. And these metrics can cover everything from field-level business rules to the accuracy of CUSIP numbers.
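To make the idea of a field-level rule concrete, the sketch below validates the check digit of a nine-character CUSIP using the standard modulus-10 “double-add-double” algorithm. It is a minimal illustration rather than a production control; the function name and test values are our own.

```python
def cusip_check_digit_valid(cusip: str) -> bool:
    """Return True if the 9-character CUSIP has a correct check digit
    (modulus-10 'double-add-double' algorithm)."""
    if len(cusip) != 9 or not cusip[8].isdigit():
        return False
    special = {"*": 36, "@": 37, "#": 38}
    total = 0
    for i, ch in enumerate(cusip[:8]):
        if ch.isdigit():
            value = int(ch)
        elif ch.isalpha():
            value = ord(ch.upper()) - ord("A") + 10
        elif ch in special:
            value = special[ch]
        else:
            return False          # character not allowed in a CUSIP
        if i % 2 == 1:            # every second character is doubled
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10 == int(cusip[8])

print(cusip_check_digit_valid("037833100"))  # Apple Inc. common stock -> True
print(cusip_check_digit_valid("037833109"))  # corrupted check digit   -> False
```

The point is not the particular rule but that each such check produces a pass/fail result that can be counted, trended and rolled up alongside every other rule in the security master.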
The absence of data quality metrics
Firms have learned the hard way, as recently as March 2019, what poor data costs: fines and deadlines for reaching data quality thresholds. The lesson is that managing exceptions is not enough. It is also important to quantify the overall quality level of a data set, make changes, fully test the impact, understand the implications of poor quality, and address the underlying cause, whether that lies with the data provider or with the way the data is processed and used within the business.
If data quality metrics are not applied, data quality is evaluated on a purely qualitative basis. A qualitative analysis may estimate that reference data is 90% accurate, which should lead one to wonder what state the other 10% is in – it could be adequate, it could be bad, it could be very bad. In this qualitative 90%/10% scenario, the assessment can slide quickly from “pretty good” to “we don’t know at all.”
Qualitative assessment requires a manual, subjective effort, while quantitative data quality metrics are generated from automated checks and balances and report the quality of the data in a digestible and actionable way. The end consumers of reference data from EDM systems that don’t apply data quality metrics can and do complain about the quality of that data. Fighting for quality at that late stage of the reference data lifecycle is hard; putting checks and balances in place earlier prevents extra manual effort later and reduces the risk of bad data being sent on to downstream applications, clients or regulators.
Considerations and benefits
Certain key areas need to be considered and included in any data quality metrics assessment:
- Timeliness – is the data arriving on time from its source, and if not, what downstream process delays is it causing?
- Completeness – is the data complete enough for its intended use? For example, securities need certain attribute fields populated before transactions can be processed.
- Consistency – how do the data feeds from different vendors compare with the gold copy, what are the differences between the vendors, and do ‘resolved’ issues crop up repeatedly over time?
- Validity – what level of validity is being achieved at the data item and record level, including for logical or dependent combinations of values and for data at rest, such as in a gold copy? And at what rate are validity levels decaying over time?
- Processing – which quality issues are preventing the EDM system from carrying out its processes?
- Distribution – which quality issues, including breaks in messaging to downstream schemas, are preventing publishing cycles and delaying downstream processes?
Weightings or criticality should be assigned to measures of quality at the data field or rule level. This ensures quality issues can be addressed with an appropriate level of priority or escalation.
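As a rough sketch of how such weightings might roll up, the example below scores a handful of hypothetical security master records against three illustrative rules, weights each rule by criticality, and produces a single quality score out of 100. The rules, weights and records are invented for illustration, not a prescribed methodology.

```python
from datetime import date

AS_OF = date(2019, 3, 31)  # hypothetical reporting date

# Hypothetical security master records; fields are illustrative only.
records = [
    {"cusip": "037833100", "maturity": date(2029, 6, 15), "price_date": date(2019, 3, 29)},
    {"cusip": "03783310",  "maturity": None,              "price_date": date(2019, 2, 15)},
]

# Each rule returns True when a record passes; the weight reflects criticality.
rules = {
    "cusip_well_formed":  (3.0, lambda r: r["cusip"] is not None and len(r["cusip"]) == 9),
    "maturity_populated": (2.0, lambda r: r["maturity"] is not None),
    "price_is_fresh":     (1.0, lambda r: (AS_OF - r["price_date"]).days <= 5),
}

def weighted_quality_score(records, rules):
    """Weighted average of per-rule pass rates, scaled to 0-100."""
    weighted_sum = total_weight = 0.0
    for name, (weight, check) in rules.items():
        pass_rate = sum(check(r) for r in records) / len(records)
        print(f"{name:20s} pass rate: {pass_rate:.0%} (weight {weight})")
        weighted_sum += weight * pass_rate
        total_weight += weight
    return 100 * weighted_sum / total_weight

print(f"Overall quality score: {weighted_quality_score(records, rules):.1f}")
```

Run daily, per asset class or per vendor feed, a score like this becomes a trend line rather than a one-off snapshot.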
If these attributes are measured over time, firms get a benchmark for how much they need to be, or can be, improved. Having that kind of measurement to work with helps firms budget their time and staff resources more appropriately by identifying problematic areas within the data and tracking improvement over time. In turn, this helps firms better manage their data vendor relationships: quality metrics can be viewed for each vendor side by side – and, in some cases, shared with the vendor – to see how much the firm can trust the data being delivered.
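Below is one way such a side-by-side vendor view might be assembled, assuming individual rule-check results are collected with a vendor and dimension label. The vendor names, figures and use of pandas are assumptions for the sketch; in practice each cell would aggregate many thousands of record-level checks.

```python
import pandas as pd

# Hypothetical rule-check outcomes tagged by vendor feed and quality dimension.
results = pd.DataFrame([
    {"vendor": "Vendor A", "dimension": "completeness", "passed": True},
    {"vendor": "Vendor A", "dimension": "timeliness",   "passed": True},
    {"vendor": "Vendor A", "dimension": "validity",     "passed": False},
    {"vendor": "Vendor B", "dimension": "completeness", "passed": False},
    {"vendor": "Vendor B", "dimension": "timeliness",   "passed": True},
    {"vendor": "Vendor B", "dimension": "validity",     "passed": True},
])

# Pass rate per vendor and quality dimension, laid out side by side.
scorecard = results.pivot_table(index="vendor", columns="dimension",
                                values="passed", aggfunc="mean")
print(scorecard.round(2))
```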
Firms can get statistics evaluating their data governance or stewardship, rather than relying only on monthly or quarterly reports and data steering committee input. With transparent data management, teams can benchmark performance more easily and, from a cost perspective, the automation put in place to ensure and measure data quality makes firms more efficient.
Practical applications
BCBS 239, the global regulation on risk data aggregation and risk reporting (RDARR), includes principles on data quality. So firms, especially on the sell side, know they must be able to track their data quality and report that information if questioned by regulatory authorities. Being prepared on data quality to avoid regulatory penalties is certainly a benefit, with the sell side increasingly focused on data lineage. Buy-side firms are mainly concerned with demonstrating consistent data quality and reliable data.
Whichever side of the market you are on, the numerous benefits of data quality metrics should make adding this capability to your EDM systems and overall operations an easy choice.