An often-overlooked area of data management is the somewhat specialized concept of domain values. They are such a fundamental part of maintaining data integrity, and of underpinning the success of any data-driven initiative, that I think it’s worth taking the time to explore why they are so vital.
Domain values, sometimes also known as ‘enumerated values’, are the set of allowable values for a specific field within a dataset. They describe, in a structured but non-quantitative form, certain characteristics of financial instruments, legal entities, corporate actions, and so on.
In other words, such a field is an attribute that can take one of several pre-defined states, each of which captures a more or less complex aspect of that instrument, entity, or action.
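As a minimal sketch of that idea, a domain can be modelled in Python as an enumeration. The `CouponType` class, its three members, and the `parse_coupon_type` helper are hypothetical names chosen for illustration; real providers publish their own, usually longer, lists.

```python
from enum import Enum


class CouponType(Enum):
    """Hypothetical domain for a coupon type (illustrative values only)."""
    FIXED = "FIXED"
    FLOATING = "FLOATING"
    ZERO = "ZERO"


def parse_coupon_type(raw: str) -> CouponType:
    """Map a raw string onto the domain and reject anything outside it."""
    try:
        # Trim and upper-case first so that " fixed " still matches FIXED.
        return CouponType(raw.strip().upper())
    except ValueError:
        raise ValueError(f"{raw!r} is not a valid coupon type") from None
```

Calling `parse_coupon_type(" fixed ")` returns `CouponType.FIXED`, while a value outside the list raises a `ValueError` instead of passing silently into downstream systems.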
Examples of well-known domain’ed attributes are the asset class itself and the coupon type. (Each data provider maintains clear, well-defined lists of asset classes and coupon types.) But there are also extremely specialized domain’ed fields, such as the ‘collateral aggregate type’ (e.g. whole loan, representative…) or the ‘call premium day-count basis’ (for various types of fixed income instruments).
Standards are also often defined in the form of domain values, such as the ISO country codes (numeric, two-letter, or three-letter), which are used globally. Classification schemes such as GICS, NAICS, SIC, ISIC, and NACE typically consist of domain values as well.
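To make the country-code example concrete, here is a small sketch that treats the three ISO 3166-1 forms as one domain and normalizes any of them to the two-letter code. The table is a four-country excerpt rather than the full standard, and the names `Country`, `BY_CODE`, and `normalize_country` are assumptions made for this illustration.

```python
from typing import NamedTuple


class Country(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: str  # kept as a string so that leading zeros survive


# A tiny excerpt of ISO 3166-1, not the complete list.
COUNTRIES = [
    Country("DE", "DEU", "276"),
    Country("FR", "FRA", "250"),
    Country("JP", "JPN", "392"),
    Country("US", "USA", "840"),
]

# One lookup index covering all three code forms of every country.
BY_CODE = {code: country for country in COUNTRIES for code in country}


def normalize_country(code: str) -> str:
    """Return the alpha-2 code for any of the three forms (KeyError if unknown)."""
    return BY_CODE[code.strip().upper()].alpha2
```

With this in place, `normalize_country("usa")` and `normalize_country("840")` both resolve to `"US"`, and a code that is not in the table fails loudly with a `KeyError`.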
So how are domain values different from normal values such as numbers or free-form text? To put it frankly, data practitioners are absolutely obsessed with ensuring that domain values are specific and accurate. If the values don’t meet their proprietary, well-defined standards, the data becomes all but useless (not to mention risky).
Domain values act as a built-in quality control check for data: a field either holds one of the agreed values or it does not. Essentially, a set of domain values is the common language that helps ensure the correctness and usability of data in highly automated systems.
But even the most rigorously structured sets of domain values can contain anomalies, that is, values that shouldn’t exist. Such a value can be anything from “correct but unexpected” to outright wrong. So, just as you need to vet a numeric value for accuracy, you need to vet a structured domain value. If you don’t, the result will be recurring problems further down the processing chain, which are quite challenging to resolve.
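A simple way to surface such anomalies is to count every value of a field that falls outside its domain. The sketch below assumes a hypothetical `ASSET_CLASSES` domain and made-up sample records; it only illustrates the kind of check involved, and a production check would depend on the provider's actual lists.

```python
from collections import Counter

# Hypothetical domain and sample records, for illustration only.
ASSET_CLASSES = {"EQUITY", "FIXED_INCOME", "FX", "COMMODITY"}


def find_anomalies(records, field, domain):
    """Count the values of `field` that fall outside `domain`."""
    counts = Counter(record.get(field) for record in records)
    return {value: n for value, n in counts.items() if value not in domain}


records = [
    {"id": 1, "asset_class": "EQUITY"},
    {"id": 2, "asset_class": "Equity"},        # casing variant
    {"id": 3, "asset_class": "FIXED_INCOME"},
    {"id": 4, "asset_class": None},            # missing value
    {"id": 5, "asset_class": "CRYPTO"},        # not in the domain
]

anomalies = find_anomalies(records, "asset_class", ASSET_CLASSES)
```

Here `anomalies` contains `"Equity"`, `None`, and `"CRYPTO"`, each with a count of one. The check deliberately does not decide which of these is a formatting slip, a gap, or a legitimately new value; that judgment still needs a review step.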
This is why domain values require special treatment throughout the entire processing chain: from the source where the data is gathered, through the systems that process it, to the end-users themselves.