Firms can handle environmental, social and corporate governance (ESG) operations and compliance by going beyond conventional approaches and using data lakes to retain and manage their ESG data. However, ensuring high-quality ESG data and supporting ESG investment goals also requires capabilities that are not native to data lakes, namely matching and mastering related entity and securities data.
ESG data can come from multiple sources, which may result in discrepancies between similar pieces of information. This data can be modeled at the instrument level, the issuer level or the fund level. It also needs to be formatted so that it can be associated with different kinds of identifiers, such as ISINs (international securities identification numbers) and LEIs (legal entity identifiers), and with hierarchies.
What to look for in an ESG data solution
Firms want an ESG data solution to retrieve ESG data about the issuers of securities, which requires matching instrument codes with those issuers. The solution should maintain, normalize, match, consolidate and merge this data into linked sets, so that it can be retrieved by instrument or issuer identifier. For this reason, firms that have opted for a data lake design for their ESG data management are complementing their data framework with lightweight securities and entity masters. A further refinement of this design is to automate the data pipelines that feed cloud data warehouses, often from cloud data marketplaces.
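As a rough illustration of matching instrument codes with issuers and consolidating vendor data into linked sets, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the identifiers are made-up placeholders rather than real ISINs or LEIs, the vendor names and scores are invented, and averaging vendor scores is just one possible consolidation rule.

```python
from statistics import mean

# Hypothetical securities master: instrument code -> issuer identifier.
# The codes are made-up placeholders, not real ISINs or LEIs.
SECURITIES_MASTER = {
    "ISIN_BOND_1": "LEI_AIRLINE_A",
    "ISIN_EQUITY_1": "LEI_AIRLINE_A",
    "ISIN_BOND_2": "LEI_UTILITY_B",
}

# Hypothetical issuer-level ESG records from two vendors (illustrative values).
VENDOR_RECORDS = [
    {"vendor": "vendor_x", "lei": "LEI_AIRLINE_A", "esg_score": 60.0},
    {"vendor": "vendor_y", "lei": "LEI_AIRLINE_A", "esg_score": 70.0},
    {"vendor": "vendor_x", "lei": "LEI_UTILITY_B", "esg_score": 80.0},
]


def consolidate_by_issuer(records):
    """Merge vendor records into one linked record per issuer."""
    scores_by_lei = {}
    for record in records:
        scores_by_lei.setdefault(record["lei"], []).append(record["esg_score"])
    return {
        lei: {
            "esg_score": mean(scores),            # assumed consolidation rule
            "spread": max(scores) - min(scores),  # flags vendor discrepancies
            "source_count": len(scores),
        }
        for lei, scores in scores_by_lei.items()
    }


def esg_for_instrument(isin, securities_master, issuer_records):
    """Resolve an instrument code to its issuer, then to the issuer's ESG record."""
    lei = securities_master.get(isin)
    if lei is None:
        return None  # unmatched instrument: needs mastering before use
    record = issuer_records.get(lei)
    if record is None:
        return None  # issuer is known but has no ESG data yet
    return {"isin": isin, "lei": lei, **record}


ISSUER_RECORDS = consolidate_by_issuer(VENDOR_RECORDS)
print(esg_for_instrument("ISIN_BOND_1", SECURITIES_MASTER, ISSUER_RECORDS))
```

In this sketch a bond and an equity from the same issuer resolve to the same consolidated record, and the spread field exposes where vendors disagree so that the discrepancy can be reviewed rather than silently averaged away.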
For a data lake-based solution to handle all the elements that make up ESG materiality maps (as developed by the Sustainability Accounting Standards Board) and taxonomies, an important design feature is often a data schema or model specific to those maps and standards. Such schemas can be accommodated in the data lake in the form of a lake house. Again, ensuring native cross-referencing to related data is key to efficiency.
Data Lakes for ESG Investment Goals
A data lake house can cross-reference data from within the lake, including metadata about the ESG information, with data from outside the lake, such as securities and entity data. For the airline industry, for example, information relevant to ESG performance by airlines, such as carbon emissions, can be compiled and associated with specific securities from individual issuers.
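The airline example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity master, securities list and emissions records are hypothetical in-memory stand-ins for tables inside and outside the lake, and all names, identifiers and figures are invented.

```python
# Hypothetical entity master held outside the lake: issuer -> name and industry.
ENTITY_MASTER = {
    "LEI_AIRLINE_A": {"name": "Airline A", "industry": "airlines"},
    "LEI_AIRLINE_C": {"name": "Airline C", "industry": "airlines"},
    "LEI_UTILITY_B": {"name": "Utility B", "industry": "utilities"},
}

# Hypothetical securities master: one row per instrument with its issuer.
SECURITIES = [
    {"isin": "ISIN_BOND_1", "issuer_lei": "LEI_AIRLINE_A"},
    {"isin": "ISIN_EQUITY_1", "issuer_lei": "LEI_AIRLINE_A"},
    {"isin": "ISIN_EQUITY_3", "issuer_lei": "LEI_AIRLINE_C"},
    {"isin": "ISIN_BOND_2", "issuer_lei": "LEI_UTILITY_B"},
]

# Hypothetical ESG records inside the lake, with metadata about each value.
EMISSIONS = {
    "LEI_AIRLINE_A": {"carbon_tonnes": 1200.0, "source": "vendor_x", "year": 2022},
    "LEI_UTILITY_B": {"carbon_tonnes": 900.0, "source": "vendor_x", "year": 2022},
}


def securities_with_emissions(industry):
    """List every security in an industry alongside its issuer's emissions data."""
    rows = []
    for security in SECURITIES:
        lei = security["issuer_lei"]
        entity = ENTITY_MASTER.get(lei)
        if entity is None or entity["industry"] != industry:
            continue
        emissions = EMISSIONS.get(lei)  # None when the lake holds no record
        rows.append({
            "isin": security["isin"],
            "issuer": entity["name"],
            "carbon_tonnes": emissions["carbon_tonnes"] if emissions else None,
            "source": emissions["source"] if emissions else None,
        })
    return rows


for row in securities_with_emissions("airlines"):
    print(row)
```

Keeping securities whose issuer has no emissions record, with an empty value, is a deliberate choice in this sketch: it makes coverage gaps visible instead of dropping those instruments from the result.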
Other data, outside the data lake house’s normal sources, such as human resources data, can also be incorporated into a data lake-based ESG solution. A data pipeline for this internal data can supply point-in-time records about portfolios, traders and accounts. This is useful because investment managers or portfolio managers may come and go, but users will still get a consistent record of the securities and the transactions.
Another use case that requires customized setup within a data lake house is analyzing the data to determine whether fund objectives, investment objectives or investment mandates are being met, either at present or at specific points in the past. This also helps support “what-if” analysis.
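A point-in-time mandate check of this kind might look like the following Python sketch. The assumptions are loud ones: the mandate is modeled as a minimum weighted-average ESG score, holdings carry validity dates, scores carry effective dates, and every code, weight, date and score is invented for illustration.

```python
from datetime import date

# Hypothetical holdings: (instrument code, weight, valid_from, valid_to).
# valid_to is exclusive; None means the position is still open.
HOLDINGS = [
    ("SEC_A", 0.5, date(2022, 1, 1), None),
    ("SEC_B", 0.5, date(2022, 1, 1), date(2023, 1, 1)),
    ("SEC_C", 0.5, date(2023, 1, 1), None),
]

# Hypothetical score history per instrument, sorted by effective date.
SCORE_HISTORY = {
    "SEC_A": [(date(2022, 1, 1), 70.0), (date(2023, 6, 1), 50.0)],
    "SEC_B": [(date(2022, 1, 1), 40.0)],
    "SEC_C": [(date(2023, 1, 1), 80.0)],
}


def holdings_as_of(as_of, holdings=HOLDINGS):
    """Return (code, weight) pairs for positions open on the given date."""
    return [
        (code, weight)
        for code, weight, valid_from, valid_to in holdings
        if valid_from <= as_of and (valid_to is None or as_of < valid_to)
    ]


def score_as_of(code, as_of):
    """Return the latest score effective on or before the date, if any."""
    known = [s for effective, s in SCORE_HISTORY.get(code, []) if effective <= as_of]
    return known[-1] if known else None  # relies on the history being sorted


def portfolio_score(as_of, holdings=HOLDINGS):
    """Weighted-average ESG score of the portfolio as it stood on the date."""
    total_weight = 0.0
    weighted_sum = 0.0
    for code, weight in holdings_as_of(as_of, holdings):
        score = score_as_of(code, as_of)
        if score is None:
            raise ValueError(f"no ESG score for {code} as of {as_of}")
        total_weight += weight
        weighted_sum += weight * score
    return weighted_sum / total_weight if total_weight else None


def mandate_met(as_of, min_score, holdings=HOLDINGS):
    """Check the assumed mandate: weighted-average score at or above a floor."""
    score = portfolio_score(as_of, holdings)
    return score is not None and score >= min_score


print(mandate_met(date(2022, 6, 30), 60.0))  # historical check
print(mandate_met(date(2023, 3, 31), 60.0))  # later point in time

# "What-if": the same check against an alternative, hypothetical portfolio.
what_if = [("SEC_A", 1.0, date(2022, 1, 1), None)]
print(mandate_met(date(2023, 12, 31), 60.0, what_if))
```

Because both holdings and scores are filtered by date, the same functions answer the question for today, for a past date, or for a hypothetical portfolio passed in place of the recorded one.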
Beyond these organizational and analytical uses of data lakes for ESG data, cost savings on data storage are also a consideration. Firms looking to reduce their spending may benefit from the specialized data sets that some vendors on cloud data marketplaces price and deliver in new, granular ways, which lets buyers avoid paying for superfluous data bundled into larger data sets. Those who take a total cost of ownership (TCO) approach to data lakes pursue the same cost-saving goal by making the process of managing this data more efficient. Again, efficient data pipeline services play a role here.
Overall, there will be many designs for capabilities to handle ESG data and its associated use cases. Data lakes and ancillary capabilities and services are likely to become a popular approach.