Accelerating Scope 3 Emissions Accounting: Relief Through an LLM

As interest in calculating and disclosing Scope 3 greenhouse gas emissions increases, emissions calculation methods are attracting attention. One of the more common Scope 3 calculation methods used by organizations is the spend-based method, which can be time-consuming and resource-intensive to implement. This article explores an innovative way to streamline Scope 3 GHG emissions estimates by leveraging AI and large language models (LLMs) to help classify financial transaction data for expenditure-based emissions drivers.

Why are Scope 3 emissions difficult to calculate?

Scope 3 emissions, also known as indirect emissions, include greenhouse gas (GHG) emissions that originate in an organization’s value chain and are therefore not under its direct operational control or ownership. Simply put, these emissions come from external sources, such as emissions associated with suppliers and customers, and are outside the company’s core operations.

A 2022 CDP study found that for companies reporting to CDP, emissions from their supply chain are on average 11.4 times greater than their operational emissions.

The same study found that 72% of companies responding to CDP reported only operational emissions (Scope 1 and/or 2). Some companies attempt to estimate Scope 3 emissions by collecting data from suppliers and manually classifying it, but challenges such as large supplier bases, deep supply chains, complex data-collection processes, and significant resource requirements hinder progress.

Accelerate time to insight using LLM for Scope 3 emissions estimation

One approach to estimating Scope 3 emissions is to utilize financial transaction data (e.g. expenditures) as a proxy for the emissions associated with purchased goods and/or services. Converting this financial data into a greenhouse gas emissions inventory requires information about the greenhouse gas emissions impact of the products or services you purchase.

USEEIO (US Environmentally-Extended Input-Output) is a life cycle assessment (LCA) framework that tracks the economic and environmental flows of goods and services in the United States. USEEIO merges economic input-output analysis with environmental data to provide a comprehensive dataset and methodology for estimating the environmental consequences associated with economic activities. Within USEEIO, goods and services are classified into 66 spend categories, called product classes, based on common environmental characteristics. These product classes are associated with emission factors that are used to estimate environmental impact from expenditure data.
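At its core, the spend-based method multiplies each expenditure by the emission factor of its product class. The sketch below illustrates that arithmetic; the product-class names and factor values are illustrative placeholders, not actual USEEIO factors.

```python
# Spend-based Scope 3 estimate: emissions = spend (USD) x emission factor
# (kg CO2e per USD). Class names and factor values are illustrative
# placeholders, not actual USEEIO emission factors.
EMISSION_FACTORS = {
    "paper_products": 0.65,
    "computers_electronics": 0.18,
    "air_transport": 1.20,
}

def scope3_from_spend(ledger):
    """Sum emissions over (product_class, spend_usd) ledger entries."""
    return sum(spend * EMISSION_FACTORS[cls] for cls, spend in ledger)

ledger = [("paper_products", 10_000), ("air_transport", 5_000)]
print(scope3_from_spend(ledger))  # 10000*0.65 + 5000*1.20 = 12500.0
```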

The Eora multi-region input-output (MRIO) dataset is a globally recognized set of expenditure-based emission factors documenting intersectoral transfers across 15,909 sectors in 190 countries. The Eora factor set was adapted to the USEEIO classification, yielding the 66 summary product classes for each country. This involved mapping the 15,909 sectors, spanning the Eora26 categories and more detailed country-specific sector breakdowns, to the 66 USEEIO spend categories.
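Conceptually, this adaptation is a many-to-one concordance: each detailed Eora sector rolls up into one USEEIO product class. The sketch below shows the idea with a hypothetical three-entry mapping; the real concordance covers all 15,909 sectors.

```python
# Concordance from detailed Eora sectors to USEEIO summary product
# classes (the 66-class scheme). Sector and class names here are
# illustrative; the real concordance covers 15,909 sectors.
EORA_TO_USEEIO = {
    "Pulp and paper": "paper_products",
    "Printing and publishing": "paper_products",
    "Air transport": "air_transport",
}

def rollup_spend(entries):
    """Aggregate per-sector spend into USEEIO product classes."""
    totals = {}
    for sector, spend in entries:
        cls = EORA_TO_USEEIO[sector]
        totals[cls] = totals.get(cls, 0.0) + spend
    return totals

print(rollup_spend([("Pulp and paper", 100.0),
                    ("Printing and publishing", 50.0)]))
# {'paper_products': 150.0}
```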

However, while expenditure-based product-class data offers a way to address the challenges of Scope 3 emissions accounting, manually mapping large volumes of financial ledger entries to product classes is a time-consuming and error-prone process.

This is where LLMs come into play. In recent years, remarkable progress has been made in creating a wide range of foundational language models for natural language processing (NLP). These innovations have demonstrated powerful performance compared to traditional machine learning (ML) models, especially in scenarios where labeled data is scarce. Harnessing the power of large-scale pre-trained NLP models combined with domain adaptation techniques to efficiently use limited data has significant potential to address challenges associated with accounting for Scope 3 environmental impacts.

Our approach uses a fine-tuned foundation model to recognize the environmentally extended input-output (EEIO) product class of purchase orders or ledger entries written in natural language. We then calculate the emissions associated with each expenditure using EEIO emission factors (emissions per dollar spent), taken from the Supply Chain GHG Emission Factors for US Goods and Industries for the US-centric dataset and from Eora MRIO (multi-region input-output) for global datasets. This framework helps companies streamline and simplify the process of calculating Scope 3 emissions.
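The overall flow can be sketched as: classify a free-text ledger entry to a product class, then apply the matching emission factor. In the sketch below, a simple keyword matcher stands in for the fine-tuned LLM classifier, and the factor values are illustrative, not real USEEIO or Eora values.

```python
# End-to-end sketch: description text -> product class -> emissions.
# The keyword matcher is a stand-in for the fine-tuned LLM classifier;
# factor values are illustrative, not real USEEIO/Eora factors.
KEYWORDS = {
    "laptop": "computers_electronics",
    "flight": "air_transport",
    "paper": "paper_products",
}
FACTORS = {  # kg CO2e per USD, illustrative
    "computers_electronics": 0.18,
    "air_transport": 1.20,
    "paper_products": 0.65,
}

def classify(description):
    """Map a ledger-entry description to an EEIO product class."""
    text = description.lower()
    for kw, cls in KEYWORDS.items():
        if kw in text:
            return cls
    return None  # unclassified entries need manual review

def estimate(description, spend_usd):
    """Emissions (kg CO2e) for one ledger entry, or None if unclassified."""
    cls = classify(description)
    return None if cls is None else spend_usd * FACTORS[cls]

print(estimate("Round-trip flight NYC-LON", 1_200))  # 1440.0
```

In the production system the `classify` step is where the fine-tuned model replaces the keyword lookup; the downstream emission computation is unchanged.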

Figure 1 illustrates the framework for Scope 3 emissions estimation using large-scale language models. This framework consists of four separate modules: data preparation, domain adaptation, classification, and emission computation.

Figure 1: Framework for estimating Scope 3 emissions using large-scale language models.

We performed extensive experiments involving several state-of-the-art LLMs, including roberta-base, bert-base-uncased, and distilroberta-base-climate-f. We also explored non-foundation models based on TF-IDF and Word2Vec vectorization approaches. Our goal was to evaluate the potential of foundation models (FMs) for estimating Scope 3 emissions using financial transaction records as a proxy for goods and services. Experimental results indicate that the fine-tuned LLM exhibits significant improvement over the zero-shot classification approach. It also outperforms existing text mining techniques such as TF-IDF and Word2Vec, providing performance comparable to domain-expert classification.
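For context, a TF-IDF baseline of the kind compared above can be sketched without any ML library: represent each labeled description as a term-frequency vector, weight terms by inverse document frequency, and assign a new entry the label of its nearest neighbor by cosine similarity. The training data and labels below are toy examples, not the article's dataset.

```python
import math
from collections import Counter

# Toy labeled ledger entries (illustrative labels, not the real dataset).
TRAIN = [
    ("office paper and envelopes", "paper_products"),
    ("laptop and monitor purchase", "computers_electronics"),
    ("airline ticket to conference", "air_transport"),
]

def tfidf(counts, idf):
    """Weight raw term counts by inverse document frequency."""
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Inverse document frequency over the training descriptions.
docs = [Counter(text.split()) for text, _ in TRAIN]
n = len(docs)
idf = {t: math.log(n / sum(1 for d in docs if t in d))
       for d in docs for t in d}

vectors = [(tfidf(d, idf), label) for d, (_, label) in zip(docs, TRAIN)]

def predict(text):
    """Nearest-neighbor label by TF-IDF cosine similarity."""
    q = tfidf(Counter(text.split()), idf)
    return max(vectors, key=lambda vl: cosine(q, vl[0]))[1]

print(predict("envelopes and paper supplies"))  # paper_products
```

As the article notes, such vectorization baselines are outperformed by the fine-tuned foundation model, which generalizes better when descriptions use vocabulary unseen in training.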

Figure 2: Comparison results of different approaches

Integrate AI into IBM Envizi ESG Suite to calculate Scope 3 emissions

Using LLM in the process of estimating Scope 3 emissions is a promising new approach.

We took this approach and embedded it into IBM® Envizi™ ESG Suite in the form of an AI-based feature using our NLP engine to help identify product categories in spend transaction descriptions.

As previously explained, spend data is more readily available to organizations and is a common proxy for the quantity of goods and services purchased. However, challenges such as product recognition and mapping can be difficult to solve at scale.

This is where deep learning-based foundation models for NLP can be effective across a wide range of NLP classification tasks when labeled data is scarce or limited. Leveraging large pre-trained NLP models, with domain adaptation on limited data, has the potential to support Scope 3 emissions calculations.

Conclusion

In conclusion, calculating Scope 3 emissions with the support of LLMs represents a significant advance in sustainability data management. The promising results achieved by adopting advanced LLMs highlight their potential to accelerate GHG emissions assessment. Practical integration into software such as the IBM Envizi ESG Suite can help simplify the process while speeding time to insight.

See AI Assist in action within IBM Envizi ESG Suite.
