Cracking the Code: Enhancing Development finance understanding with artificial intelligence

UCA CNRS IRD CERDI
pierre.beaucoral@uca.fr


Abstract

Analyzing development projects is crucial for understanding donors’ aid strategies and recipients’ priorities, and for assessing the capacity of development finance to address development issues through on-the-ground actions. In this area, the Organisation for Economic Co-operation and Development’s (OECD) Creditor Reporting System (CRS) dataset is a reference data source. It provides a vast collection of project narratives from various sectors (approximately 5 million projects). While the OECD CRS is a rich source of information on development strategies, it falls short in describing project purposes because its reporting process relies on donors’ self-declared main objectives and predefined industrial sectors. This research employs a novel approach that combines Machine Learning (ML) techniques, specifically Natural Language Processing (NLP), with BERTopic, a Python topic modeling technique, to categorise (cluster) and label development projects based on their narrative descriptions. By revealing existing yet hidden topics of development finance, this application of artificial intelligence enables a better understanding of donor priorities and overall development funding, and provides methods to analyse the narratives of public and private projects.

Key Contributions

  • Uncovering Hidden Themes in Development Finance: Machine learning-driven topic modeling reveals 406 distinct topics, offering a finer-grained classification of development finance beyond standard OECD CRS categories.
  • AI-Powered Transparency & Monitoring: Interactive visualizations display trends in climate finance, humanitarian aid, and niche topics like Great Apes conservation projects, enabling data-driven decision-making.
  • More Accurate Climate Finance Estimates: Comparison between OECD Rio markers and text-based classifications shows that conventional methods may overestimate or misclassify climate-related development aid.
  • Donor and Recipient Analysis in Real-Time: Graph-based donor-recipient relationship maps illustrate bilateral aid flows, revealing regional disparities and strategic priorities.
  • Tracking Development Trends Over Time: Temporal analysis identifies shifts in global aid priorities, from traditional sectors like microfinance to emerging areas like Indigenous rights or sustainable agriculture.

This research enhances development finance transparency through interactive AI-driven analytics, providing policymakers and researchers with monitoring tools for a more precise and evidence-based approach to development aid distribution.

Our Methodology: AI-Driven Clustering for Development Finance

This research leverages cutting-edge Natural Language Processing (NLP) and Machine Learning to analyze development finance at an unprecedented level of detail. Instead of relying on self-declared project classifications (such as OECD’s Rio markers), our approach applies AI-driven topic modeling to extract meaningful patterns from millions of project descriptions.

🔍 Step 1: Data Collection & Preprocessing

We analyze approximately 5 million project descriptions from the OECD CRS dataset, extracting the relevant text fields and cleaning them before further processing.
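As a rough illustration of this step, the sketch below merges the text fields of one project record and drops narratives that are too short to be informative. The field names (`ProjectTitle`, `ShortDescription`, `LongDescription`) and the 20-character threshold are assumptions made for the example, not settings reported by the study.

```python
import re
import unicodedata

# Hypothetical field names; the actual CRS export may label them differently.
TEXT_FIELDS = ("ProjectTitle", "ShortDescription", "LongDescription")


def clean_text(value):
    """Normalise Unicode, collapse whitespace, and strip the result."""
    if value is None:
        return ""
    text = unicodedata.normalize("NFKC", str(value))
    return re.sub(r"\s+", " ", text).strip()


def build_document(record, min_chars=20):
    """Join the distinct text fields of one project record into one string.

    Returns None when the combined narrative is shorter than min_chars.
    """
    parts = []
    seen = set()
    for field in TEXT_FIELDS:
        cleaned = clean_text(record.get(field))
        key = cleaned.lower()
        # Titles are often repeated in the description; keep one copy only.
        if cleaned and key not in seen:
            seen.add(key)
            parts.append(cleaned)
    document = ". ".join(parts)
    return document if len(document) >= min_chars else None


def preprocess(records, min_chars=20):
    """Return cleaned documents, dropping empty or too-short narratives."""
    documents = []
    for record in records:
        document = build_document(record, min_chars)
        if document is not None:
            documents.append(document)
    return documents
```

Each record is expected to be a dictionary, for example one row read with `csv.DictReader`.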

🧠 Step 2: AI-Powered Text Embedding

Using a BERT-based transformer model, we convert project descriptions into dense vector representations, capturing contextual meaning across multiple languages.

📊 Step 3: Unsupervised Clustering

The HDBSCAN clustering algorithm is applied to group projects into 406 distinct thematic clusters, enabling a more precise classification than predefined OECD sectors.
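Steps 2 and 3 could be assembled with BERTopic roughly as follows. This is a minimal sketch, not the study's configuration: the embedding model name and every hyperparameter are illustrative assumptions. It also includes a UMAP dimensionality reduction, which is BERTopic's default stage between embedding and HDBSCAN but is not described in this section.

```python
# Illustrative hyperparameters only; the study's actual settings are not
# stated in this section.
PIPELINE_CONFIG = {
    # A multilingual sentence-transformers model, chosen here as an example.
    "embedding_model": "paraphrase-multilingual-MiniLM-L12-v2",
    "umap": {"n_neighbors": 15, "n_components": 5, "min_dist": 0.0,
             "metric": "cosine", "random_state": 42},
    "hdbscan": {"min_cluster_size": 50, "metric": "euclidean",
                "cluster_selection_method": "eom", "prediction_data": True},
}


def build_topic_model(config=PIPELINE_CONFIG):
    """Assemble a BERTopic model from explicit sub-models."""
    # Imported lazily so the configuration can be inspected without the
    # heavy dependencies (bertopic, sentence-transformers, umap-learn, hdbscan).
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer(config["embedding_model"]),
        umap_model=UMAP(**config["umap"]),
        hdbscan_model=HDBSCAN(**config["hdbscan"]),
    )


def fit_topics(documents, config=PIPELINE_CONFIG):
    """Embed, reduce, and cluster documents; return the model and topic ids."""
    topic_model = build_topic_model(config)
    topics, _probabilities = topic_model.fit_transform(documents)
    return topic_model, topics
```

In BERTopic, documents that HDBSCAN treats as noise receive the topic id `-1`, so the number of thematic clusters is the number of distinct non-negative ids. The `min_cluster_size` value largely controls how many topics emerge.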

🏷 Step 4: Automated Labeling

Each cluster is labeled using class-based TF-IDF (c-TF-IDF), and the resulting labels are refined with Large Language Models (LLMs) to produce meaningful and interpretable topic descriptions.
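To make the labeling step concrete, here is a small pure-Python sketch of class-based TF-IDF as defined in the BERTopic documentation: all documents of a topic are merged, and each term is weighted by its normalised frequency in the topic times log(1 + A / f), where A is the average number of tokens per topic and f the term's frequency across all topics. The whitespace tokenizer and the function names are simplifications introduced for this example. The LLM refinement is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would use a vectorizer."""
    return text.lower().split()


def class_tfidf(documents, topics):
    """Score terms per topic with class-based TF-IDF.

    weight(t, c) = tf(t, c) * log(1 + A / f(t)), where tf(t, c) is the share
    of topic c's tokens that are term t, f(t) is the count of t across all
    topics, and A is the average number of tokens per topic.
    """
    counts_per_topic = {}
    for document, topic in zip(documents, topics):
        counts_per_topic.setdefault(topic, Counter()).update(tokenize(document))

    total_counts = Counter()
    for counts in counts_per_topic.values():
        total_counts.update(counts)

    average_tokens = sum(total_counts.values()) / len(counts_per_topic)

    scores = {}
    for topic, counts in counts_per_topic.items():
        topic_size = sum(counts.values())
        scores[topic] = {
            term: (count / topic_size)
            * math.log(1 + average_tokens / total_counts[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, topic, n=3):
    """Return the n highest-scoring terms of a topic (ties broken A to Z)."""
    ranked = sorted(scores[topic].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]
```

The top-ranked terms of each topic form a keyword label, which could then be passed to an LLM to obtain a short human-readable name.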

📈 Step 5: Interactive Visualization

The results are displayed in dynamic, interactive graphs, allowing users to explore:

  • 📌 Trends Over Time: See how financing for climate, health, and other topics evolves.
  • 🌍 Topics by Donor: Map donor-recipient aid relationships and thematic distributions.
  • 📑 Clustered Output Analysis: Contrast AI-classified finance with OECD’s traditional classifications.
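The contrast between text-based clusters and OECD markers can be sketched as a simple cross-tabulation. The example below assumes each project carries a topic id and a climate Rio marker score (0 = not targeted, 1 = significant objective, 2 = principal objective). The function name and the set of climate-related topic ids are hypothetical inputs chosen for illustration, not results from the study.

```python
from collections import Counter


def compare_with_rio_marker(projects, climate_topics):
    """Cross-tabulate text-based climate labels against Rio marker scores.

    `projects` is an iterable of (topic_id, rio_marker_score) pairs and
    `climate_topics` is the set of topic ids judged to be climate-related.
    """
    table = Counter()
    for topic_id, rio_score in projects:
        text_says_climate = topic_id in climate_topics
        marker_says_climate = rio_score >= 1
        table[(text_says_climate, marker_says_climate)] += 1
    return {
        "both": table[(True, True)],
        "text_only": table[(True, False)],
        "marker_only": table[(False, True)],
        "neither": table[(False, False)],
    }
```

A large `marker_only` count would flag projects marked as climate-related whose narratives fall in non-climate clusters, which is the kind of discrepancy the comparison is meant to surface.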

By combining AI-powered clustering, statistical validation, and interactive visualizations, this project provides an innovative, data-driven framework for analyzing global development finance.

Interactive outputs

BibTeX


@misc{beaucoral2025crackingcodeenhancingdevelopment,
      title={Cracking the Code: Enhancing Development finance understanding with artificial intelligence},
      author={Pierre Beaucoral},
      year={2025},
      eprint={2502.09495},
      archivePrefix={arXiv},
      primaryClass={econ.GN},
      url={https://arxiv.org/abs/2502.09495},
}