GenAI in Action: Transforming Unstructured Data into Value for Environmental Science Associates
Maximized the value of over 70 million documents through unstructured data ingestion, classification and GenAI, enabling faster & smarter decision-making across teams of environmental consultants.

The Challenge
Environmental Science Associates (ESA) manages over 70 million documents (contracts, proposals, resumes, and reports) that contain highly valuable information and critical institutional knowledge. Despite the richness of this data, limited search and discovery capabilities made much of this knowledge effectively inaccessible. For example, during proposal generation and project staffing, teams frequently spent significant time recreating documents or analyses that already existed because they had no visibility into previously developed materials. The result was duplicated effort, lost productivity, and delayed timelines, which limited the organization’s ability to capitalize fully on its historical work and expertise. The absence of intelligent document classification and retrieval highlighted the need for a scalable, AI-driven solution that could unlock ESA’s institutional knowledge and provide teams with timely, context-aware insights.
Our Approach
We started with an unstructured data ingestion pipeline and a classification model that accurately categorized documents into predefined groups. Building on this foundation, we implemented a Retrieval-Augmented Generation (RAG) solution using LangGraph that enabled semantic, context-aware search across the classified document corpus. Users could retrieve relevant documents and specific passages with natural language queries instead of relying on exact keyword matches. For each query, the solution returned the most relevant documents along with a synthesized answer that cited its sources, significantly reducing time spent searching. A comprehensive evaluation framework built on MLflow supported the solution by tracking its performance, accuracy, and reliability.
Results
The classification model assigned documents to their predefined groups with over 90% accuracy, improving organization and enabling efficient search and retrieval.

The RAG solution delivered rapid, context-aware responses that let teams surface relevant information across projects, people, and historical documents, reducing redundant work and saving time.

The team also delivered a ground-truth builder application to ESA, built on Databricks Apps and MLflow, that accelerates ground-truth dataset creation and enables more robust, consistent, and reliable evaluations over time.
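A ground-truth dataset pairs evaluation questions with the documents that should answer them, which is what makes retrieval quality measurable over time. The sketch below is a hypothetical illustration of that idea, not the application delivered to ESA: `GroundTruthRecord`, `recall_at_k`, `evaluate`, and the sample IDs are invented, and recall@k and hit rate are common retrieval metrics chosen here as examples rather than the metrics the project reported.

```python
from dataclasses import dataclass


@dataclass
class GroundTruthRecord:
    # Hypothetical record a ground-truth builder might produce:
    # a question plus the document IDs a reviewer marked as relevant.
    question: str
    relevant_ids: set[str]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the relevant documents that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(records: list[GroundTruthRecord], retrieve_fn, k: int = 3) -> dict[str, float]:
    # Average recall@k and hit rate (at least one relevant doc in the top k)
    # across all ground-truth records.
    if not records:
        raise ValueError("need at least one ground-truth record")
    recalls, hits = [], []
    for rec in records:
        r = recall_at_k(retrieve_fn(rec.question), rec.relevant_ids, k)
        recalls.append(r)
        hits.append(1.0 if r > 0 else 0.0)
    n = len(records)
    return {"recall_at_k": sum(recalls) / n, "hit_rate": sum(hits) / n}


# Hypothetical ground truth and stubbed retriever output.
gold = [
    GroundTruthRecord("wetland biologists", {"resume-204"}),
    GroundTruthRecord("highway air quality", {"report-311", "report-402"}),
]
fake_results = {
    "wetland biologists": ["resume-204", "proposal-017"],
    "highway air quality": ["report-311", "resume-204"],
}

print(evaluate(gold, lambda q: fake_results[q], k=2))
# {'recall_at_k': 0.75, 'hit_rate': 1.0}
```

In an MLflow-based setup, scores like these could be logged as run metrics (for example with `mlflow.log_metric`), so results remain comparable as the ground-truth set grows.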