Databricks + Claude: Building Enterprise AI That Understands Your Data


TL;DR

Anthropic’s Claude is driving a revolution across industries. One area in which Claude excels is building next-gen chat-based interfaces that customers can use to interact with your product, such as customer support, data exploration, and workflow automation. To build these capabilities, Claude needs access to enterprise data and backend systems.

This is the concern addressed by harness engineering: defining how LLMs interact with your data, tools, and execution layers within explicit security and access boundaries. Databricks provides a strong foundation for harness engineering with Claude, enabling controlled, scalable, and secure access to data where it already lives: in the Enterprise Lakehouse.

In a recent client engagement, Aimpoint Digital used Databricks’ Genie Spaces with Claude by introducing an MCP layer over multiple specialized spaces. Genie Spaces take text-based prompts and execute SQL queries to precisely answer user questions. Each space was scoped to a specific domain and isolated per customer, with the MCP server enforcing strict data isolation boundaries. Claude was used as a supervisor agent, selecting and coordinating between multiple spaces to answer client queries.

The result is a system where responsibilities are clearly separated: Databricks provides the harness (data access, constraints, and execution) while Claude provides reasoning and orchestration across domains. This allows a chat-based interface to operate over enterprise data without exposing underlying systems directly, while maintaining the security and structure required in a client-facing setting.

The Gap Between AI Demos and AI Systems

It usually starts the same way.

A team builds a prototype. You can ask a question about your data, and an LLM gives you an answer. It feels like magic until it doesn’t. The model misinterprets a table. It generates SQL that almost works. It misses key business logic that lives outside the raw data. And suddenly, confidence disappears.

We see this pattern repeatedly. The issue isn’t that LLMs can’t generate answers. It’s that they don’t have access to production-grade data, metadata, and context. That gap becomes especially obvious in complex enterprise environments like SAP, where metadata, relationships, and business logic matter just as much as the raw tables themselves.

That’s exactly the problem we set out to solve in a recent engagement.

The Use Case: Text2SQL Over SAP Metadata at Scale

In this case, the client, a platform focused on SAP custom code migration, needed a way to quickly and accurately retrieve metadata across many isolated customer environments. Which programs have a high volume of database write operations? How many custom programs have more than 1,000 lines of code? The answers to these questions determine the complexity and development time of a potential engagement.

At its core, the problem sounds simple: translate natural language into SQL.

Each SAP environment has:

  • Its own schema
  • Its own naming conventions
  • Its own business logic embedded in tables and relationships  

And the system needed to:

  • Work across multiple environments
  • Respect strict isolation boundaries
  • Integrate into an existing multi-agent architecture
  • Provide reliable, explainable output for developers and beta testers

This isn’t just a “chat with your data” use case. It’s a data integration and management challenge that requires layering business logic onto raw data while preserving environment-level isolation.

The Architecture: Genie Spaces, Databricks, and Claude

The backbone of the solution was built on Databricks, using Genie Spaces as the abstraction layer over each SAP environment. Each Genie Space wasn’t just a naïve Text2SQL implementation; it was a fully defined semantic layer:

  • Curated tables and columns
  • Metric views with embedded business logic and defined table relationships
  • Example SQL queries and benchmarking questions
  • Custom instructions for unique business jargon in each domain

These features are essential for an LLM to generate accurate and reliable outputs. This is where Databricks played a critical role. It allowed us to structure and govern metadata across environments, package and deploy reusable Genie Space templates, maintain isolation between client datasets, and create a consistent interface for downstream AI systems. With the data foundation set, we next turned to the reasoning layer: Claude.
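The “reusable Genie Space template” idea above can be sketched as plain data plus an instantiation step. The field names, helper, and catalog naming convention below are hypothetical, not the Databricks Genie API; the point is that one curated definition is stamped out per isolated customer environment.

```python
# Hypothetical sketch of a reusable Genie Space template. Field names and the
# instantiation helper are illustrative; they are not the Databricks Genie API.

GENIE_SPACE_TEMPLATE = {
    "name": "sap-custom-code",
    "curated_tables": ["programs", "db_operations", "code_metrics"],
    "metric_views": {
        # Business logic encoded once, reused across environments.
        "write_heavy_programs": (
            "SELECT program_id, COUNT(*) AS writes FROM db_operations "
            "WHERE op_type = 'WRITE' GROUP BY program_id"
        ),
    },
    "example_queries": [
        ("How many custom programs have more than 1,000 lines of code?",
         "SELECT COUNT(*) FROM code_metrics WHERE lines_of_code > 1000"),
    ],
    "instructions": "Treat 'Z-programs' as custom programs (names starting with Z).",
}

def instantiate_for_customer(template: dict, customer_id: str) -> dict:
    """Stamp out an isolated copy of the template for one customer environment."""
    space = dict(template)
    space["name"] = f"{template['name']}--{customer_id}"
    # Each customer's space reads only from that customer's catalog.
    space["catalog"] = f"sap_{customer_id}"
    return space
```

Keeping the template as data is what makes the “package and deploy” step repeatable: a new customer environment is a function call, not a hand-built space.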

Figure 1: Architecture leveraging Claude and Databricks' Genie Spaces
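The MCP layer’s key job in this architecture is the isolation boundary: every tool call carries a customer identity, and the server, not the model, decides which Genie Spaces that identity may reach. A minimal sketch, written without the MCP SDK and with a hypothetical registry:

```python
# Minimal sketch of the MCP layer's isolation boundary (no MCP SDK used).
# The registry and space names are hypothetical; the pattern is that the
# server enforces per-customer scoping before any Genie Space is queried.

SPACE_REGISTRY = {
    # customer_id -> Genie Spaces that customer is allowed to query
    "acme": {"sap-custom-code--acme", "sap-db-usage--acme"},
    "globex": {"sap-custom-code--globex"},
}

def dispatch(customer_id: str, space: str, question: str) -> str:
    """Route a natural-language question to one Genie Space, enforcing isolation."""
    allowed = SPACE_REGISTRY.get(customer_id, set())
    if space not in allowed:
        raise PermissionError(f"{customer_id!r} may not query {space!r}")
    # In the real system this would call the Genie Space's conversation API;
    # here we echo a placeholder to keep the sketch self-contained.
    return f"[{space}] answer to: {question}"
```

Because the check happens in the harness, a misrouted or adversarial prompt can never widen the model’s reach beyond the caller’s own environment.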

Why Claude Was the Right Fit

In this system, Claude acted as the reasoning engine inside a larger architecture. The challenge wasn’t just generating SQL; it was interpreting user intent, routing to the correct Genie Space or set of Genie Spaces, and synthesizing a thorough answer across multiple Genie Space outputs.

Claude’s strength in handling large volumes of structured and unstructured context made a meaningful difference here. With Claude’s larger context window, instead of forcing simplification, we could pass the entire context into the model: output tables, natural-language answers, and user intent. This allowed the model to generate outputs that were accurate and multi-domain. And because Databricks natively supports Claude via its foundation model APIs, we could quickly plug an industry-leading LLM into a governed data system.
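The “pass the entire context” step can be sketched as a prompt-assembly function: each Genie Space contributes its result rows and natural-language summary, and the supervisor hands the combined context plus the user’s question to the model in one call. The model client below is a stub standing in for a Databricks serving-endpoint call; all names are illustrative.

```python
# Sketch of context assembly for the supervisor step. The model client is
# stubbed out; in production it would be a call to a Claude serving endpoint.

def build_supervisor_prompt(user_question: str, space_outputs: dict) -> str:
    """Combine every Genie Space's rows and summary with the user's intent."""
    parts = [f"User question: {user_question}", ""]
    for space, (table_rows, summary) in space_outputs.items():
        parts.append(f"## {space}")
        parts.append(f"Summary: {summary}")
        parts.append("Rows:")
        parts.extend(str(row) for row in table_rows)
        parts.append("")
    parts.append("Synthesize one answer across all spaces above.")
    return "\n".join(parts)

def answer(user_question: str, space_outputs: dict, model_call) -> str:
    """model_call is any callable taking a prompt string and returning text."""
    return model_call(build_supervisor_prompt(user_question, space_outputs))
```

Because nothing is summarized away before the model sees it, cross-domain questions (one answer drawing on two spaces) fall out of the same code path as single-space questions.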

Closing the Loop: Feedback, Monitoring, and Iteration

One of the most overlooked parts of enterprise AI is what happens after the model generates an answer. In this case, we built custom mechanisms, and leveraged existing ones, to capture user feedback directly within the workflow:

  • Positive and negative feedback logging
  • Complete conversation tracing persisted to logging tables
  • Monitoring for admins to ensure stability over time

With logging tables and a monitoring user interface, we set the team up to succeed in the long term. Conversations with both good and bad responses are available to learn from, monitoring dashboards help Genie Space admins keep performance high, and the product team understands what it takes to continuously improve the product after launch.

Why This Matters (Beyond This Use Case)

It’s easy to treat this as a niche SAP example. It’s not. The same pattern shows up everywhere:

  • Complex schemas
  • Fragmented metadata
  • High cost of understanding data  

What this architecture demonstrates is a repeatable approach:

  • Use Databricks to organize and govern data + metadata
  • Use Claude to reason over that context effectively
  • Use agent frameworks to embed AI into workflows  

That combination is what turns AI from an interface into infrastructure.

The Bottom Line

Most AI systems fail not because the model is wrong, but because the system around it is incomplete. The combination of Databricks and Claude allowed us to build a trusted, enterprise-grade AI system where data is structured and governed, context is treated as a first-class object, models can reason, not just generate, and outputs are integrated into real workflows. That’s the difference between a demo and something you can actually deploy. Increasingly, it’s the difference between experimenting with AI, and building with it.

Authors
Jake Kohler, Lead Data Engineer
Oscar Giles, Lead Data Engineer
