Automating Accounts Payable with Snowflake Cortex

Table of contents
Partner with
Aimpoint Digital
Meet an Expert

Automating Accounts Payable with Snowflake Cortex

Spoiler Alert: Our Snowflake Cortex solution reduced hours spent on manual data entry by 60%.

What you might not know about your favorite Snowflake Data Superhero (for clarity, me) is that alongside my role as Snowflake Solutions Architect at Aimpoint Digital, I double up as Treasurer of The Parochial Church Council of the Ecclesiastical Parish of St. Bartholomew the Great (a mouthful, I know).

St. Bartholomew the Great is the oldest church in the City of London (902 years old, at time of writing) and with a very old building, a growing congregation, plenty of events, fundraising, weddings, funerals, blessings, etc... etc... our poor bookkeeper was a bit overwhelmed.

Hence, I decided to enlist the help of Snowflake’s Cortex. What if AI could read and process all our accounts payable invoices?

Our solution looks a bit like this:

First, I will offer a “meta” explanation, and then deep dive into what exactly is going on here.

Meta-Explanation

1. Invoices are emailed to the church, and Make.com (an orchestration platform) will send that to Google Cloud Platform and trigger a function

2. In Google Cloud, an API Gateway triggers a function that moves the invoice to Snowflake, and there triggers a Python Stored Procedure

3. That Stored Procedure feeds the contents of the invoice, alongside helpful historical data, to Snowflake Cortex and asks it to extract what we need from the invoice:

  • Contact
  • Invoice Reference
  • Invoice Date
  • Due Date
  • Line items, including:
    • Account codes
    • Tax rates

4. This is then returned to Make.com which:

  • Sends the invoice to Xero
  • Sends an email to the finance team

What Makes This Difficult

While Point 3 might sound like some simple text extraction, we do need Cortex to think about the task. The general ledger doesn’t need every single invoice line item, it needs to summarize and attribute to the correct account.

Say, for example, we receive an invoice with the following line items:

However, our general ledger doesn’t need to see the above. Instead, the general ledger needs to see the total credited to each account. And so the above, should look something like:

The above does not represent a realistic scenario but is purely for illustrative purposes.

Given translating the invoice line items to those that ought to be put into the ledger requires tribal knowledge of the business (or charity) around accounts, tax and how things are being tracked, it’s necessary to introduce RAG to the workflow.

Nitty-Gritty Explanation

Step 1:

Nice and simply, when an email is sent to Accounts Payable by a verified sender, Make.com will:

1. Send the invoice to Google Cloud Platform

2. Make an API call to Google’s API Gateway

Steps 2 & 3:

API Gateway triggers a Cloud Function that uses Secret Manager to grab the credentials for the Snowflake account and logs in to Snowflake, and then triggers a stored procedure.

 

Steps 4, 5 & 6

We have data being loaded in from Xero to Snowflake, via Fivetran, and then transformation jobs running in dbt that build up our general ledger.

The general ledger, by definition, has all our historical invoices in ledger format. Meanwhile, Snowflake has a copy of every invoice file in the same internal stage as our new invoice.

Snowflake allows us to create directory tables on our stages – these are metadata tables which give us useful information about our files. Given the files are named by their Xero invoice_id, that gives us a join key we can use to combine our general ledger with the directory table.

Step 7

As the invoice comes in, what we want to do is to use the directory table to grab our new and unprocessed invoice, and then use the PARSE_DOCUMENT() function to retrieve the text of the invoice.

PARSE_DOCUMENT() is a Snowflake Cortex function that provides OCR capabilities within SQL. When calling it, your statement will look something like:

SELECT TO_VARCHAR(
    SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
        '@<database_name>.<schema_name>.<stage_name>',
        '<document_name>.pdf',
        {'mode': 'LAYOUT'})
    ) AS DOCUMENT_TEXT;

In the above, using LAYOUT, I’m extracting the text of the document while hoping to maintain the shape of it (Cortex will output markdown).

Step 8 

We then have a Cortex Search service running on top of the joined table. My colleague, Elizabeth Khan, explains how they work in this blog (https://www.aimpointdigital.com/blog/snowflake-cortex-search-rag-systems), and so here, it should suffice to say that Cortex Search provides a keyword and vector search engine that allows you to “fuzzy” search over data. Normally, such an engine requires developers to handle embedding, but Cortex Search takes care of that for us.

Once we have the document text, we can use Cortex Search to retrieve any invoices it believes are similar in terms of products supplied, or vendor type. This is helpful when dealing with new suppliers for whom we might not have any previous history. 

Steps 9 & 10

A recently released feature, that is very helpful for this task, is structured outputs (https://docs.snowflake.com/en/user-guide/snowflake-cortex/complete-structured-outputs). The Xero API (https://developer.xero.com/documentation/api/accounting/invoices#post-invoices) expects a certain format for any data we send, for example:

{
  "Type": "ACCREC",
  "Contact": {
    "ContactID": "eaa28f49-6028-4b6e-bb12-d8f6278073fc"
  },
  "Date": "\/Date(1518685950940+0000)\/",
  "DateString": "2009-05-27T00:00:00",
  "DueDate": "\/Date(1518685950940+0000)\/",
  "DueDateString": "2009-06-06T00:00:00",
  "LineAmountTypes": "Exclusive",
  "LineItems": [
    {
      "Description": "Consulting services as agreed (20% off standard rate)",
      "Quantity": "10",
      "UnitAmount": "100.00",
      "AccountCode": "200",
      "DiscountRate": "20"
    }
  ]
}

This means we need Cortex to produce an output that conforms to a pre-defined schema. We can do so by providing the above as a Cortex COMPLETE response format. For example:

response_format = {
    "type": "json",
    "schema": {
        "type": "object",
        "properties": {
            "Type": {
                "type": "string",
                "description": "The type of the invoice, e.g., 'ACCREC' or 'ACCPAY'"
            },
            "Contact": {
                "type": "object",
                "properties": {
                    "ContactID": {
                        "type": "string",
                        "description": "Unique identifier for the contact"
                    }
                },
                "required": ["ContactID"]
            },
            "Date": {
                "type": "string",
                "description": "Unix timestamp date in JSON /Date(...)\\/ format"
            },
            "DateString": {
                "type": "string",
                "description": "ISO 8601 date format for invoice date"
            },
            "DueDate": {
                "type": "string",
                "description": "Unix timestamp due date in JSON /Date(...)\\/ format"
            },
            "DueDateString": {
                "type": "string",
                "description": "ISO 8601 format for invoice due date"
            },
            "LineAmountTypes": {
                "type": "string",
                "description": "Whether amounts are 'Inclusive', 'Exclusive', or 'NoTax'"
            },
            "LineItems": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "Description": {
                            "type": "string",
                            "description": "Line item description"
                        },
                        "Quantity": {
                            "type": "string",
                            "description": "Quantity of the item or service"
                        },
                        "UnitAmount": {
                            "type": "string",
                            "description": "Unit amount of the item or service"
                        },
                        "AccountCode": {
                            "type": "string",
                            "description": "Accounting code associated with the line item"
                        },
                        "DiscountRate": {
                            "type": "string",
                            "description": "Percentage discount for the item"
                        }
                    },
                    "required": ["Description", "Quantity", "UnitAmount", "AccountCode", "DiscountRate"]
                }
            }
        },
        "required": ["Type", "Contact", "Date", "DateString", "DueDate", "DueDateString", "LineAmountTypes", "LineItems"]
    }
}

This then ensures that we get the output we need in the format we need.

Steps 11 & 12:

Finishing off, the response is then returned to the Google Cloud Function, which is then returned to the API Gateway, which is then returned to Make.com.

Now that we have the requisite details and line items, we can then have Make.com trigger the Xero API and submit a draft invoice, for the finance team to review.

Once approved, that then goes into Snowflake, via Fivetran, to form part of the historic data accessed by Cortex Search at some point in the future.

 

Error Checking: 

Call me paranoid, but I do like to have some error checking in my orchestration tools.

Concluding

There are a great many repeated processes that can be automated by Snowflake Cortex. While our bookkeeper remains overwhelmed, she is, at least, less overwhelmed today thanks to it.

At Aimpoint Digital, we have many talented engineers who are specialists in GenAI and the Snowflake platform. If you would like a similar system to automate away document processing tasks and increase productivity, get in touch.

Author
Christopher Marland
Christopher Marland
Snowflake Solutions Architect
Read Bio

Let’s talk data.
We’ll bring the solutions.

Whether you need advanced AI solutions, strategic data expertise, or tailored insights, our team is here to help.

Meet an Expert