Next-Level Data Science: Enabling Optimization with Gurobi in Snowflake: Part One

February 26, 2024

•

0 min read

•

William Wirono

Optimization & Simulation

No items found.

Snowflake

Gurobi

Your step-by-step guide to integrating Gurobi, the powerhouse of mathematical optimization, with Snowflake.

Snowflake is a cutting-edge cloud data platform designed to revolutionize how businesses manage and analyze their data. With unmatched scalability, performance, and user-friendliness, Snowflake empowers organizations to extract insights and foster innovation from their data like never before.

Snowpark Container Services (SPCS)

An exciting addition to Snowflake’s features is Snowpark Container Services (SPCS). SPCS, currently in public preview as of February 2024, enables users to package, execute, and optimize containerized workloads as OCI images directly within Snowflake. This breakthrough opens avenues for executing advanced decision science workflows seamlessly within Snowflake, providing a unified architecture for enterprise operations.

This allows you to optimize workflows like:

In this guide, we will cover how to run Gurobi within the Snowflake environment using Snowpark Container Services (SPCS). This method also serves as an alternative approach for those who want to run Python packages not natively available in the Snowpark Anaconda Distribution. For any questions or more information on how Aimpoint can provide decision support by developing state-of-the-art models using Gurobi + Snowflake, please feel free to reach out here.

What you will need

Docker Desktop
Anaconda or VS Code with Python 3.10 installed
Support from your ACCOUNTADMIN to complete the initial account setup

At the end of this guide, you will have:

A Jupyter Notebook service hosted within Snowpark Container Services, with Gurobi installed, to run optimization models within Snowflake

Setting up Snowpark Container Services (SPCS)

Follow Snowflake’s quickstarts guide to complete the initial SPCS setup. The guide provides a detailed overview of the SPCS infrastructure and step-by-step instructions on how to get started.

Some practical tips when provisioning SPCS:

You might encounter steps that require certain privileges. In such cases, you will need to request your ACCOUNTADMIN to grant you the following privileges to execute the commands:
- CREATE DATABASE ON ACCOUNT
- CREATE COMPUTE POOL ON ACCOUNT
- CREATE INTEGRATION ON ACCOUNT
- BIND SERVICE ENDPOINT ON ACCOUNT

At several instances, you will be asked to provide <snowflake_registry_hostname>. It should follow the format <ORG>-<ACCOUNT-NAME>.registry.snowflakecomputing.com. To obtain the <ORG> and <ACCOUNT-NAME>, you can run the following SQL commands, respectively, in a Snowflake SQL worksheet:

SELECT CURRENT_ORGANIZATION_NAME();
SELECT CURRENT_ACCOUNT_NAME();

<repository_hostname> is the same as your <snowflake_registry_hostname>
When creating the dockerfile, make sure to add gurobipy on the RUN pip3 install line with the rest of the packages you need:

FROM python:3.10
LABEL author=""

RUN apt-get update && \
    apt-get install -y python3-pip

RUN pip3 install JPype1 jupyter pandas numpy gurobipy "snowflake-snowpark-python[pandas]" snowflake-connector-python

RUN useradd -ms /bin/bash jupyter   

USER jupyter
WORKDIR /home/jupyter 

EXPOSE 8888   

ENTRYPOINT ["jupyter", "notebook",
            "--allow-root", 
            "--ip=0.0.0.0",
            "--port=8888",
            "--no-browser",
            "--NotebookApp.token=''",
            "--NotebookApp.password=''"]

Once you’ve pushed the Docker image to Snowflake, double-check that the image is in your Snowflake Image Repository by running the following SQL query in a SQL worksheet:

CALL 
SYSTEM$REGISTRY_LIST_IMAGES('/<DB_CONTAINER_NAME>/PUBLIC/IMAGE_REPO');

This will allow you to see the status of the push. Please note that the push can take ~10-15 minutes or more depending on the size of the image.

Once the service is “READY”, you can run the URL (i.e. ingress_url) from the SQL output above in a browser. This allows you to access the Jupyter Notebook Service and run your uploaded notebooks.

Notes on SPCS Container Mounting

In the YAML configuration file provided in the Snowflake quickstart guide, we provided a mounting location to allow a directory to be shared between Snowflake’s internal stage location, which is stored in @volumes of your chosen database’s public schema, and the active container, which is the /home/jupyter directory of the Jupyter Notebook Service. This configuration implies that the @volumes stage will persist and store artifacts in the container.

If you navigate to the @volumes stage in Snowsight, you will observe a table of directories. A new directory, @volumes/jupyter-snowpark, is created when the Jupyter Snowpark Service was established. Consequently, any files uploaded to @volumes/jupyter-snowpark will be accessible within the container’s /home/jupyter directory.

Figure 1. Mounting location in Snowflake which persist in container’s /home/jupyter directory

Uploading Jupyter Notebook to Mounting Location and Running Gurobi Model

Navigate to the @volumes/jupyter-snowpark location as shown in Figure 1. Upload your Jupyter Notebook within the jupyter-snowpark location. Include a Gurobi model in the notebook to run on Snowflake.
If you have an active Gurobi license, you can also upload your gurobi.lic file to the jupyter-snowpark folder. You will need to let Gurobi know where you store the gurobi.lic file. Add the following code in your Jupyter Notebook before you initialize your Gurobi model or environment to explicitly specify the gurobi.lic location:

import os
os.environ[“GRB_LICENSE_FILE”] = “/home/jupyter/gurobi.lic”

3. Run the ingress_url in your browser and login with your credentials. A Jupyter Service with your uploaded Jupyter Notebook should be visible. Open and run your Jupyter Notebook as you would normally run locally.

Figure 2. Gurobi model ran within Jupyter Notebook in a Snowpark Container Service

Notes on Gurobi license: The ability to execute models in SPCS using Gurobi is contingent upon the specific type of license available to you. You can run models with up to 2000 variables or constraints with a demo license that comes out-of-the-box with a gurobipy installation. For comprehensive details regarding licensing terms and permissions, we recommend navigating to this site or contacting Gurobi directly.

Suspending Compute Pool

When the model execution is complete, you can stop the services and suspend the compute pool to avoid incurring additional costs by running the command below in a Snowflake SQL worksheet:

ALTER COMPUTE POOL <COMPUTE_POOL_NAME> SUSPEND;

Before suspending the compute pool, it is vital that you have closed all browsers with the ingress_url. Otherwise, the compute pool will not suspend, especially when the auto-resume setting is set to active.

To continue your journey, read part two, where we discuss how the process can be further streamlined using Snowsight.

Start Your Gurobi Journey on Snowflake Today with Aimpoint Digital

Seize the opportunity to revolutionize your optimization processes with Gurobi on Snowflake now! Whether you’re exploring new optimization solutions or contemplating a shift from Python to Snowpark, our dedicated Applied Optimization and Simulation team is here to guide you. Contact us today through the form below!

Author

William Wirono

Senior Data Scientist

Read Bio

Next-Level Data Science: Enabling Optimization with Gurobi in Snowflake: Part One

Snowpark Container Services (SPCS)

Setting up Snowpark Container Services (SPCS)

Notes on SPCS Container Mounting

Uploading Jupyter Notebook to Mounting Location and Running Gurobi Model

Suspending Compute Pool

Start Your Gurobi Journey on Snowflake Today with Aimpoint Digital

Related reading

From Complexity to Clarity: How Snowflake Notebooks Transforms Optimization Workflows

Using Snowsight: Streamlining Gurobi Powered Decision-Making in Snowflake Part Two

How to Tackle Uncertainty in Your Supply Chain with Mathematical Optimization

Let’s talk data.
We’ll bring the solutions.

Next-Level Data Science: Enabling Optimization with Gurobi in Snowflake: Part One

Snowpark Container Services (SPCS)

Setting up Snowpark Container Services (SPCS)

Notes on SPCS Container Mounting

Uploading Jupyter Notebook to Mounting Location and Running Gurobi Model

Suspending Compute Pool

Start Your Gurobi Journey on Snowflake Today with Aimpoint Digital

Related reading

From Complexity to Clarity: How Snowflake Notebooks Transforms Optimization Workflows

Using Snowsight: Streamlining Gurobi Powered Decision-Making in Snowflake Part Two

How to Tackle Uncertainty in Your Supply Chain with Mathematical Optimization

Let’s talk data. We’ll bring the solutions.

Let’s talk data.
We’ll bring the solutions.