Machine Learning Operations 

One of the most commonly overlooked parts of the data science and ML workstream is ML Ops and a high-quality production plan. We have seen countless clients stuck in “PoC land,” with numerous experimental builds completed by their ML/DS teams. Unfortunately, those builds never reach a production environment, because the team lacked domain knowledge and had no production plan in place before starting the build. Aimpoint Digital has significant experience helping teams bring machine learning projects to a place where they can drive continuous value.

Large Language Models (LLMs) and ChatGPT

As GPT models and their variants become more prevalent, the question inevitably arises of how to productionize these complex models in a business environment. The diagram below illustrates some of the typical ML Ops considerations:

These same principles must be applied when thinking about launching a GPT model into production.   

Aimpoint Digital has pushed GPT models into production using various technologies, including the Hugging Face Model Hub and AWS EC2 instances. In one specific use case, we deployed the open-source GPT-J model into production on an EC2 instance.

Remember that serving these models currently carries much more overhead cost than a typical machine learning project. The models typically require GPUs for inference and GPUs or TPUs for training, and these costs can rack up quickly. A knowledgeable team that understands the project's latency and concurrency requirements can ensure you only pay for what you use in a production environment.
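
To make the link between latency, concurrency, and cost concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes Little's law (requests in flight = arrival rate × average latency) is a reasonable approximation for your traffic. The function name, its parameters, and every number in the example are hypothetical placeholders, not measurements or real prices.

```python
import math
from dataclasses import dataclass


@dataclass
class ServingPlan:
    replicas: int            # number of GPU instances to run at peak
    monthly_cost_usd: float  # cost if those replicas run all month


def plan_gpu_serving(
    peak_requests_per_second: float,
    avg_latency_seconds: float,
    concurrent_requests_per_gpu: int,
    hourly_price_usd: float,
    hours_per_month: float = 730.0,
) -> ServingPlan:
    """Rough sizing estimate; all inputs are assumptions you must measure."""
    inputs = (
        peak_requests_per_second,
        avg_latency_seconds,
        concurrent_requests_per_gpu,
        hourly_price_usd,
    )
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")

    # Little's law: average number of requests being processed at once.
    in_flight = peak_requests_per_second * avg_latency_seconds

    # Round up, since a fraction of an instance cannot be provisioned.
    replicas = max(1, math.ceil(in_flight / concurrent_requests_per_gpu))

    monthly_cost = replicas * hourly_price_usd * hours_per_month
    return ServingPlan(replicas=replicas, monthly_cost_usd=monthly_cost)


if __name__ == "__main__":
    # Hypothetical workload: 5 requests/s at peak, 2 s per response,
    # 4 concurrent requests per GPU, and a made-up price of $1.50/hour.
    plan = plan_gpu_serving(5, 2.0, 4, 1.50)
    print(plan.replicas, plan.monthly_cost_usd)
```

An estimate like this sizes for peak load around the clock. If traffic is bursty, scaling replicas down during quiet hours is where most of the savings would come from.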

There are a few standard best practices to keep in mind while deploying a GPT model: 

  • Hardware selection: Ensure you have the right inference nodes set up for hosting to minimize cost and technical overhead  
  • Fine-tuning: As the image above illustrates, models must be “refreshed” as new data becomes available. A training pipeline that handles this requires a more complex tech stack, and some companies are spinning up specifically to solve this problem. Being able to fine-tune your model quickly and efficiently in a secure environment is critical to keeping problems such as drift and hallucinations from becoming more frequent over time.
  • Evaluating model performance and drift: We need a metric to track our models’ performance in a production environment. Commonly used metrics such as BLEU, ROUGE, and METEOR, which score generated text against reference text, can help track performance during both training and production.
  • Latency and compute tracking: Building on the first bullet, make sure you are actively monitoring data streams and compute usage to keep production costs reasonable.
  • R&D: These models are constantly changing as research explores improvements in training and architecture. Write your code in a modular way so that new packages and developments can be pulled into existing code bases easily.
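
As an illustration of the performance and drift bullet above, here is a minimal Python sketch that scores responses with a simplified ROUGE-1 F1 (lowercased, whitespace-tokenized unigram overlap, no stemming) and flags drift when a rolling average falls below a baseline. This is not the official ROUGE implementation, and the DriftMonitor class, its thresholds, and its window size are hypothetical choices made for the example. In practice you would likely use an established metrics library.

```python
from collections import Counter, deque


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased, split text."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


class DriftMonitor:
    """Hypothetical helper: flags drift once a full window of recent
    scores averages below (baseline - tolerance)."""

    def __init__(self, baseline: float, tolerance: float = 0.1, window: int = 50):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)  # oldest scores drop off

    def record(self, score: float) -> bool:
        """Store a score and return True if drift is detected."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        rolling_mean = sum(self.scores) / len(self.scores)
        return window_full and rolling_mean < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = DriftMonitor(baseline=0.8, tolerance=0.1, window=3)
    reference = "the cat sat on the mat"
    for candidate in ["the cat sat on the mat", "a dog ran", "a dog ran"]:
        score = rouge1_f1(reference, candidate)
        print(round(score, 3), monitor.record(score))
```

A drift flag from a monitor like this would be the trigger for the fine-tuning pipeline described above. The same rolling-window pattern could be applied to latency or compute measurements.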

Many more considerations come into play when putting these models into a production environment, so that they respond accurately and scale as business users engage with them more frequently. This new technology has the potential to revolutionize the business world, but only if the models are kept refreshed and healthy.

If you would like to learn more about how to productionize a large language model, reach out to one of our LLM experts today!