How to Host Your Own API of Open Language Models For Free


A Step-by-Step Tutorial for Creating Inference API and Deploying on Colab

Image by storyset on Freepik

If you’re passionate about LLM technology, chances are you’ve already built a few applications to assist your work or daily life using commercial APIs such as the GPT-4 API. Meanwhile, as their performance improves remarkably, open-source language models such as Llama 2 are bound to catch your attention, inviting you to experiment with and evaluate them.

Unfortunately, most solo developers can’t afford an expensive GPU to host open models locally and aren’t ready to pay the high usage costs of a dedicated cloud instance. In such cases, platforms like Google Colab become essential. Google Colab notebooks provide the infrastructure needed to experiment with and evaluate open-source language models for free, or at low cost with pricing based on runtime usage. The notebook design and its bundled resources are quite helpful; however, it’s hard to build an application with a decent user interface on Colab, and even harder to share access to your runtime with others.

That is how the idea of building free RESTful APIs for open language models came to my mind.

1. Project Overview

In this project, we are going to deploy the open-source language model “Dolly-v2-3b” on Colab with a free T4 GPU and expose its inference through a RESTful API that can be accessed online.
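To make the target shape concrete, here is a minimal sketch of such an API using only Python's standard library. It is not the implementation this tutorial builds: `generate_text` is a hypothetical stub standing in for the real model inference, the `response` field name, the default sampling values, and port 8000 are assumptions, and only the `/chatbot` route comes from the API definition in this section.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple


def generate_text(prompt: str, temperature: float, top_k: int) -> str:
    """Hypothetical stand-in for model inference; a real version would call the loaded model."""
    return f"[stub reply to: {prompt}]"


def handle_chatbot(raw_body: bytes) -> Tuple[int, Dict[str, str]]:
    """Turn a raw JSON request body into (HTTP status, response payload)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict) or "prompt" not in payload:
        return 400, {"error": "missing 'prompt'"}
    try:
        # Default values here are assumptions, not taken from the article.
        temperature = float(payload.get("temperature", 0.7))
        top_k = int(payload.get("top_k", 50))
    except (TypeError, ValueError):
        return 400, {"error": "temperature and top_k must be numbers"}
    reply = generate_text(str(payload["prompt"]), temperature, top_k)
    return 200, {"response": reply}


class ChatbotHandler(BaseHTTPRequestHandler):
    """Routes POST /chatbot to handle_chatbot and writes a JSON response."""

    def do_POST(self):
        if self.path != "/chatbot":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_chatbot(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Block forever serving the API; the port number is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), ChatbotHandler).serve_forever()
```

Calling `serve()` in a notebook cell would start the server inside the runtime. Keeping the request logic in `handle_chatbot`, separate from the HTTP handler, means the stub can later be swapped for real inference without touching the routing code.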

a) API Definition

The desired API definition is as follows:

API Endpoint:

POST /chatbot

Description: This API endpoint allows you to interact with the chatbot model to generate text responses based on a given prompt.

  • Method: POST
  • Endpoint: /chatbot
  • Content-Type: application/json

Request Body:

{
  "llm": str,
  "temperature": float,
  "top_k": int,
  "prompt": str
}


  • llm (string, required): The model name for the chatbot
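As an illustration of the request body above, the following sketch checks a payload against the four documented fields and builds (without sending) a matching POST request. The helper names `validate_request` and `build_request` are hypothetical, the base URL is a placeholder, and treating all four fields as required is an assumption: the definition above only marks `llm` as required.

```python
import json
import urllib.request

# Field names and types come from the request body definition above.
REQUEST_SCHEMA = {"llm": str, "temperature": float, "top_k": int, "prompt": str}


def validate_request(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload fits the schema."""
    problems = []
    for field, expected in REQUEST_SCHEMA.items():
        if field not in payload:
            # Assumption: every field is required, not only "llm".
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        if isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python, so reject it explicitly
        elif expected is float:
            ok = isinstance(value, (int, float))  # a JSON number like 1 is acceptable
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(f"{field} should be {expected.__name__}")
    return problems


def build_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST /chatbot request with a JSON body; raises ValueError on a bad payload."""
    problems = validate_request(payload)
    if problems:
        raise ValueError("; ".join(problems))
    return urllib.request.Request(
        base_url.rstrip("/") + "/chatbot",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example payload; the sampling values are arbitrary illustrations.
example = {"llm": "dolly-v2-3b", "temperature": 0.7, "top_k": 50, "prompt": "Hello"}
request = build_request("http://localhost:8000", example)  # placeholder URL
```

Passing `request` to `urllib.request.urlopen` would send it once a server is actually listening at that address.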
