Friendship ended with Flask, now Deserve is my best friend

Introducing Deserve: A Python nanoframework for serving ML models

Matías Battocchia
Aug 4, 2023

Serving machine learning models is an essential part of deploying AI applications. Traditional web frameworks like Flask and FastAPI have been popular choices for serving ML models due to their versatility and ease of use. However, as model serving becomes more prevalent, machine learning developers are looking for even more lightweight and efficient solutions to minimize the deployment overhead.

In this blog post, I present Deserve, a tool that takes a unique approach to model serving.

The Problem with Traditional Web Frameworks

Flask and FastAPI are both powerful web frameworks with rich features and a vast user base. However, when it comes to machine learning, these frameworks have some drawbacks that may lead to unnecessary complexities:

1. HTTP Concepts Overhead: Traditional web frameworks are built around HTTP concepts, such as handling requests, responses, headers, routes, and resources. While these concepts are useful for general web development, they introduce overhead when serving ML models because the task is more focused on RPC (Remote Procedure Call) rather than RESTful interactions.

2. Unnecessary Dependencies: Flask and FastAPI are comprehensive frameworks with numerous features that are not essential for a simple model serving application. The extra dependencies they pull in for tasks like routing and template rendering increase the project’s size and memory footprint.

3. REST Endpoint-Based Design: In Flask and FastAPI, endpoints are defined to handle specific routes and HTTP methods. While this design is flexible, it may not be the most suitable approach for model serving, where you mainly need a single endpoint to handle model predictions.

Introducing Deserve

Deserve is a nanoframework designed to address the specific needs of serving ML models. It takes a minimalistic and efficient approach, offering a straightforward solution that is faster and more lightweight. Let’s explore some key features of Deserve that set it apart:

Remote Procedure Call Architecture

Deserve adopts an RPC architecture that simplifies the model serving process significantly. Unlike traditional web frameworks that rely on endpoints and paths, Deserve only requires the host information to serve a model. This straightforward design eliminates the need to make decisions about routes and resources, streamlining the process of serving ML models.
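To make the idea concrete, here is a rough, hand-rolled sketch of what an RPC-style ASGI wrapper could look like: a decorator that turns a single async function into an entire app, with no routes at all. This is an illustration of the pattern, not Deserve’s actual implementation — the `rpc` name and everything inside it are invented for the example.

```python
import asyncio
import json

def rpc(func):
    """Wrap an async function into a minimal ASGI app: the request body is
    JSON-decoded, passed to the function, and the result is sent back as JSON.
    There are no routes -- the function *is* the whole API surface."""
    async def app(scope, receive, send):
        assert scope["type"] == "http"
        # Collect the full request body (it may arrive in several chunks).
        body = b""
        while True:
            message = await receive()
            body += message.get("body", b"")
            if not message.get("more_body", False):
                break
        result = await func(json.loads(body))
        payload = json.dumps(result).encode()
        await send({"type": "http.response.start", "status": 200,
                    "headers": [(b"content-type", b"application/json")]})
        await send({"type": "http.response.body", "body": payload})
    return app

@rpc
async def echo(payload):
    return {"received": payload}
```

Because the wrapper produces a standard ASGI callable, any ASGI server can run it — which is exactly why Deserve only needs host information and a function name.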

JSON Data Exchange

Data exchange in Deserve is done using JSON payloads. Clients can send JSON data to the server, and the server responds with JSON as well. This design ensures compatibility with Python objects on the server side. All conversions between Python objects and JSON are handled under the hood, reducing the burden on the developer.
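In practice, the round trip amounts to the standard-library `json` module’s decode/encode — here sketched with a hard-coded stand-in for the model, since the point is only the data flow:

```python
import json

# What arrives on the wire from the client.
request_body = '["Deserve is the simplest.", "You deserve it!"]'

# JSON -> plain Python objects, ready to hand to the model.
texts = json.loads(request_body)

# Stand-in for the model call: a list of dicts, as a classifier would return.
result = [{"label": "POSITIVE", "score": 0.998} for _ in texts]

# Python objects -> JSON, sent back to the client.
response_body = json.dumps(result)
```

With Deserve, both conversions happen automatically, so the function you write only ever sees the Python objects in the middle.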

Asynchronous Support

Deserve is built with asynchronous support, allowing it to handle multiple concurrent requests efficiently. Asynchronous programming is essential for scalability, especially in high-demand scenarios.
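The payoff of async handlers is easy to see with plain `asyncio` — here `asyncio.sleep` stands in for any I/O the handler awaits (a GPU queue, a downstream service):

```python
import asyncio

async def predict(payload):
    # Stand-in for a model call that awaits I/O; while one request
    # is waiting here, the event loop serves the others.
    await asyncio.sleep(0.1)
    return {"input": payload, "label": "POSITIVE"}

async def main():
    # Ten concurrent requests overlap their waits, finishing in
    # roughly 0.1 s total rather than 1 s back to back.
    return await asyncio.gather(*(predict(i) for i in range(10)))

results = asyncio.run(main())
```

Note that a CPU-bound model call does not overlap this way; the gain comes from requests that spend time waiting on I/O.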

Lightweight and Simple

Deserve is designed to be as lightweight as possible, pulling in only the essential components required for serving models. This simplicity results in a smaller memory footprint and faster response times compared to more comprehensive web frameworks.

Installation and Quickstart

Installing Deserve is as simple as running a pip command:

$ pip install deserve

You will also need an ASGI server like Uvicorn or Hypercorn. For this example, we’ll use Hypercorn:

$ pip install hypercorn

Let’s now look at a quick example of how to use Deserve to serve a sentiment analysis model using the 🤗 Transformers library:

# Save this as example.py
import deserve
from transformers import pipeline

# Load your model
classifier = pipeline('sentiment-analysis')

@deserve
async def predict(payload: object) -> object:
    return classifier(payload)

To run the server, use the names of your file (example.py) and the function (predict):

$ hypercorn example:predict

[INFO] Running on http://127.0.0.1:8000

You can now make predictions using your preferred client:

$ curl localhost:8000 --data '["Deserve is the simplest.", "You deserve it!"]'

[{"label": "POSITIVE", "score": 0.799}, {"label": "POSITIVE", "score": 0.998}]

Conclusion

If you are looking to simplify your model deployments and reduce unnecessary overhead, give Deserve a try and experience the benefits of a nanoframework built with ML in mind.

Check out the project on GitHub!


Matías Battocchia

I studied at Universidad de Buenos Aires. I live in Mendoza, Argentina. Interests: data, NLP, blockchain.