Over the past few weeks, we have built collaborations with many Open Source frameworks within the machine learning ecosystem. One which gets us particularly excited is Sentence Transformers.
Sentence Transformers is a framework for sentence, paragraph and image embeddings. It makes it possible to derive semantically meaningful embeddings (1), which is useful for applications such as semantic search or multi-lingual zero-shot classification. As part of the Sentence Transformers v2 release, there are a lot of cool new features:
- Sharing your models in the Hub easily.
- Widgets and Inference API for sentence embeddings and sentence similarity.
- Better sentence-embeddings models available (benchmark and models in the Hub).
With over 90 pretrained Sentence Transformers models for more than 100 languages in the Hub, anyone can benefit from them and use them easily. Pre-trained models can be loaded and used directly with just a few lines of code:
from sentence_transformers import SentenceTransformer

# Each sentence is encoded into a fixed-size embedding vector
sentences = ["Hello World", "Hallo Welt"]

model = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)
But that's not all. People will likely want to either demo their models or play with other models easily, so we're happy to announce the release of two new widgets in the Hub! The first one is the feature-extraction widget, which shows the sentence embedding.
sentence-transformers/distilbert-base-nli-max-tokens
But seeing a bunch of numbers may not be very useful to you (unless you can understand the embeddings at a quick glance, which would be impressive!). So we're also introducing a new widget for a common use case of Sentence Transformers: computing sentence similarity.
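Under the hood, sentence similarity is typically the cosine similarity between two embedding vectors. A minimal sketch of that computation (with toy vectors for illustration, not real model outputs):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In practice you would compare the embeddings returned by `model.encode`, which is exactly what the widget does for you.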
In fact, on top of the widgets, we also provide API endpoints in our Inference API which you can use to programmatically call your models!
import requests

API_URL = "https://api-inference.huggingface.co/models/sentence-transformers/paraphrase-MiniLM-L6-v2"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = query(
    {
        "inputs": {
            "source_sentence": "That is a happy person",
            "sentences": [
                "That is a happy dog",
                "That is a very happy person",
                "Today is a sunny day"
            ]
        }
    }
)
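For this sentence-similarity task, the API responds with one similarity score per candidate sentence. A hypothetical sketch of pairing the candidates with their scores (the score values below are made up for illustration, not real API output):

```python
def rank_by_score(sentences, scores):
    # Pair each candidate sentence with its score, highest first
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)

candidates = ["That is a happy dog", "That is a very happy person", "Today is a sunny day"]
example_scores = [0.69, 0.94, 0.26]  # made-up values for illustration

for sentence, score in rank_by_score(candidates, example_scores):
    print(f"{score:.2f}  {sentence}")
```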
Unleashing the Power of Sharing
So why is this powerful? In a matter of minutes, you can share your trained models with the whole community.
from sentence_transformers import SentenceTransformer

# Load (or train) a model
model = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')

# Push it to the Hub
model.save_to_hub("my_new_model")
Now you have a repository in the Hub which hosts your model. A model card was automatically created. It describes the architecture by listing the layers and shows how to use the model with both Sentence Transformers and 🤗 Transformers. You can also try out the widget and use the Inference API straight away!
If this was not exciting enough, your models will also be easily discoverable by filtering for all Sentence Transformers models.
What’s next?
Moving forward, we want to make this integration even more useful. In our roadmap, we expect training and evaluation data to be included in the automatically created model card, as is already the case in transformers from version v4.8.
And what's next for you? We're very excited to see your contributions! If you already have a Sentence Transformers repo in the Hub, you can now enable the widget and Inference API by changing the model card metadata:
---
tags:
- sentence-transformers
- sentence-similarity
---
If you don't have any model in the Hub yet and want to learn more about Sentence Transformers, head to www.SBERT.net!
Would you like to integrate your library with the Hub?
This integration is possible thanks to the huggingface_hub library, which powers all our widgets and the API for all our supported libraries. If you would like to integrate your library with the Hub, we have a guide for you!
References
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. https://arxiv.org/abs/1908.10084
