A while ago I published a playlist generator that I built using Sentence Transformers and Gradio, and I followed that up with a reflection on how I try to use my projects as effective learning experiences. But how did I actually build the playlist generator? In this post we'll break down that project and look at two technical details: how the embeddings were generated, and how the multi-step Gradio demo was built.
As we've explored in previous posts on the Hugging Face blog, Sentence Transformers (ST) is a library that gives us tools to generate sentence embeddings, which have many uses. Since I had access to a dataset of song lyrics, I decided to leverage ST's semantic search functionality to generate playlists from a given text prompt. Specifically, the idea was to create an embedding from the prompt and use it in a semantic search across a set of pre-generated lyrics embeddings to surface a relevant set of songs. This would all be wrapped up in a Gradio app using the new Blocks API, hosted on Hugging Face Spaces.
We'll be taking a look at a fairly advanced use of Gradio, so if you're new to the library I recommend reading the Introduction to Blocks before tackling the Gradio-specific parts of this post. Also, note that while I won't be releasing the lyrics dataset, the lyrics embeddings are available on the Hugging Face Hub for you to play around with. Let's jump in! 🪂
Sentence Transformers: Embeddings and Semantic Search
Embeddings are key in Sentence Transformers! We've learned about what embeddings are and how we generate them in a previous article, and I recommend checking that out before continuing with this post.
Sentence Transformers offers a large collection of pre-trained embedding models! It even includes tutorials for fine-tuning those models with your own training data, but for many use cases (such as semantic search over a corpus of song lyrics) the pre-trained models will perform excellently right out of the box. With so many embedding models available, though, how do we know which one to use?
The ST documentation highlights many of the choices, along with their evaluation metrics and some descriptions of their intended use cases. The MS MARCO models are trained on Bing search engine queries, but since they also perform well on other domains, I decided any one of these would be a good choice for this project. All we need for the playlist generator is to find songs with some semantic similarity, and since I don't really care about hitting a particular performance metric I arbitrarily chose sentence-transformers/msmarco-MiniLM-L-6-v3.
Each model in ST has a configurable input sequence length (up to a maximum), after which your inputs will be truncated. The model I chose has a max sequence length of 512 word pieces, which, as I found out, is often not enough to embed entire songs. Luckily, there's an easy way for us to split lyrics into smaller chunks that the model can digest: verses! Once we've chunked our songs into verses and embedded each verse, we'll find that the search works much better.
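As a minimal sketch of that chunking step, here's one way it could look, assuming the raw lyrics separate verses with blank lines (the dataset's actual format isn't shown in this post, and `chunk_into_verses` is a hypothetical helper):

```python
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('msmarco-MiniLM-L-6-v3')
print(embedder.max_seq_length)  # 512 word pieces for this model

def chunk_into_verses(lyrics):
    # Split on blank lines, dropping any empty chunks
    return [chunk.strip() for chunk in lyrics.split("\n\n") if chunk.strip()]

verses = chunk_into_verses("First verse, line one\nLine two\n\nSecond verse, line one")
```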
To actually generate the embeddings, you can call the .encode() method of the Sentence Transformers model and pass it a list of strings. Then you can save the embeddings however you like; in this case I opted to pickle them.
```python
from sentence_transformers import SentenceTransformer
import pickle

embedder = SentenceTransformer('msmarco-MiniLM-L-6-v3')

verses = [...]
corpus_embeddings = embedder.encode(verses, show_progress_bar=True)

with open('verse-embeddings.pkl', "wb") as fOut:
    pickle.dump(corpus_embeddings, fOut)
```
To be able to share your embeddings with others, you can even upload the Pickle file to a Hugging Face dataset. Read this tutorial to learn more, or visit the Datasets documentation to try it out yourself! In short, once you've created a new Dataset on the Hub, you can simply upload your Pickle file manually by clicking the "Add file" button, shown below.
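If you'd rather script the upload than click through the web UI, the huggingface_hub library can push the file for you. A minimal sketch, assuming you've already created the dataset repo and authenticated (the repo id below is a placeholder):

```python
from huggingface_hub import HfApi

# Assumes you've run `huggingface-cli login` (or set a token) beforehand
api = HfApi()
api.upload_file(
    path_or_fileobj="verse-embeddings.pkl",
    path_in_repo="verse-embeddings.pkl",
    repo_id="your-username/playlist-generator",  # placeholder; use your own repo
    repo_type="dataset",
)
```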
The last thing we need to do now is actually use the embeddings for semantic search! The following code loads the embeddings, generates a new embedding for a given string, and runs a semantic search over the lyrics embeddings to find the closest hits. To make it easier to work with the results, I also like to put them into a Pandas DataFrame.
```python
from sentence_transformers import util
import pandas as pd
import pickle

# Load the pre-generated verse embeddings
with open('verse-embeddings.pkl', "rb") as fIn:
    corpus_embeddings = pickle.load(fIn)

prompt_embedding = embedder.encode(prompt, convert_to_tensor=True)
hits = util.semantic_search(prompt_embedding, corpus_embeddings, top_k=20)
hits = pd.DataFrame(hits[0], columns=['corpus_id', 'score'])
```
Since we're searching for any verse that matches the text prompt, there's a good chance that the semantic search will find multiple verses from the same song. Once we drop the duplicates, we might only end up with a few distinct songs. If we increase the number of verse embeddings that util.semantic_search fetches with the top_k parameter, we can increase the number of songs that we'll find. Experimentally, I found that when I set top_k=20, I almost always get at least 9 distinct songs.
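As a rough sketch of that deduplication step, assuming a `verses` DataFrame with one row per embedded verse (in the same order as the embeddings) and a hypothetical `song_id` column:

```python
# Map each hit's corpus_id back to its verse, then keep the first
# verse per song. The `song_id` column name is an assumption here.
verses_found = verses.iloc[hits['corpus_id']]
distinct_songs = verses_found.drop_duplicates(subset='song_id').head(9)
```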
Making a Multi-Step Gradio App
For the demo, I wanted users to enter a text prompt (or pick from some examples), and then run a semantic search to find the top 9 most relevant songs. Then, users should be able to select one of the resulting songs to see its lyrics, which might give them some insight into why those particular songs were chosen. Here's how we can do that!
At the top of the Gradio demo we load the embeddings, mappings, and lyrics from Hugging Face datasets when the app starts up.
```python
from sentence_transformers import SentenceTransformer, util
from huggingface_hub import hf_hub_download
import os
import pickle
import pandas as pd

corpus_embeddings = pickle.load(open(hf_hub_download("NimaBoscarino/playlist-generator", repo_type="dataset", filename="verse-embeddings.pkl"), "rb"))
songs = pd.read_csv(hf_hub_download("NimaBoscarino/playlist-generator", repo_type="dataset", filename="songs_new.csv"))
verses = pd.read_csv(hf_hub_download("NimaBoscarino/playlist-generator", repo_type="dataset", filename="verses.csv"))

auth_token = os.environ.get("TOKEN_FROM_SECRET")
lyrics = pd.read_csv(hf_hub_download("NimaBoscarino/playlist-generator-private", repo_type="dataset", filename="lyrics_new.csv", use_auth_token=auth_token))
```
The Gradio Blocks API lets you build multi-step interfaces, which means that you're free to create quite complex sequences for your demos. We'll take a look at some example code snippets here, but check out the project code to see it all in action. For this project, we want users to choose a text prompt, and then, after the semantic search is complete, users should be able to choose a song from the results to inspect the lyrics. With Gradio, this can be built iteratively by starting off with defining the initial input components and then registering a click event on the button. There's also a Radio component, which will get updated to show the names of the songs for the playlist.
```python
import gradio as gr

# In the full app, these components live inside a `with gr.Blocks() as demo:` context
song_prompt = gr.TextArea(
    value="Running wild and free",
    placeholder="Enter a song prompt, or choose an example"
)
fetch_songs = gr.Button(value="Generate Your Playlist!")
song_option = gr.Radio()

fetch_songs.click(
    fn=generate_playlist,
    inputs=[song_prompt],
    outputs=[song_option],
)
```
This way, when the button is clicked, Gradio grabs the current value of the TextArea and passes it to the function shown below:
```python
def generate_playlist(prompt):
    prompt_embedding = embedder.encode(prompt, convert_to_tensor=True)
    hits = util.semantic_search(prompt_embedding, corpus_embeddings, top_k=20)
    hits = pd.DataFrame(hits[0], columns=['corpus_id', 'score'])
    # ... map the hits back to distinct song names ...
    song_names = ...
    return (
        gr.Radio.update(label="Songs", interactive=True, choices=song_names)
    )
```
In that function, we use the text prompt to conduct the semantic search. As seen above, to push updates to the Gradio components in the app, the function just needs to return components created with the .update() method. Since we connected the song_option Radio component to fetch_songs.click with its outputs parameter, generate_playlist can control the choices for the Radio component!
You can even do something similar with the Radio component in order to let users choose which song lyrics to view. Visit the code on Hugging Face Spaces to see it in detail!
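As a hedged sketch of what that second step might look like (the `fetch_lyrics` helper, the `song_lyrics` component, and the column names below are my assumptions, not the Space's exact code), you can register a change event on the Radio component:

```python
# Hypothetical second step: when a song is selected in the Radio component,
# look up its lyrics and display them in a Markdown component.
song_lyrics = gr.Markdown()

def fetch_lyrics(song_name):
    # Assumes the `lyrics` DataFrame loaded earlier has
    # `song_name` and `text` columns (an illustrative schema)
    song_verses = lyrics[lyrics["song_name"] == song_name]["text"]
    return gr.Markdown.update(value="\n\n".join(song_verses))

song_option.change(
    fn=fetch_lyrics,
    inputs=[song_option],
    outputs=[song_lyrics],
)
```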
Some Thoughts
Sentence Transformers and Gradio are great choices for this kind of project! ST has the utility functions that we need for quickly generating embeddings, as well as for running semantic search with minimal code. Having access to a large collection of pre-trained models is also extremely helpful, since we don't need to create and train our own models for this kind of thing. Building our demo in Gradio means we only need to focus on coding in Python, and deploying Gradio projects to Hugging Face Spaces is also super easy!
There's a ton of other stuff I wish I'd had the time to build into this project, such as these ideas that I might explore in the future:
- Integrating with Spotify to automatically generate a playlist, and maybe even using Spotify's embedded player to let users immediately listen to the songs.
- Using the **HighlightedText** Gradio component to identify the specific verse that was found by the semantic search.
- Creating some visualizations of the embedding space, like in this Space by Radamés Ajna.
While the song lyrics aren't being released, I've published the verse embeddings along with the mappings to each song, so you're free to play around and get creative!
Don't forget to drop by the Discord to ask questions and share your work! I'm excited to see what you end up doing with Sentence Transformers embeddings 🤗
Extra Resources
