Sentence Transformers is joining Hugging Face!

Today, we’re announcing that Sentence Transformers is transitioning from Iryna Gurevych’s Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt to Hugging Face. Hugging Face’s Tom Aarsen has already been maintaining the library since late 2023 and will continue to steer the project. In its new home, Sentence Transformers will benefit from Hugging Face’s robust infrastructure, including continuous integration and testing, ensuring that it stays up to date with the newest advancements in Information Retrieval and Natural Language Processing.

Sentence Transformers (a.k.a. SentenceBERT or SBERT) is a popular open-source library for generating high-quality embeddings that capture semantic meaning. Since its inception by Nils Reimers in 2019, Sentence Transformers has been widely adopted by researchers and practitioners for various natural language processing (NLP) tasks, including semantic search, semantic textual similarity, clustering, and paraphrase mining. After years of development and training by and for the community, over 16,000 Sentence Transformers models are publicly available on the Hugging Face Hub, serving more than 1,000,000 unique users every month.

“Sentence Transformers has been a huge success story and a culmination of our lab’s long-standing research on computing semantic similarity. Nils Reimers made a very timely discovery and produced not only outstanding research results, but also a highly usable tool. It continues to influence generations of students and practitioners in natural language processing and AI. I would also like to thank all of the users and especially the contributors, without whom this project would not be what it is today. And finally, I would like to thank Tom and Hugging Face for taking the project into the future.”

  • Prof. Dr. Iryna Gurevych, Director of the Ubiquitous Knowledge Processing Lab, TU Darmstadt

“We’re thrilled to officially welcome Sentence Transformers into the Hugging Face family! Over the past two years, it’s been amazing to see this project grow to massive global adoption, thanks to the incredible foundation from the UKP Lab and the amazing community around it. This is only the beginning: we’ll keep doubling down on supporting its growth and innovation, while staying true to the open, collaborative spirit that made it thrive in the first place.”

  • Clem Delangue, co-founder & CEO, Hugging Face

Sentence Transformers will remain a community-driven, open-source project, under the same open-source license (Apache 2.0) as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged. The project will continue to prioritize transparency, collaboration, and broad accessibility.



Project History

The Sentence Transformers library was introduced in 2019 by Dr. Nils Reimers at the Ubiquitous Knowledge Processing (UKP) Lab at Technische Universität Darmstadt, under the supervision of Prof. Dr. Iryna Gurevych. Motivated by the limitations of standard BERT embeddings for sentence-level semantic tasks, Sentence-BERT used a Siamese network architecture to produce semantically meaningful sentence embeddings that could be efficiently compared using cosine similarity. Thanks to its modular, open-source design and strong empirical performance on tasks such as semantic textual similarity, clustering, and information retrieval, the library quickly became a staple of the NLP research toolkit, spawning a wide range of follow-up work and real-world applications that rely on high-quality sentence representations.
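
To make the bi-encoder idea above concrete, here is a minimal sketch using the library’s current API; the model name is an illustrative choice, not one prescribed by the announcement:

    from sentence_transformers import SentenceTransformer

    # Load a pretrained bi-encoder; each sentence is encoded independently.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]
    embeddings = model.encode(sentences)

    # Compare embeddings pairwise; cosine similarity is the default for this model.
    similarities = model.similarity(embeddings, embeddings)
    print(similarities)

Because each sentence is embedded on its own, embeddings can be precomputed and reused, which is what makes large-scale comparison and search practical.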

In 2020, multilingual support was added to the library, extending sentence embeddings to more than 400 languages. In 2021, with contributions from Nandan Thakur and Dr. Johannes Daxenberger, the library expanded to support pairwise sentence scoring with Cross Encoder models in addition to Sentence Transformer models. Sentence Transformers was also integrated with the Hugging Face Hub (v2.0). For over four years, the UKP Lab team maintained the library as a community-driven open-source project and provided continued research-driven innovation. During this period, the project’s development was supported by grants awarded to Prof. Gurevych by the German Research Foundation (DFG), the German Federal Ministry of Education and Research (BMBF), and the Hessen State Ministry for Higher Education, Research and the Arts (HMWK).
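
As a sketch of that pairwise scoring: a Cross Encoder reads both sentences of a pair jointly and outputs a single relevance score, rather than embedding each sentence separately. The model name below is again an illustrative choice:

    from sentence_transformers import CrossEncoder

    # Load a pretrained Cross Encoder reranking model.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    # Each pair is scored jointly; higher scores mean higher relevance.
    pairs = [
        ("How many people live in Berlin?", "Berlin has about 3.7 million inhabitants."),
        ("How many people live in Berlin?", "Berlin is well known for its museums."),
    ]
    scores = model.predict(pairs)
    print(scores)

Cross Encoders are typically slower but more accurate than bi-encoders, so a common pattern is to retrieve candidates with a Sentence Transformer model and rerank them with a Cross Encoder.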

In late 2023, Tom Aarsen from Hugging Face took over maintainership of the library, introducing modernized training for Sentence Transformer models (v3.0), as well as improved support for Cross Encoder (v4.0) and Sparse Encoder (v5.0) models.



Acknowledgements

The Ubiquitous Knowledge Processing (UKP) Lab at Technische Universität Darmstadt, led by Prof. Dr. Iryna Gurevych, is internationally recognized for its research in natural language processing (NLP) and machine learning. The lab has a long track record of pioneering work in representation learning, large language models, and information retrieval, with numerous publications at leading conferences and journals. Beyond Sentence Transformers, the UKP Lab has developed a number of widely used datasets, benchmarks, and open-source tools that support both academic research and real-world applications.

Hugging Face would like to thank the UKP Lab and all past and present contributors, especially Dr. Nils Reimers and Prof. Dr. Iryna Gurevych, for their dedication to the project and for entrusting us with its maintenance and now stewardship. We also extend our gratitude to the community of researchers, developers, and practitioners who have contributed to the library’s success through model contributions, bug reports, feature requests, documentation improvements, and real-world applications. We’re excited to continue building on the strong foundation laid by the UKP Lab and to work with the community to further advance the capabilities of Sentence Transformers.



Getting Started

For those new to Sentence Transformers or looking to explore its capabilities, the official documentation at https://sbert.net is the best place to start.
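
As a minimal quick-start sketch, after installing the library from PyPI (pip install -U sentence-transformers), a small semantic search works in a few lines; the model name and example texts are illustrative choices, not part of the original post:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    # Embed a small corpus once; these embeddings can be cached and reused.
    corpus = [
        "A man is eating food.",
        "A monkey is playing drums.",
        "Someone is riding a horse.",
    ]
    corpus_embeddings = model.encode(corpus)

    # Embed the query and retrieve the most similar corpus entries.
    query_embedding = model.encode("A person is having a meal.")
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
    for hit in hits[0]:
        print(corpus[hit["corpus_id"]], hit["score"])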


