Welcome fastText to the Hugging Face Hub



Sheon Han, Juan Pino


fastText is a library for efficient learning of text representation and classification. Open-sourced by Meta AI in 2016, fastText integrates key ideas that have been influential in natural language processing and machine learning over the past few decades: representing sentences using bag of words and bag of n-grams, using subword information, and using a hidden representation to share information across classes.
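To make the subword idea concrete, here is a minimal sketch of character n-gram extraction. It assumes the commonly described scheme in which a word is wrapped in `<` and `>` boundary markers before n-grams are taken; the function name `char_ngrams` and its parameters are hypothetical and are not part of the fastText API. The library itself also hashes n-grams into a fixed number of buckets, which this sketch leaves out.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word, including boundary markers.

    Illustrative helper only (not a fastText function).
    """
    wrapped = f"<{word}>"  # mark the start and end of the word
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Trigrams only, to keep the output short.
print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Because a word vector can be built from the vectors of its n-grams, a word that never appeared in training can still receive a representation from the n-grams it shares with known words.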

To speed up computation, fastText uses hierarchical softmax, capitalizing on the imbalanced distribution of classes. Together, these techniques offer users scalable solutions for text representation and classification.
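The following sketch illustrates why an imbalanced class distribution helps hierarchical softmax. It assumes the classes are arranged in a Huffman tree built from their frequencies, so that frequent classes sit closer to the root and need fewer binary decisions. The function name `huffman_depths` and the class counts are made up for illustration; this is not fastText's internal code.

```python
import heapq
from itertools import count


def huffman_depths(class_counts):
    """Return {label: depth} for a Huffman tree built from class frequencies.

    The depth is the number of binary decisions needed to reach a class.
    Illustrative helper only (not a fastText function).
    """
    tie_breaker = count()  # keeps heap entries comparable when counts tie
    heap = [(c, next(tie_breaker), [label]) for label, c in class_counts.items()]
    heapq.heapify(heap)
    depths = {label: 0 for label in class_counts}
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new parent node.
        c1, _, labels1 = heapq.heappop(heap)
        c2, _, labels2 = heapq.heappop(heap)
        merged = labels1 + labels2
        for label in merged:
            depths[label] += 1  # every leaf in the merged subtree moves one level down
        heapq.heappush(heap, (c1 + c2, next(tie_breaker), merged))
    return depths


# Hypothetical, heavily imbalanced class counts.
counts = {"sports": 900, "politics": 60, "science": 30, "cooking": 10}
print(huffman_depths(counts))
# {'sports': 1, 'politics': 2, 'science': 3, 'cooking': 3}
```

With these counts, the most frequent class is reached in a single decision, so the average cost per training example stays well below the cost of scoring every class.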

Hugging Face is now hosting official mirrors of word vectors for all 157 languages and the latest model for language identification. This means that, using Hugging Face, you can easily download and use the models with a few commands.



Finding models

Word vectors for 157 languages and the language identification model can be found in the Meta AI org. For example, you can find the model page for English word vectors here and the language identification model here.



Widgets

This integration includes support for text classification and feature extraction widgets. Try out the language identification widget here and the feature extraction widget here!




How to use

Here is how to load and use pre-trained vectors:

>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> model_path = hf_hub_download(repo_id="facebook/fasttext-en-vectors", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.words

['the', 'of', 'and', 'to', 'in', 'a', 'that', 'is', ...]

>>> len(model.words)

145940

>>> model['bread']

array([ 4.89417791e-01,  1.60882145e-01, -2.25947708e-01, -2.94273376e-01,
       -1.04577184e-01,  1.17962055e-01,  1.34821936e-01, -2.41778508e-01, ...])

Here is how to use this model to query nearest neighbors of an English word vector:

>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> model_path = hf_hub_download(repo_id="facebook/fasttext-en-nearest-neighbors", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.get_nearest_neighbors("bread", k=5)

[(0.5641006231307983, 'butter'), 
 (0.48875734210014343, 'loaf'), 
 (0.4491206705570221, 'eat'), 
 (0.42444291710853577, 'food'), 
 (0.4229326844215393, 'cheese')]
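For intuition about what a nearest-neighbor query computes, here is a small pure-Python sketch that ranks words by cosine similarity. It assumes cosine similarity is the scoring measure, and it uses tiny made-up 3-dimensional vectors rather than real fastText embeddings. The names `cosine`, `nearest_neighbors`, and `toy_vectors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors, query, k=5):
    """Return the k (score, word) pairs most similar to `query`, best first."""
    q = vectors[query]
    scored = [(cosine(q, vec), word) for word, vec in vectors.items() if word != query]
    scored.sort(reverse=True)
    return scored[:k]


# Made-up toy vectors, for illustration only.
toy_vectors = {
    "bread": [1.0, 0.9, 0.0],
    "butter": [0.9, 1.0, 0.1],
    "loaf": [1.0, 0.5, 0.0],
    "car": [0.0, 0.1, 1.0],
}
for score, word in nearest_neighbors(toy_vectors, "bread", k=2):
    print(f"{word}: {score:.3f}")
```

The result has the same (score, word) shape as the output above, so code written against one can be tried against the other.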

Here is how to use this model to detect the language of a given text:

>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")

(('__label__eng_Latn',), array([0.81148803]))

>>> model.predict("Hello, world!", k=5)

(('__label__eng_Latn', '__label__vie_Latn', '__label__nld_Latn', '__label__pol_Latn', '__label__deu_Latn'), 
 array([0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]))
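The labels returned by predict combine a `__label__` prefix with a language code and a script name. A small helper like the one below can turn that output into something easier to work with. This is a sketch: `parse_prediction` is a hypothetical helper, not part of fasttext, and it assumes labels follow the `__label__<language>_<script>` pattern seen in the output above. The sample tuple copies the first two entries of that output, with a plain list standing in for the NumPy array.

```python
LABEL_PREFIX = "__label__"


def parse_prediction(prediction):
    """Convert a (labels, probabilities) pair into a list of dicts."""
    labels, probabilities = prediction
    results = []
    for label, probability in zip(labels, probabilities):
        # Drop the "__label__" prefix if present.
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Split "eng_Latn" into the language code and the script name.
        language, _, script = code.partition("_")
        results.append(
            {"language": language, "script": script, "probability": float(probability)}
        )
    return results


# Sample shaped like the output of model.predict(..., k=2).
sample = (("__label__eng_Latn", "__label__vie_Latn"), [0.61224753, 0.21323682])
for entry in parse_prediction(sample):
    print(entry["language"], entry["script"], round(entry["probability"], 3))
# eng Latn 0.612
# vie Latn 0.213
```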



Would you like to integrate your library to the Hub?

This integration is possible thanks to our collaboration with Meta AI and the huggingface_hub library, which enables all our widgets and the API for all our supported libraries. If you would like to integrate your library to the Hub, we have a guide for you!


