Prodigy-HF: a direct integration with Hugging Face

Prodigy is an annotation tool made by Explosion, the organization best known as the creators of spaCy. It’s a fully scriptable product with a big community around it. The product has many features, including tight integration with spaCy and active learning capabilities. However, its most important feature is that it’s programmatically customizable with Python.

To foster this customizability, Explosion has begun releasing plugins. These plugins integrate with third-party tools in an open way that encourages users to work on bespoke annotation workflows. One customization in particular deserves to be celebrated explicitly: last week, Explosion introduced Prodigy-HF, which offers code recipes that integrate directly with the Hugging Face stack. It has been a much-requested feature on the Prodigy support forum, so we’re excited to see it released.



Features

The most important feature of this plugin is that it lets you train and re-use Hugging Face models on your annotated data. This means that if you’ve been annotating data in the Prodigy interface for named entity recognition, you can directly fine-tune BERT models on it.

What the Prodigy NER interface looks like.

After installing the plugin, you can call the hf.train.ner recipe from the command line to train a transformer model directly on your own data.

python -m prodigy hf.train.ner fashion-train,eval:fashion-eval path/to/model-out --model "distilbert-base-uncased"

This will fine-tune the distilbert-base-uncased model on the dataset you have stored in Prodigy and save it to disk. Similarly, the plugin also supports models for text classification via a very similar interface.

python -m prodigy hf.train.textcat fashion-train,eval:fashion-eval path/to/model-out --model "distilbert-base-uncased"

This offers plenty of flexibility, since the tool directly integrates with the AutoTokenizer and AutoModel classes of Hugging Face transformers. Any transformer model on the hub can be fine-tuned on your own dataset with only a single command. These models will be serialized to disk, which means that you can upload them to the Hugging Face Hub, or re-use them to help you annotate data. This can save plenty of time, especially for NER tasks. To re-use a trained NER model, you can use the hf.correct.ner recipe.

python -m prodigy hf.correct.ner fashion-train path/to/model-out examples.jsonl

This gives you the same interface as before, but now the model predictions will be shown in the interface as well.
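The examples.jsonl file passed to a recipe like this is a source of raw examples in newline-delimited JSON. As a minimal sketch, assuming the standard Prodigy input format where each line carries at least a "text" key (the texts below are made up for illustration), you could prepare such a file like this:

```python
import json

# Hypothetical raw texts to annotate; a Prodigy JSONL source for NER is
# newline-delimited JSON where each line has at least a "text" key.
texts = [
    "The new denim jacket sold out within hours.",
    "Her silk scarf matched the vintage handbag.",
]

# Write one JSON object per line.
with open("examples.jsonl", "w", encoding="utf-8") as fh:
    for text in texts:
        fh.write(json.dumps({"text": text}) + "\n")

# Read it back to check the structure.
with open("examples.jsonl", encoding="utf-8") as fh:
    rows = [json.loads(line) for line in fh]
```

A file in this shape can then be fed to hf.correct.ner, which adds the model's span predictions on top of each example for you to accept or correct.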



Upload

The second feature, which is equally exciting, is that you can now also publish your annotated datasets on the Hugging Face Hub. This is great if you’re interested in sharing datasets that others would like to use.

python -m prodigy hf.upload  /

We’re particularly keen on this upload feature since it encourages collaboration. People can annotate their own datasets independently of one another, but still benefit when they share the data with the broader community.



More to come

We hope that this direct integration with the Hugging Face ecosystem enables many users to experiment more. The Hugging Face Hub offers many models for a wide selection of tasks as well as a wide selection of languages. We really hope that this integration makes it easier to get data annotated, even if you have a more domain-specific and experimental use case.

More features for this library are on their way, and feel free to reach out on the Prodigy forum if you have more questions.

We would also like to thank the team over at Hugging Face for their feedback on this plugin, specifically @davanstrien, who suggested adding the upload feature. Thanks!


