AI Sheets: a tool to work with datasets using open AI models!

🧭TL;DR

Hugging Face AI Sheets is a new, open-source tool for building, enriching, and transforming datasets using AI models with no code. The tool can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including gpt-oss from OpenAI!

Useful links

Try the tool for free (no installation required): https://huggingface.co/spaces/aisheets/sheets
Install and run locally: https://github.com/huggingface/sheets

What is AI Sheets

AI Sheets is a no-code tool for building, transforming, and enriching datasets using (open) AI models. It’s tightly integrated with the Hub and the open-source AI ecosystem.

AI Sheets uses an easy-to-learn user interface, similar to a spreadsheet. The tool is built around quick experimentation, starting with small datasets before running long/costly data generation pipelines.

In AI Sheets, new columns are created by writing prompts, and you can iterate as many times as you need and edit or validate cells to teach the model what you want. But more on this later!

What can I use it for

You can use AI Sheets to:

Compare and vibe test models. Imagine you want to test the latest models on your data. You can import a dataset with prompts/questions, and create several columns (one per model) with a prompt like this: Answer the following: {{prompt}}, where prompt is a column in your dataset. You can validate the results manually or create a new column with an LLM-as-a-judge prompt like this: Evaluate the responses to the following question: {{prompt}}. Response 1: {{model1}}. Response 2: {{model2}}, where model1 and model2 are columns in your dataset with different model responses.

Improve prompts for your data and specific models. Imagine you want to build an application to process customer requests and give automatic answers. You can load a sample dataset with customer requests and start playing and iterating with different prompts and models to generate responses. One cool feature of AI Sheets is that you can provide feedback by editing or validating cells. These example cells will be added to your prompts automatically. You can think of it as a tool to fine-tune prompts and add few-shot examples to your prompts very efficiently, by looking at your data in real time!

Transform a dataset. Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like Remove extra punctuation marks from the following text: {{text}}, where text is a column in your dataset containing the texts you want to clean up.

Classify a dataset. Imagine you want to classify some content in your dataset. You can add a new column with a prompt like Categorize the following text: {{text}}, where text is a column in your dataset containing the texts you want to categorize.

Analyze a dataset. Imagine you want to extract the main ideas in your dataset. You can add a new column with a prompt like this: Extract the most important ideas from the following: {{text}}, where text is a column in your dataset containing the texts you want to analyze.

Enrich a dataset. Imagine you have a dataset with addresses that are missing zip codes. You can add a new column with a prompt like this: Find the zip code of the following address: {{address}} (in this case, you must enable the “Search the web” option to ensure accurate results).

Generate a synthetic dataset. Imagine you need a dataset with realistic emails, but the data is not available for data privacy reasons. You can create a dataset with a prompt like this: Write a short description of a professional in the field of pharma companies and name the column person_bio. Then you can create another column with a prompt like this: Write a realistic professional email as if it was written by the following person: {{person_bio}}.
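
All of the prompts above rely on {{column}} placeholders that are filled in from each row of the dataset. As a rough illustration of that idea, here is a minimal Python sketch. The render_prompt helper and the sample row are hypothetical and written only for this post; they are not taken from the AI Sheets codebase, which may handle substitution differently.

```python
import re

# Hypothetical helper: fills {{column}} placeholders from one dataset row.
# It mirrors the prompt syntax used in this post; it is not AI Sheets' own code.
PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace every {{column}} in the template with the row's value for that column."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Sample row with a single (made-up) text column.
row = {"text": "Hello!!! How are you???"}
print(render_prompt("Remove extra punctuation marks from the following text: {{text}}", row))
```

Raising an error for an unknown column, rather than leaving the placeholder in place, is a design choice of this sketch: it makes a typo in a column name visible before any model is called.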

Now let’s dive into how to use it!

How to use it

AI Sheets gives you two ways to get started: import existing data or generate a dataset from scratch. Once your data is loaded, you can refine it by adding columns, editing cells, and regenerating content.

Getting started

To get started, you need to either create a dataset from scratch by describing it in natural language or import an existing dataset.

Generate a dataset from scratch

Best for: Familiarizing yourself with AI Sheets, brainstorming, rapid experiments, and creating test datasets.

Think of this as an auto-dataset or prompt-to-dataset feature: you describe what you want, and AI Sheets creates the whole dataset structure and content for you.

When to use this:

You are exploring AI Sheets for the first time
You need synthetic data for testing or prototyping
Data accuracy and variety are not critical (e.g., brainstorming use cases, quick research, generating test datasets)
You want to experiment with ideas quickly

How it works:

Describe the dataset you want in the prompt area
- Example: “A list of fictional startups with name, industry, and slogan”
AI Sheets generates the schema and creates 5 sample rows
Extend to up to 1,000 rows or modify the prompt to change the structure

Example

If you type this prompt: cities of the world, alongside countries they belong to and a landmark image for each, generated in Ghibli style

AI Sheets will automatically generate a dataset with three columns.

This dataset contains only five rows, but you can add more cells by dragging down on each column, including the image one! You can also write items in any of the cells and complete the others by dragging.

The next sections will show you how to iterate and expand the dataset.

Import your dataset (recommended)

Best for: Most use cases where you want to transform, classify, enrich, and analyze real-world data.

This is recommended for most use cases, as importing real data gives you more control and flexibility than starting from scratch.

When to use this:

You have existing data to transform or enrich using AI models
You want to generate synthetic data, and accuracy and variety are important

How it works:

Upload your data in XLS, TSV, CSV, or Parquet format
Ensure your file includes at least one column name and one row of data
Upload up to 1,000 rows (unlimited columns)
Your data appears in a familiar spreadsheet format

Pro tip: If your file contains minimal data, you can manually add more entries by typing directly into the spreadsheet.
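
If you want to check a file against the requirements listed above before uploading it, a small pre-flight script can help. The sketch below is an assumption-laden example for CSV files only: check_csv_for_import is a hypothetical helper written for this post, and the 1,000-row limit is simply the number stated above, not a value read from AI Sheets itself.

```python
import csv
import io

MAX_ROWS = 1000  # row limit stated in this post; an assumption, not read from the tool


def check_csv_for_import(csv_text: str) -> list:
    """Return a list of problems; an empty list means the file looks importable."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    problems = []

    # The first row must contain at least one non-empty column name.
    if not rows or not any(name.strip() for name in rows[0]):
        problems.append("no column names found in the header row")

    # Count only rows that contain at least one non-empty cell.
    data_rows = [r for r in rows[1:] if any(cell.strip() for cell in r)]
    if not data_rows:
        problems.append("no data rows found")
    if len(data_rows) > MAX_ROWS:
        problems.append(f"{len(data_rows)} rows exceeds the {MAX_ROWS}-row limit")

    return problems


print(check_csv_for_import("name,industry\nAcme,Robotics\n"))
```

The function returns a list of messages instead of raising on the first issue, so all problems with a file can be reported in one pass.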

Working with your dataset

Once your data is loaded (regardless of how you started), you will see it in an editable spreadsheet interface. Here’s what you need to know:

Understanding AI Sheets

Imported cells: Manually editable but cannot be modified by AI prompts
AI-generated cells: Can be regenerated and refined using prompts and your feedback (edits + thumbs-up)
New columns: Always AI-powered and fully customizable

Getting started with AI columns

Click the “+” button to add a new column
Pick from recommended actions:
- Extract specific information
- Summarize long text
- Translate content
- Or write custom prompts with “Do something with {{column}}”

Refining and expanding the dataset

Now that you have AI columns, you can improve their results and expand your data. You can improve results by providing feedback through manual edits and likes or by adjusting the column configuration. Both require regeneration to take effect.

1. How to add more cells

Drag down: From the last cell in a column to generate additional rows immediately
No regeneration needed – new cells are created immediately
You can use this to regenerate errored cells too

2. Manual editing and feedback

Edit cells: Click any cell to edit content directly – this gives the model examples of your preferred output
Like results: Use thumbs-up to mark examples of good output
Regenerate to apply feedback to other cells in the column.

Under the hood, these manually edited and liked cells will be used as few-shot examples for generating the cells when you regenerate or add more cells in the column!
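
To make the few-shot idea more concrete, here is a minimal Python sketch of how edited and liked cells could be folded into a prompt. Its layout (an Examples section, then the user instruction) is loosely modeled on the prompt field of the exported config shown later in this post. The build_few_shot_prompt function and its argument shapes are hypothetical; the actual prompt assembly inside AI Sheets may differ.

```python
def build_few_shot_prompt(instruction: str, examples: list, row: dict) -> str:
    """Assemble a prompt with few-shot examples taken from edited or liked cells.

    Each example is a dict with "inputs" (column name -> value) and "output".
    This is an illustrative layout, not the tool's actual implementation.
    """
    parts = []
    if examples:
        parts += ["# Examples", ""]
    for example in examples:
        parts += ["## Example", "", "### Input", ""]
        parts += [f"{name}: {value}" for name, value in example["inputs"].items()]
        parts += ["", "### Output", "", example["output"], ""]

    # Fill the {{column}} placeholders of the instruction for the current row.
    user_instruction = instruction
    for name, value in row.items():
        user_instruction = user_instruction.replace("{{" + name + "}}", str(value))

    parts += ["# User instruction", "", user_instruction, "", "# Your response"]
    return "\n".join(parts)


# One liked cell becomes one few-shot example (values invented for illustration).
liked = [{"inputs": {"query": "Find the base of a parallelogram."}, "output": "Mathematics, Geometry"}]
print(build_few_shot_prompt("Categorize the main topics of: {{query}}", liked, {"query": "What is 2+2?"}))
```

Each additional edited or liked cell adds one more Example block, which is why a handful of corrections can noticeably shift the output for the rest of a column.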

3. Adjust column configuration

Change the prompt, switch models or providers, or modify settings, then regenerate to get better results.

Rewrite the prompt

Each column has its own generation prompt
Edit anytime to change or improve output
Column regenerates with new results

Switch models/providers

Try different models for various performance or compare them.
Some are more accurate, creative, or structured than others for specific tasks.
Some providers have faster inference and different context lengths; test different providers for the chosen model.

Toggle Search

Enable: Model pulls up-to-date information from the web
Disable: Offline, model-only generation

Exporting your final dataset to the Hub

Once you’re happy with your new dataset, export it to the Hub! This has the added benefit of generating a config file you can reuse for (1) generating more data with HF Jobs using this script, and (2) reusing the prompts for downstream applications, including the few-shot examples from your edited and liked cells.

Here’s an example dataset created with AI Sheets, which produces this config.

Running data generation scripts using HF Jobs

If you want to generate a larger dataset, you can use the above-mentioned config and script, like this:

hf jobs uv run \
-s HF_TOKEN=$HF_TOKEN \
https://huggingface.co/datasets/aisheets/uv-scripts/raw/main/extend_dataset/script.py \
--config https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml \
--num-rows 100 \
nvidia/Nemotron-Personas dvilasuero/nemotron-kimi-qa-distilled

Examples

This section provides examples of datasets you can build with AI Sheets to inspire your next project.

Vibe testing and comparing models

AI Sheets is your perfect companion if you want to test the latest models on different prompts and data you care about.

You just need to import a dataset (or create one from scratch) and then add different columns with the models you want to test.

Then, you can either inspect the results manually or add a column that uses LLMs to judge the quality of each model.

Below is an example, comparing open frontier models for mini web apps. AI Sheets lets you see the interactive results and play with each app. Additionally, the dataset includes several columns using LLMs to judge and compare the quality of the apps.

Example dataset exported from a session like the one we just described: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged

Config:

columns:
  gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: Create a complete, runnable HTML+JS file implementing {{description}}
    searchEnabled: false
    columnsReferences:
      - description
  eval-qwen-coder:
    modelName: Qwen/Qwen3-Coder-480B-A35B-Instruct
    modelProvider: cerebras
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-kimi:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
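
In the config above, each column's columnsReferences appears to list the columns used as {{placeholders}} in its userPrompt. If you edit an exported config by hand, a quick consistency check can catch a forgotten reference. The Python sketch below works under that assumption, which is inferred from the configs in this post rather than from documented behavior. The helper names are hypothetical, and the column is written as a plain dict so that no YAML library is required.

```python
import re


def referenced_columns(user_prompt: str) -> set:
    """Return the column names that appear as {{name}} placeholders in a prompt."""
    return set(re.findall(r"\{\{\s*([\w-]+)\s*\}\}", user_prompt))


def check_column(column: dict) -> bool:
    """True when columnsReferences lists exactly the placeholders used in userPrompt."""
    return referenced_columns(column["userPrompt"]) == set(column["columnsReferences"])


# The gpt-oss column from the config above, transcribed as a Python dict.
gpt_oss = {
    "modelName": "openai/gpt-oss-120b",
    "modelProvider": "groq",
    "userPrompt": "Create a complete, runnable HTML+JS file implementing {{description}}",
    "searchEnabled": False,
    "columnsReferences": ["description"],
}
print(check_column(gpt_oss))  # True
```

The pattern only matches double braces, so single-brace text such as {model 1, model 2} in the judge prompts is not mistaken for a column reference.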

Add categories to a Hub dataset

AI Sheets can also augment existing datasets and help you with quick data analysis and data science projects that involve analyzing text datasets.

Here’s an example of adding categories to an existing Hub dataset.

A cool feature is that you can validate or manually edit the initial categorization outputs and regenerate the full column to improve the results.

Config:

columns:
  category:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: |-
      Categorize the main topics of the following query:

      {{query}}
    prompt: "

      You are a rigorous, intelligent data-processing engine. Generate only the
      requested response format, with no explanations following the user
      instruction. You will be provided with positive, accurate examples of how
      the user instruction must be completed.

      # Examples

      The following are correct, accurate example outputs with respect to the
      user instruction:

      ## Example

      ### Input

      query: Given the area of a parallelogram is 420 square centimeters and
      its height is 35 cm, find the corresponding base. Show all work and label
      your answer.

      ### Output

      Mathematics – Geometry

      ## Example

      ### Input

      query: What is the minimum number of red squares required to ensure
      that each of $n$ green axis-parallel squares intersects 4 red squares,
      assuming the green squares can be scaled and translated arbitrarily
      without intersecting each other?

      ### Output

      Geometry, Combinatorics
      # User instruction

      Categorize the main topics of the following query:

      {{query}}

      # Your response
      "
    searchEnabled: false
    columnsReferences:
      - query

Evaluate models with LLMs-as-Judge

Another use case is evaluating the outputs of models using an LLM-as-a-judge approach. This can be useful for comparing models or assessing the quality of an existing dataset, for example, when fine-tuning a model on an existing dataset from the Hugging Face Hub.
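
Once a judge column is filled in, you may want to tally the verdicts outside the spreadsheet. The judge prompts in the vibe-testing config earlier in this post ask the model to answer with a chosen: line and a reason: line, so a small parser is enough. The following Python sketch assumes the judge actually follows that format; parse_judgement and tally are hypothetical helpers, and responses that do not match are counted under None instead of being guessed.

```python
import re
from collections import Counter
from typing import Optional, Tuple


def parse_judgement(response: str) -> Tuple[Optional[str], str]:
    """Extract ("model 1" or "model 2", reason) from a judge response."""
    chosen = re.search(r"chosen:\s*(model\s*[12])", response, re.IGNORECASE)
    reason = re.search(r"reason:\s*(.+)", response, re.IGNORECASE | re.DOTALL)
    # Normalize case and whitespace so "Model  1" and "model 1" count together.
    verdict = " ".join(chosen.group(1).lower().split()) if chosen else None
    return verdict, reason.group(1).strip() if reason else ""


def tally(responses: list) -> Counter:
    """Count how often each model was chosen; unparseable responses map to None."""
    return Counter(parse_judgement(r)[0] for r in responses)


# Made-up judge outputs for illustration.
sample = [
    "chosen: model 1\n\nreason: Complete and runnable.",
    "chosen: model 2\n\nreason: App 1 is missing the reset button.",
    "I cannot decide.",
]
print(tally(sample))
```

Keeping unparseable answers as a separate bucket makes it easy to see how often a judge model ignores the requested output format.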

In the first example, we combined vibe testing with a judge LLM column. Here’s the judge prompt:

Example dataset: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged

Config:

columns:
  object_name:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Generate the name of a common day to day object
    searchEnabled: false
    columnsReferences: []
  object_description:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Describe a {{object_name}} with adjectives and short word groups separated by commas. No more than 10 words
    searchEnabled: false
    columnsReferences:
      - object_name
  object_image_with_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: RBNBICN, icon, white background, isometric perspective, {{object_name}} , {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name
  object_image_without_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}} "
    searchEnabled: false
    columnsReferences:
      - object_name
  glowing_colors:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}}, glowing colours "
    searchEnabled: false
    columnsReferences:
      - object_name
  flux:
    modelName: black-forest-labs/FLUX.1-dev
    modelProvider: fal-ai
    userPrompt: Create an isometric icon for the object {{object_name}} based on {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name

Next steps

You can try AI Sheets without installing anything or download and deploy it locally from the GitHub repo. To run it locally and get the most out of it, we recommend subscribing to PRO to get 20x monthly inference usage.

If you have questions or suggestions, let us know in the Community tab or by opening an issue on GitHub.
