Welcome to the Falcon 3 Family of Open Models!

We introduce Falcon3, a family of decoder-only large language models under 10 billion parameters, developed by
Technology Innovation Institute (TII) in Abu Dhabi. By pushing the
boundaries of performance and training efficiency, this release reflects our ongoing commitment to advancing open
and accessible large foundation models.

Falcon3 represents a natural evolution from previous releases, with a focus on expanding the models' science, math, and code capabilities.

This iteration includes five base models:

  1. Falcon3-1B-Base
  2. Falcon3-3B-Base
  3. Falcon3-Mamba-7B-Base
  4. Falcon3-7B-Base
  5. Falcon3-10B-Base

In developing these models, we incorporated several key innovations aimed at improving performance while reducing training costs:

  • One pre-training for transformer-based models: We conducted a single large-scale pretraining run on the 7B model, using 1024 H100 GPUs, on 14 trillion tokens of web, code, STEM, and curated high-quality multilingual data.
  • Depth up-scaling for improved reasoning: Building on recent studies on the effects of model depth, we upscaled the 7B model to a 10B-parameter model by duplicating the redundant layers and continuing pre-training with 2 trillion tokens of high-quality data. This yielded Falcon3-10B-Base, which achieves state-of-the-art zero-shot and few-shot performance for models under 13B parameters.
  • Knowledge distillation for better tiny models: To offer compact and efficient alternatives, we developed Falcon3-1B-Base and Falcon3-3B-Base by leveraging pruning and knowledge distillation techniques, using less than 100GT of curated high-quality data, thereby redefining pre-training efficiency.
  • Pure SSM: We have further enhanced Falcon Mamba 7B by training on an additional 1.5 trillion tokens of high-quality data, leading to Falcon3-Mamba-7B-Base. Notably, the updated model offers significantly improved reasoning and mathematical capabilities.
  • Other variants: All models in the Falcon3 family are available in Instruct, GGUF, GPTQ-Int4, GPTQ-Int8, AWQ, and 1.58-bit variants, offering flexibility for a wide range of applications.
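
The depth up-scaling step above can be illustrated with a toy sketch. The exact layer counts (28 decoder layers for the 7B, 40 for the 10B) and the choice of which block to duplicate are assumptions for illustration only, consistent with the 18-to-40 layer range given later in this post; the release does not say here which layers were copied.

```python
# Toy illustration of depth up-scaling: duplicate a contiguous block of decoder
# layers so a 28-layer stack becomes a 40-layer one (28 + 12 duplicated layers).
# Which layers Falcon3 actually duplicated is not stated; the middle block
# chosen below is only an example.
def upscale(layers, start, count):
    """Return a new stack where layers[start:start+count] appear twice in a row."""
    block = layers[start:start + count]
    return layers[:start + count] + block + layers[start + count:]

base_stack = [f"layer_{i}" for i in range(28)]   # stand-in for the 7B decoder stack
deep_stack = upscale(base_stack, start=8, count=12)
print(len(deep_stack))  # 40
```

After duplication, continued pre-training (2 trillion tokens in this release) lets the copied layers specialize rather than remain redundant.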



Key Highlights

Falcon3 pushes the boundaries at the small and medium scales of large language models, demonstrating high performance on common benchmarks:

  • Falcon3-1B-Base surpasses SmolLM2-1.7B and is on par with gemma-2-2b.
  • Falcon3-3B-Base outperforms larger models like Llama-3.1-8B and Minitron-4B-Base, highlighting the advantages of pre-training with knowledge distillation.
  • Falcon3-7B-Base demonstrates top performance, on par with Qwen2.5-7B, among models under the 9B scale.
  • Falcon3-10B-Base stands as the state of the art, achieving strong results in the under-13B category.
  • All transformer-based Falcon3 models are compatible with the Llama architecture, allowing better integration in the AI ecosystem.
  • Falcon3-Mamba-7B continues to lead as the top-performing State Space Language Model (SSLM), matching or even surpassing leading transformer-based LLMs at the 7B scale, with support for a longer 32K context length. Since it shares the architecture of the original Falcon Mamba 7B, users can integrate Falcon3-Mamba-7B seamlessly without any additional effort.
  • The instruct versions of our base models likewise show remarkable performance across various benchmarks, with Falcon3-7B-Instruct and Falcon3-10B-Instruct outperforming all instruct models under the 13B scale on the open leaderboard.
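
Since the transformer-based models are Llama-architecture compatible, any Instruct checkpoint can be queried through the standard transformers API. A minimal sketch: the model ID follows the release's naming, the generation settings are illustrative, and the import is deferred inside the function so nothing is downloaded until it is called.

```python
def chat(prompt, model_id="tiiuae/Falcon3-7B-Instruct", max_new_tokens=128):
    """Send a single-turn prompt to a Falcon3 Instruct model and return its reply."""
    # Deferred import: the checkpoint (several GB) is only fetched on first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

The same call works for any of the transformer-based Instruct variants by swapping `model_id`; the Mamba variant loads through the same AutoModel API thanks to its unchanged architecture.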



Enhanced Capabilities

We evaluated models with our internal evaluation pipeline (based on lm-evaluation-harness) and we report raw scores.
Our evaluations highlight key areas where the Falcon3 family of models excels, reflecting the emphasis on enhancing performance in scientific domains, reasoning, and general knowledge capabilities:

  • Math Capabilities: Falcon3-10B-Base achieves 22.9 on MATH-Lvl5 and 83.0 on GSM8K, showcasing enhanced reasoning in complex math-focused tasks.
  • Coding Capabilities: Falcon3-10B-Base achieves 73.8 on MBPP, while Falcon3-10B-Instruct scores 45.8 on MultiPL-E, reflecting their ability to generalize across programming-related tasks.
  • Extended Context Length: Models in the Falcon3 family support up to 32K tokens (except the 1B, which supports up to 8K), with functional improvements such as a score of 86.3 on BFCL (Falcon3-10B-Instruct).
  • Improved Reasoning: Falcon3-7B-Base and Falcon3-10B-Base achieve 51.0 and 59.7 on BBH, respectively, reflecting enhanced reasoning capabilities, with the 10B model improving on the 7B.
  • Scientific Knowledge Expansion: Performance on MMLU benchmarks demonstrates advances in specialized knowledge, with scores of 67.4/39.2 (MMLU/MMLU-PRO) for Falcon3-7B-Base and 73.1/42.5 for Falcon3-10B-Base.



Models’ Specs and Benchmark Results

Detailed specifications of the Falcon3 family of models are summarized in the following table. The architecture of Falcon3-7B-Base
is characterized by a head dimension of 256, which yields high throughput with FlashAttention-3, as it is optimized for this dimension. These decoder-only models span 18 to 40 layers for the transformer-based ones, and 64 layers for the Mamba one; all models share the SwiGLU activation function, with a vocabulary size of 131K tokens (65K for Mamba-7B). Falcon3-7B-Base is trained on the largest amount of data, ensuring comprehensive coverage of concepts and knowledge, while the other variants require far less data.

Training efficiency

The table below highlights the performance of Falcon3-7B-Base and Falcon3-10B-Base on key benchmarks, showing competitive results in the general, math, reasoning, and common-sense understanding domains.
Feel free to check the model cards, where we provide additional evaluation results (e.g. MT-Bench, Alpaca, etc.).


The instruct models also show competitive and superior performance against models of equivalent and smaller size, as highlighted in the tables below.



Instruct models

Falcon3-1B-Instruct and Falcon3-3B-Instruct achieve robust performance across the evaluated benchmarks. Specifically, Falcon3-1B attains competitive results in IFEval (54.4), MUSR (40.7), and SciQ (86.8), while Falcon3-3B exhibits further gains, particularly in MMLU-PRO (29.7) and MATH (19.9), demonstrating clear scaling effects. Although they don't surpass all competing models on every metric, the Falcon models show strong performance in reasoning and commonsense understanding relative to both Qwen and Llama.
In our internal evaluation pipeline:

  • We use lm-evaluation-harness.
  • We report raw scores obtained by applying the chat template without fewshot_as_multiturn (unlike Llama3.1).
  • We use the same batch size across all models.
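
The setup above can be sketched with the harness's Python entry point. This assumes a recent lm-evaluation-harness (v0.4+) in which `simple_evaluate` exposes the chat-template flags; the task names and batch size below are illustrative, not the exact configuration used here.

```python
def run_eval(model_id, tasks, batch_size=8):
    """Evaluate a chat model as described above: chat template applied,
    fewshot_as_multiturn disabled, identical batch size across models."""
    import lm_eval  # deferred; install with `pip install lm-eval`

    return lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id}",
        tasks=tasks,
        batch_size=batch_size,
        apply_chat_template=True,
        fewshot_as_multiturn=False,
    )

# Example (downloads the model and task datasets when run):
# results = run_eval("tiiuae/Falcon3-7B-Instruct", ["ifeval", "gsm8k"])
```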

Moreover, Falcon3-7B and Falcon3-10B show robust performance across the evaluated benchmarks. Falcon3-7B achieves competitive scores on reasoning (Arc Challenge: 65.9, MUSR: 46.4) and math (GSM8K: 79.1), while Falcon3-10B demonstrates further improvements, notably in GSM8K (83.1) and IFEval (78), indicating clear scaling advantages.




Open Source Commitment

In line with our mission to foster AI accessibility and collaboration, all models in the Falcon3 family are released under the Falcon LLM license. We hope the AI community finds these models valuable for research, application development, and further experimentation. Falcon3 is not a culmination but a continuation of our efforts to create more capable, efficient, and specialized foundation models. In January 2025, we will release further models of the Falcon3 family featuring enhanced multi-modal capabilities, including image, video, and audio support, as well as a full technical report covering our methodologies. We welcome feedback and collaboration from the community as we continue to refine and advance these technologies.



Useful links



Acknowledgments

We warmly thank the following people for their smooth support and integration within the ecosystem.



Citation

If the Falcon3 family of models was helpful to your work, feel free to cite us.

@misc{Falcon3,
    title = {The Falcon 3 Family of Open Models},
    url = {https://huggingface.co/blog/falcon3},
    author = {Falcon-LLM Team},
    month = {December},
    year = {2024}
}


