NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance

Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated evaluation of vast amounts of unstructured data to generate actionable trading insights. These advanced AI systems can process financial news, social media sentiment, earnings reports, and market data to predict stock price movements and automate investment strategies with unprecedented accuracy.

The Strategic Technology Evaluation Center (STAC) has been developing benchmarks for the workloads key to the financial industry for over 15 years. They’ve now developed the STAC-AI benchmark to assist corporations assess the end-to-end retrieval-augmented generation (RAG) and LLM inference pipeline.

This post presents the outcomes achieved on the STAC-AI LANG6 benchmark across multiple NVIDIA platforms. We may also share some recommendations on how any user can benchmark NVIDIA TensorRT LLM in accordance with the specifications of their dataset.

Model	Dataset	2x GH200 144 GB TensorRT LLM FP8		4x GB200 NVL72 TensorRT LLM NVFP4		2x RTX PRO 6000 NVFP4
Model	Dataset	WPS	RPS	WPS	RPS	WPS	RPS
Llama 3.1 8B	EDGAR4	8,237	51.5	37,480	224	5,500	32.9
Llama 3.1 8B	EDGAR5	304	0.784	1,112	2.85	138	0.345
Llama 3.1 70B	EDGAR4	1,071	6.77	5,618	35.9	831	5.26
Llama 3.1 70B	EDGAR5	41.4	0.119	150	0.477	13	0.04

NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance

STAC-AI LANG6 (Inference-Only) benchmark

Hardware and software stack

Benchmarking results on STAC-AI LANG6

Batch mode

Interactive mode

How you can benchmark TensorRT LLM together with your custom data

Prerequisites:

Step 1: Launch the container

Step 2: Clone the repositories

Step 3: Quantize the model

Step 4: Generate synthetic data

Step 5: Run the benchmark

Start with TensorRT LLM benchmarking

What are your thoughts on this topic?
Let us know in the comments below.

Share this article

Recent posts

Controlling Floating-Point Determinism in NVIDIA CCCL

AI in Multiple GPUs: ZeRO & FSDP

Trump gets data center corporations to pledge to pay for power generation

Introducing Modular Diffusers – Composable Constructing Blocks for Diffusion Pipelines

Dataset Recording, VLA High quality‑Tuning, and On‑Device Optimizations

NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance

STAC-AI LANG6 (Inference-Only) benchmark

Hardware and software stack

Benchmarking results on STAC-AI LANG6

Batch mode

Interactive mode

How you can benchmark TensorRT LLM together with your custom data

Prerequisites:

Step 1: Launch the container

Step 2: Clone the repositories

Step 3: Quantize the model

Step 4: Generate synthetic data

Step 5: Run the benchmark

Start with TensorRT LLM benchmarking

What are your thoughts on this topic? Let us know in the comments below.

Share this article

Recent posts

What are your thoughts on this topic?
Let us know in the comments below.