Document AI | APP to compare the Document Understanding LiLT and LayoutXLM (base) models at paragraph level
DocLayNet + Layout models in Open Source: Document AI is (truly) starting!
DocLayNet small/base/large and a DocLayNet Image Viewer APP: explore data to better understand it
Let’s compare our 2 models (LiLT vs LayoutXLM)
Example

APP to compare the Document Understanding LiLT and LayoutXLM (base) models at paragraph level

Document AI | Inference APP at paragraph level with 2 Document Understanding models (LiLT base and LayoutXLM base fine-tuned on DocLayNet base dataset)

The recent publication of the DocLayNet dataset (IBM Research) and of Document Understanding models (which detect layout and text) on Hugging Face (LayoutLM, LayoutLMv2, LayoutLMv3, LayoutXLM, LiLT) makes it possible to build real Document AI applications.

The pages in DocLayNet can be grouped into six distinct categories, namely Financial Reports, Manuals, Scientific Articles, Laws & Regulations, Patents and Government Tenders.
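Beyond these six document categories, each page element in DocLayNet carries one of 11 layout labels (the label set below is taken from the DocLayNet paper; the id mapping is an illustrative sketch of what a token-classification model would expect, not code from the APP):

```python
# The 11 DocLayNet layout categories, mapped to ids as a
# token-classification model (LiLT, LayoutXLM, ...) would expect.
DOCLAYNET_LABELS = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]

label2id = {label: i for i, label in enumerate(DOCLAYNET_LABELS)}
id2label = {i: label for label, i in label2id.items()}
```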

Many firms and individuals are waiting for such models. Indeed, they make it possible to search for information, classify documents, and interact with them via different NLP models such as QA, NER and even chatbots (hmm… who’s talking about ChatGPT here?).

Furthermore, in order to encourage AI professionals to train this type of model, IBM Research has just launched a competition: ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents.

In this context, and in order to help as many people as possible explore and better understand the DocLayNet dataset, the following resources have been shared:

  1. a processing notebook to facilitate the use of DocLayNet with annotated text (and not only with bounding boxes) (to read: “Document AI | Processing of DocLayNet dataset to be utilized by layout models of the Hugging Face hub (finetuning, inference)”);
  2. an APP to visualise the annotated bounding boxes of lines and paragraphs of the documents of the DocLayNet dataset (to read: “Document AI | DocLayNet image viewer APP”);
  3. a model finetuned on the DocLayNet base dataset with overlapping chunks of 384 tokens, which uses the XLM-RoBERTa base model, with its inference APP and production code;
  4. a model finetuned on the DocLayNet base dataset with overlapping chunks of 512 tokens, which uses the XLM-RoBERTa base model, with its inference APP and production code;
  5. a model finetuned on the DocLayNet base dataset with overlapping chunks of 384 tokens, which uses the XLM-RoBERTa base tokenizer, with its inference APP and production code;
  6. a model finetuned on the DocLayNet base dataset with overlapping chunks of 512 tokens, with its inference APP and production code.
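The “overlap chunks” mentioned above refer to a sliding window over the token sequence: pages longer than the model’s maximum length (384 or 512 tokens) are split into windows that share some tokens, so no context is lost at window boundaries. Here is a minimal sketch of that idea (the function name and the overlap size are illustrative; the real preprocessing also carries bounding boxes and labels along with each token):

```python
# Sliding-window chunking with overlap, a minimal sketch of the
# "overlap chunks of 384/512 tokens" idea used during finetuning.

def chunk_with_overlap(token_ids, max_len=512, overlap=128):
    """Split token_ids into windows of at most max_len tokens,
    where consecutive windows share `overlap` tokens."""
    if max_len <= overlap:
        raise ValueError("max_len must be greater than overlap")
    stride = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # last window reached the end of the sequence
    return chunks

# Example: a 1000-token page with 512-token windows and a 128-token overlap
# produces windows starting at tokens 0, 384 and 768.
chunks = chunk_with_overlap(list(range(1000)), max_len=512, overlap=128)
```

At inference time, predictions over the overlapping regions are then merged back into a single sequence of token labels.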

APP

In order to compare these 2 models, there is now an APP 🙂

Notebook with Gradio APP

Here is the APP notebook 🙂

This notebook runs a Gradio App that processes the first page of any uploaded PDF. As with our other Document Understanding APPs, it displays not only the paragraph-labeled image of the first page for each of the 2 models, but also the DataFrame of labeled texts.

Gradio App that processes the first page of any uploaded PDF and displays not only the paragraph labelled image of the first page for each of the 2 models but also the DataFrame of labelled texts
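To give an idea of how such a side-by-side table of labeled texts can be produced, here is a hedged sketch: each model emits one label per token, a paragraph’s label is taken as the majority vote over its tokens, and the two models’ labels are lined up per paragraph. Function names and the sample data are illustrative, not the APP’s actual code:

```python
# Sketch: align paragraph-level labels from two models for comparison.
from collections import Counter

def paragraph_label(token_labels):
    """Majority-vote label over the tokens of one paragraph."""
    return Counter(token_labels).most_common(1)[0][0]

def compare_models(paragraphs, preds_lilt, preds_layoutxlm):
    """Build one row per paragraph: text, LiLT label, LayoutXLM label."""
    rows = []
    for text, la, lb in zip(paragraphs, preds_lilt, preds_layoutxlm):
        rows.append({
            "text": text,
            "LiLT": paragraph_label(la),
            "LayoutXLM": paragraph_label(lb),
        })
    return rows

# Illustrative token-level predictions for two paragraphs.
rows = compare_models(
    ["EUROPEAN COMMISSION", "This report presents the annual results"],
    [["Page-header", "Page-header"], ["Text", "Text", "Text"]],
    [["Text", "Text", "Page-header"], ["Text", "Text", "Text"]],
)
```

In the actual APP, such rows would be rendered as a pandas DataFrame next to the paragraph-labeled page image for each model.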

This notebook can be run on Google Colab. It is hosted on GitHub.

Notebook: Gradio_inference_on_LiLT_&_LayoutXLM_base_model_finetuned_on_DocLayNet_base_in_any_language_at_levelparagraphs_ml512.ipynb

Let’s have a look at a report from the European Commission.

Page 1

Our Gradio APP renders the first page of this PDF.

First page of a PDF processed by our Document Understanding LiLT base model (left) and LayoutXLM base model (right) at paragraph level

We can see from the paragraph-labeled images that there are differences: our Document Understanding LiLT base model seems to work better:

  • it labeled the Page Header text well,
  • it does a better job of labeling text blocks.

However, the 2 models failed to label the title of the page.

Page 2

Second page of a PDF processed by our Document Understanding LiLT base model (left) and LayoutXLM base model (right) at paragraph level

This time, we can see from the paragraph-labeled images that there are again differences, BUT it is our Document Understanding LayoutXLM base model that seems to work better:

  • it correctly detects the Sub-Header.

About the author: Pierre Guillou is an AI consultant in Brazil and France. Get in touch with him through his LinkedIn profile.
