Master Dispersion Plots in 6 Minutes!

Quick Success Data Science

Learn graphical text analysis with NLTK

A sepia-colored photo of Sherlock Holmes examining a book with a magnifying glass.
Sherlock Holmes (by DALL-E3)

The Natural Language Toolkit (NLTK) ships with a fun feature called a dispersion plot that lets you plot the location of a word in a text. More specifically, it plots each occurrence of a word against its word offset, that is, the number of words from the beginning of the corpus.
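To make the idea of a word offset concrete, here is a minimal sketch in plain Python of the data a dispersion plot draws. It is not NLTK's implementation: the helper names (`tokenize`, `word_offsets`, `text_dispersion`), the regex tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a simple regex stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    wanted = {t.lower() for t in targets}
    offsets = {t.lower(): [] for t in targets}
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[token].append(position)
    return offsets

def text_dispersion(tokens, targets, width=60):
    """Render one text row per target, with '|' marking where the word occurs."""
    rows = []
    n = max(len(tokens), 1)
    for target, positions in word_offsets(tokens, targets).items():
        row = ["-"] * width
        for p in positions:
            # Scale the token offset to a column in the fixed-width row.
            row[p * width // n] = "|"
        rows.append(f"{target:>10} {''.join(row)}")
    return "\n".join(rows)

# Made-up sample text, for illustration only.
sample = "Holmes looked at Watson. Watson looked at the hound. The hound looked at Holmes."
tokens = tokenize(sample)
targets = ["Holmes", "Watson", "hound"]
offsets = word_offsets(tokens, targets)
print(offsets)
print(text_dispersion(tokens, targets, width=15))
```

Each row of the text rendering spans the whole token list, which mirrors how each row of a dispersion plot spans the corpus. With NLTK and matplotlib installed, passing a token list and target words to `nltk.draw.dispersion.dispersion_plot` draws the graphical version.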

Here’s an example dispersion plot for the main characters in the Sherlock Holmes novel, The Hound of the Baskervilles:

A dispersion plot that uses vertical blue tick marks to indicate the occurrence of a word in a text.
Dispersion plot for major characters in “The Hound of the Baskervilles” (by the author)

The vertical blue tick marks represent the locations of the target words in the text. Each row covers the corpus from beginning to end.

If you’re familiar with The Hound of the Baskervilles (and I won’t spoil it if you’re not), then you’ll appreciate the sparse occurrence of Holmes in the middle, the late return of Mortimer, and the overlap of Barrymore, Selden, and the hound.

Dispersion plots can have more practical applications. For instance, imagine you’re a data scientist working with paralegals on a criminal case involving insider trading. To find out whether the accused contacted board members just before making the illegal trades, you can load the subpoenaed emails of the accused as a continuous string and generate a dispersion plot to check for juxtapositions of names.
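As a rough sketch of that workflow, the snippet below joins a few emails into one string, records the word offsets of each name, and lists the places where two names fall close together. The emails, the names, the helper functions, and the `window` parameter are all hypothetical; a real case would also need proper tokenization and handling of multi-word names.

```python
import re

def name_positions(emails, names):
    """Join emails into one string and record the token offsets of each name."""
    corpus = " ".join(emails)
    tokens = re.findall(r"[a-z']+", corpus.lower())
    wanted = {n.lower() for n in names}
    positions = {n.lower(): [] for n in names}
    for i, token in enumerate(tokens):
        if token in wanted:
            positions[token].append(i)
    return positions

def close_pairs(positions, name_a, name_b, window=10):
    """Return (offset_a, offset_b) pairs where the two names are within `window` tokens."""
    return [
        (a, b)
        for a in positions[name_a.lower()]
        for b in positions[name_b.lower()]
        if abs(a - b) <= window
    ]

# Hypothetical, made-up emails and names, for illustration only.
emails = [
    "Lunch with Avery on Friday to talk about the quarterly numbers.",
    "Reminder: dentist appointment on Monday morning.",
    "Call Jordan about the merger before Avery signs anything.",
]
positions = name_positions(emails, ["Avery", "Jordan"])
pairs = close_pairs(positions, "Avery", "Jordan", window=5)
print(positions)
print(pairs)
```

On a dispersion plot, these close pairs would show up as tick marks for the two names lining up at nearly the same horizontal position.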

Social scientists analyze dispersion plots to study language trends related to specific topics. By tracking the occurrence of terms like “climate change” or “gun control” in news articles, they can gain insights into the priorities that are important to society over specific timeframes.
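A coarser cousin of the dispersion plot is a simple count of a term per time period. The sketch below assumes articles arrive as `(period, text)` pairs; that layout, the function name, and the sample headlines are invented for illustration.

```python
from collections import Counter

def phrase_counts_by_period(articles, phrase):
    """Count case-insensitive occurrences of `phrase` for each period label."""
    counts = Counter()
    needle = phrase.lower()
    for period, text in articles:
        counts[period] += text.lower().count(needle)
    return dict(counts)

# Made-up articles, for illustration only.
articles = [
    ("2019", "Lawmakers debated climate change. Climate change dominated the hearing."),
    ("2019", "Local sports roundup."),
    ("2020", "A new report on climate change was released."),
]
counts = phrase_counts_by_period(articles, "climate change")
print(counts)
```

Because the match is a plain substring search, it handles two-word terms such as “climate change” without needing a tokenizer, at the cost of also matching the phrase inside longer strings.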

In this Quick Success Data Science project, we’ll write the Python code that generated The Hound of the Baskervilles dispersion plot shown previously.

We’ll use a copy of the novel stored in this Gist. It originally came from Project Gutenberg, a great source for public domain literature. As recommended for natural language processing, I’ve stripped it of…
