Book Summary: Turning Text into Gold

Author’s Note: This review was written by the author and posted on the Translating Nerd blog in 2017. The recent emergence of generative AI, specifically large language models (LLMs), makes basic NLP knowledge more essential than ever. As businesses, organizations, and governments alike seek to leverage advances in LLMs, having a sound grasp of the taxonomies and processes of NLP and text analytics is paramount.

Key Takeaways

Turning Text into Gold: Taxonomies and Textual Analytics, by Bill Inmon, covers a wealth of text analytics and Natural Language Processing (NLP) foundations. Inmon makes it abundantly clear in the first chapter of his book that organizations are underutilizing their data. Structured data, so labeled because it slots into matrices, spreadsheets, and relational databases and is easily ingested into machine learning models, is well understood. Unstructured data, however, the text and words that our world generates on a daily basis, is seldom used. Much like the alchemists of the Middle Ages who looked for a technique to turn ordinary metals into gold, Inmon describes a process to turn unstructured data into decisions: turning text into gold.

What the heck is a taxonomy?

Taxonomies are the dictionaries we use to tie the words in a document, book, or corpus of materials to a business-related understanding. For instance, if I were an automobile manufacturer, I would have a taxonomy of various car-related concepts so I could find those concepts within the text. We then begin to see repetition and patterns in the text. We might also spot new words that relate to automobile manufacturing and add those terms to the taxonomy. While the original taxonomy might capture 70 percent of the car-related words in a document, 90 percent is usually a business-appropriate level at which to move from taxonomy/ontology to database migration.
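To make the matching step concrete, here is a minimal sketch in Python of checking a document against a toy car taxonomy and measuring coverage. The term list, the sample sentence, and the coverage check are illustrative assumptions on my part, not code or examples from the book.

```python
import re

# A toy taxonomy for an automobile manufacturer (illustrative terms only).
car_taxonomy = {"engine", "transmission", "brake", "chassis", "airbag"}

document = (
    "The recall covers a faulty brake line and a transmission sensor. "
    "Dealers will also inspect the engine mounts and the airbag module."
)

# Tokenize crudely on word boundaries, lowercased for matching.
tokens = set(re.findall(r"[a-z]+", document.lower()))

# Which taxonomy terms appear in the text, and what share of the taxonomy
# did we find? In practice you would iterate: add newly discovered terms
# to the taxonomy and re-measure until coverage is acceptable.
hits = car_taxonomy & tokens
coverage = len(hits) / len(car_taxonomy)

print(f"matched terms: {sorted(hits)}")      # ['airbag', 'brake', 'engine', 'transmission']
print(f"taxonomy coverage: {coverage:.0%}")  # 80%
```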

Now what?

Once we have the needed inputs from our long list of taxonomies, textual disambiguation compares the raw text of our document with the taxonomy we have created. If there is a fit, the matching text is moved out of the document and stored in a processing stage. This stage involves searching for more distinct patterns in the newly extracted text; using regular expressions, or a similar investigative coding method, we can discern finer-grained patterns. We can then move this raw text into a matrix, or what many people are familiar with as a spreadsheet. Transferring text into a matrix means converting words to numbers, and the result can be quite large. While there are choices to be made along the way (e.g., sparse matrix vs. dense matrix), the goal is the same: make text machine-readable. Words become zeros and ones, and analytical models can now be applied to the document. Machine learning algorithms, such as classifiers derived from Bayes’ theorem and other classification techniques, can be used to categorize and cluster text.
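As a concrete illustration of the regex and matrix steps, here is a short Python sketch using scikit-learn’s CountVectorizer. The clinical notes are invented, and scikit-learn is my choice of tool; the book describes the process conceptually rather than prescribing a library.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "patient reports chest pain, prescribed aspirin 81 mg daily",
    "routine follow-up, blood pressure stable, continue metformin 500 mg",
    "respiratory infection, prescribed amoxicillin 500 mg",
]

# Regular expressions pull distinct patterns out of the text,
# here a simple dosage pattern such as "500 mg".
dosages = [re.findall(r"\d+\s*mg", note) for note in notes]
print(dosages)  # [['81 mg'], ['500 mg'], ['500 mg']]

# CountVectorizer turns each note into a row of term counts, stored as
# a sparse matrix because most entries are zero.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(notes)

print(X.shape)      # (3 notes, N distinct terms)
print(X.toarray())  # the dense view: the numbers a classifier consumes
```

From here, a classifier such as sklearn.naive_bayes.MultinomialNB could be fit on X to categorize or cluster the notes.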

A straightforward example

Imagine you go to the ER one day and a report is generated when you are discharged. This record holds many important elements of your medical history. However, having someone manually extract the name, address, medications, condition, treating doctor’s information, health vitals, and so on would take a great deal of time; more time than a swamped hospital staff on a limited budget can handle. Text analytics is used to link all of this information into a spreadsheet that can then be fitted into the hospital’s database. Add up enough of these records and you can start looking for patterns. Step by step (a minimal code sketch follows the list):

  1. Your visit to the ER is documented as text.
  2. The hospital takes a pre-defined “dictionary”, or taxonomy, of medical-related terms.
  3. The taxonomy is compared against your medical evaluation and processed into a spreadsheet/matrix.
  4. The spreadsheet is uploaded into a relational database the hospital maintains.
  5. An analyst queries the database to build a machine learning model that can create value-added predictions.
  6. Based on your model, a value is produced that leads to a decision being made.
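
Here is a minimal end-to-end sketch of those six steps in Python, with sqlite3 standing in for the hospital’s relational database. The taxonomy, the discharge note, and the table schema are all invented for illustration.

```python
import re
import sqlite3

# Step 2: a pre-defined taxonomy of medical terms (toy example).
medical_taxonomy = {"hypertension", "aspirin", "metformin", "diabetes"}

# Step 1: the ER visit documented as text (invented note).
er_note = (
    "Patient discharged with hypertension, continuing aspirin 81 mg; "
    "history of diabetes managed with metformin."
)

# Step 3: compare the taxonomy against the note and extract matches.
tokens = set(re.findall(r"[a-z]+", er_note.lower()))
matched = sorted(medical_taxonomy & tokens)

# Step 4: load the structured row into a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (note TEXT, terms TEXT)")
conn.execute("INSERT INTO visits VALUES (?, ?)", (er_note, ",".join(matched)))

# Step 5: an analyst queries the table to feed a downstream model
# (steps 5-6, the model and the decision, are beyond this sketch).
for (terms,) in conn.execute("SELECT terms FROM visits"):
    print(terms)  # aspirin,diabetes,hypertension,metformin
```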

Image sources:

www.medicalexpo.com
http://openres.ersjournals.com/content/2/1/00077-2015
https://www.sharesight.com/blog/ode-to-the-spreadsheet/
