S2W Director Jeong Jin-woo: “AI for security is different from existing LLMs… human-like approach required”

S2W Director Jeong Jin-woo explains the approach behind the ‘CyBERTuned’ model.

“To properly understand security data, a human-like intelligent approach is required, different from the existing large language model (LLM) approach. The goal is to lead a new trend that understands both security and AI.”

S2W (CEO Seo Sang-deok), an artificial intelligence (AI) data intelligence specialist, recently released ‘CyBERTuned’, an AI language model specialized in cybersecurity documents.

This is the company's second in-house model, following ‘DarkBERT’, an AI language model specialized in understanding the dark web.

Regarding this, S2W Director Jeong Jin-woo said, “The dark web is made up of a complex structure and language that is difficult even for humans to understand.” He explained, “In developing DarkBERT, we built up domain-specific data analysis expertise by processing, analyzing, and learning from a large amount of unstructured data.”

In particular, CyBERTuned drew attention as a paper that took a different approach from existing models. Its significance, he said, lies in the fact that it went beyond understanding the characteristics of the data and presented a new learning methodology based on them. The work started from the principle that “AI models begin with understanding data.”

First, he pointed out that cybersecurity text contains many nonverbal elements. Most of them are sequences of numbers or letters without linguistic context, such as Bitcoin addresses, URLs, IP addresses, and CVE identifiers.

“When people look at a website address, they don’t necessarily interpret its meaning or context,” said Director Jeong. “I thought AI should take the same human-like approach.”

In other words, what matters for a model that understands security data is not the arrangement and meaning of the characters. Once the pattern is identified, the model only needs to distinguish the type. This calls for a different approach from existing language models, which predict the next token based on probability.

Part of the CyBERTuned paper describing the new data learning approach (Photo = S2W)

However, existing cybersecurity models have struggled to pick out this information because they process each character in the conventional way. Director Jeong emphasized, “To identify the type of nonverbal data, it should be recognized in ‘chunks’ rather than as individual characters,” adding, “This is far more advantageous in terms of both learning efficiency and cost.”

In short, it presents a new method for labeling data written in the language of cybersecurity.
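The article does not describe how CyBERTuned actually detects these elements. As a rough illustration of the "chunk" idea only, the Python sketch below assumes simple regular expressions to find CVE identifiers, URLs, IPv4 addresses, and Bitcoin-style addresses, and keeps each one as a single typed unit instead of a run of characters. The pattern list, the `chunk_nles` function, and the type labels are hypothetical and are not S2W's implementation.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified patterns for a few nonverbal element types named in
# the article. These are illustrative assumptions, not S2W's actual rules.
NLE_PATTERNS = [
    ("CVE", re.compile(r"\bCVE-\d{4}-\d{4,}\b")),
    ("URL", re.compile(r"\bhttps?://[^\s,;]+")),
    # IPv4 shape only; octet ranges (0-255) are not validated.
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    # Legacy Base58 (starts with 1 or 3) or lowercase bech32 (bc1...) addresses;
    # checksums are not verified.
    ("BTC", re.compile(r"\b(?:bc1[a-z0-9]{25,59}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b")),
]


def chunk_nles(text: str) -> List[Tuple[str, str]]:
    """Split text into (type, chunk) pairs, keeping each nonverbal element whole."""
    spans = []
    for kind, pattern in NLE_PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), kind))
    # Earliest match first; on ties, prefer the longer match.
    spans.sort(key=lambda s: (s[0], -s[1]))

    chunks: List[Tuple[str, str]] = []
    pos = 0
    for start, end, kind in spans:
        if start < pos:
            # Overlaps a chunk already taken (e.g. an IP inside a URL): skip it.
            continue
        # Ordinary text between elements is split on whitespace.
        chunks.extend(("WORD", w) for w in text[pos:start].split())
        chunks.append((kind, text[start:end]))
        pos = end
    chunks.extend(("WORD", w) for w in text[pos:].split())
    return chunks


if __name__ == "__main__":
    sample = ("Exploit for CVE-2021-44228 hosted at http://198.51.100.7/payload "
              "paid to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
    for kind, chunk in chunk_nles(sample):
        print(f"{kind:5} {chunk}")
```

Each URL, IP address, or wallet address comes back as one labeled unit rather than many characters or subword pieces, which is the kind of "chunk" Director Jeong describes. A production system would need far more robust detection than these regular expressions.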

On this basis, the paper was published and the model developed within a short period of three to four months. Related models have been released by academia and research institutes, but this is nearly the first such attempt by a commercial company.

Using Google's BERT as the base model, the company trained it on about 500,000 cybersecurity-related news articles and reports to maximize its effectiveness and applicability. The model can be used for ▲classifying malware from conversations, papers, etc. ▲summarizing cybersecurity news ▲analyzing hacker behavior patterns ▲understanding correlations between hacking groups ▲converting terms into industry-standard vocabulary, etc.

Director Jeong said, “Domain-specific small language models (sLMs) require a completely different approach from large language models (LLMs).” He continued, “LLMs prioritize generality, so they aim to provide answers at the level of an average student by ingesting as much data as possible, whereas sLMs minimize irrelevant data and select only the data that is strictly necessary to produce expert-level answers.”

S2W, a leading data intelligence company that has focused on processing nonverbal and unstructured data for years, has also seen demand from enterprises rise rapidly in recent months.

He said inquiries are coming from manufacturing, retail, and even the defense sector. Sales this year are expected to exceed 10 billion won.

As previously disclosed, S2W is growing into a comprehensive AI security company that covers all industries beyond the dark web.

Director Jeong Jin-woo said, “By focusing on specific keywords such as the dark web and security, we have gained insight into data as a whole,” and added, “In the future, we will contribute to building language models for all industries and move beyond ‘AI for security’ to ‘security for AI.’”

In particular, he said this is essential for increasing trust in AI, which becomes all the more important as AI spreads.

He emphasized that just as the spread of the Internet made it essential to build security programs worldwide, AI security will also become essential as companies increasingly adopt AI.

He summarized S2W’s vision as follows: “By combining our understanding of security and our understanding of AI, we will become a leader in increasing the reliability of the entire AI ecosystem.”

Reporter Jang Se-min semim99@aitimes.com