A new method to boost the speed of online databases


Hashing is a core operation in most online databases, like a library catalogue or an e-commerce website. A hash function generates codes that replace data inputs. Since these codes are shorter than the actual data, and typically a fixed length, this makes it easier to find and retrieve the original information.

However, because traditional hash functions generate codes randomly, sometimes two pieces of data are hashed to the same value. This causes collisions: searching for one item points a user to many pieces of data with the same hash value. It then takes much longer to find the right one, resulting in slower searches and reduced performance.

Certain types of hash functions, known as perfect hash functions, are designed to sort data in a way that prevents collisions. But they must be specially constructed for each dataset and take more time to compute than traditional hash functions.

Since hashing is used in so many applications, from database indexing to data compression to cryptography, fast and efficient hash functions are critical. So, researchers from MIT and elsewhere set out to see if they could use machine learning to build better hash functions.

They found that, in certain situations, using learned models instead of traditional hash functions could result in half as many collisions. Learned models are those that have been created by running a machine-learning algorithm on a dataset. Their experiments also showed that learned models were often more computationally efficient than perfect hash functions.

“What we found in this work is that in some situations we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. We can increase the computational time for the hash function a bit, but at the same time we can reduce collisions very significantly in certain situations,” says Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Their research, which will be presented at the International Conference on Very Large Databases, demonstrates how a hash function can be designed to significantly speed up searches in a huge database. For instance, their technique could accelerate computational systems that scientists use to store and analyze DNA, amino acid sequences, or other biological information.

Sabek is co-lead author of the paper with electrical engineering and computer science (EECS) graduate student Kapil Vaidya. They are joined by co-authors Dominick Horn, a graduate student at the Technical University of Munich; Andreas Kipf, an MIT postdoc; Michael Mitzenmacher, professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences; and senior author Tim Kraska, associate professor of EECS at MIT and co-director of the Data Systems and AI Lab.

Hashing it out

Given a data input, or key, a traditional hash function generates a random number, or code, that corresponds to the slot where that key will be stored. To use a simple example, if there are 10 keys to be put into 10 slots, the function would generate a random integer between 1 and 10 for each input. It is highly probable that two keys will end up in the same slot, causing collisions.
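
To put a number on "highly probable," the 10-keys-into-10-slots example can be worked out directly. The sketch below is illustrative only and is not taken from the paper: it assumes every key picks a slot uniformly and independently, numbers the slots 0 to 9 instead of 1 to 10, and uses hypothetical helper names (`prob_no_collision`, `colliding_fraction`).

```python
import math
import random


def prob_no_collision(num_keys: int, num_slots: int) -> float:
    """Probability that independent, uniformly random slot choices are all distinct."""
    if num_keys > num_slots:
        return 0.0  # pigeonhole: some slot must hold two keys
    # Collision-free ordered placements divided by all possible placements.
    return math.perm(num_slots, num_keys) / num_slots ** num_keys


def colliding_fraction(num_keys: int, num_slots: int, rng: random.Random) -> float:
    """Fraction of keys that share a slot with at least one other key (one random trial)."""
    slots = [rng.randrange(num_slots) for _ in range(num_keys)]
    counts = {}
    for slot in slots:
        counts[slot] = counts.get(slot, 0) + 1
    return sum(1 for slot in slots if counts[slot] > 1) / num_keys


if __name__ == "__main__":
    p = prob_no_collision(10, 10)
    print(f"P(no collision for 10 keys in 10 slots) = {p:.6f}")
    print(f"One random trial, fraction of colliding keys: "
          f"{colliding_fraction(10, 10, random.Random(0)):.2f}")
```

Under these assumptions the chance of a collision-free placement is 10!/10^10, roughly 0.04 percent, so at least one collision is close to certain.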

Perfect hash functions provide a collision-free alternative. Researchers give the function some extra knowledge, such as the number of slots the data are to be placed into. Then it can perform additional computations to figure out where to put each key to avoid collisions. However, these added computations make the function harder to create and less efficient.
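
As a rough illustration of why that extra construction work is costly, the sketch below brute-forces a seed until a seeded hash places every key in its own slot. This is a toy, not how production perfect-hashing schemes are built and not the method evaluated in the paper; the names `seeded_slot` and `find_perfect_seed`, the example keys, and the use of SHA-256 as the underlying hash are all assumptions made for the example.

```python
import hashlib


def seeded_slot(key: str, seed: int, num_slots: int) -> int:
    """Map a key to a slot using SHA-256 over the seed and the key."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def find_perfect_seed(keys, num_slots: int, max_tries: int = 1_000_000) -> int:
    """Try seeds in order until every key lands in a distinct slot."""
    for seed in range(max_tries):
        used = {seeded_slot(key, seed, num_slots) for key in keys}
        if len(used) == len(keys):  # no two keys shared a slot
            return seed
    raise RuntimeError("no collision-free seed found within max_tries")


# Hypothetical dataset: the seed must be recomputed whenever the keys change.
keys = ["ant", "bee", "cat", "dog", "elk", "fox", "gnu", "hen"]
seed = find_perfect_seed(keys, num_slots=len(keys))

if __name__ == "__main__":
    print("seed:", seed)
    print({key: seeded_slot(key, seed, len(keys)) for key in keys})
```

The search has to be repeated for each new set of keys, which mirrors the point above that perfect hash functions must be specially constructed per dataset.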

“We were wondering, if we know more about the data, that it will come from a particular distribution, can we use learned models to build a hash function that can actually reduce collisions?” Vaidya says.

A data distribution shows all possible values in a dataset, and how often each value occurs. The distribution can be used to calculate the probability that a particular value is in a data sample.

The researchers took a small sample from a dataset and used machine learning to approximate the shape of the data’s distribution, or how the data are spread out. The learned model then uses the approximation to predict the location of a key in the dataset.
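
The idea of predicting a key's location from an approximated distribution can be sketched in a few lines. The example below is a simplification under stated assumptions, not the researchers' implementation: it fits a single straight line to the cumulative distribution using only the smallest and largest sampled keys, and the names `fit_linear_cdf` and `learned_slot` are hypothetical. The keys are evenly spaced on purpose, which is the most favorable case for such a model.

```python
def fit_linear_cdf(sample):
    """Approximate the CDF with one line through the sample's min and max."""
    lo, hi = min(sample), max(sample)
    span = (hi - lo) or 1  # avoid division by zero for a constant sample

    def cdf(key):
        # Clamp so keys outside the sampled range still map into [0, 1].
        return min(1.0, max(0.0, (key - lo) / span))

    return cdf


def learned_slot(key, cdf, num_slots: int) -> int:
    """Use the predicted relative position of the key as its slot."""
    return min(num_slots - 1, int(cdf(key) * num_slots))


# Hypothetical, evenly spaced keys: 1000, 1010, ..., 1990.
keys = [1000 + 10 * i for i in range(100)]
sample = keys[::9]  # a small sample (12 keys) that includes both ends
cdf = fit_linear_cdf(sample)
slots = [learned_slot(key, cdf, len(keys)) for key in keys]

if __name__ == "__main__":
    print(len(set(slots)), "distinct slots for", len(keys), "keys")
```

Because the model preserves key order and the keys are perfectly regular, every key lands in its own slot here; with irregular gaps between keys, a single line would fit poorly and collisions would reappear.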

They found that learned models were easier to build and faster to run than perfect hash functions and that they led to fewer collisions than traditional hash functions if data are distributed in a predictable way. But if the data are not predictably distributed, because gaps between data points vary too widely, using learned models might cause more collisions.

“We may have a huge number of data inputs, and each one has a different gap between it and the next one, so learning that is quite difficult,” Sabek explains.

Fewer collisions, faster results

When data were predictably distributed, learned models could reduce the ratio of colliding keys in a dataset from 30 percent to 15 percent, compared with traditional hash functions. They were also able to achieve better throughput than perfect hash functions. In the best cases, learned models reduced the runtime by nearly 30 percent.

As they explored the use of learned models for hashing, the researchers also found that throughput was impacted most by the number of sub-models. Each learned model is composed of smaller linear models that approximate the data distribution. With more sub-models, the learned model produces a more accurate approximation, but it takes more time.
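
The effect of the number of sub-models can be illustrated with a piecewise-linear approximation of the distribution, where each linear segment plays the role of one sub-model. This is a minimal sketch under assumptions, not the model architecture from the paper: the breakpoints are taken at evenly spaced ranks of a sorted sample, the keys are an artificial quadratic sequence, and the function names are hypothetical.

```python
import bisect


def fit_piecewise_cdf(sorted_sample, num_submodels: int):
    """Return breakpoint keys and their CDF values for a piecewise-linear fit."""
    n = len(sorted_sample)
    ranks = [round(j * (n - 1) / num_submodels) for j in range(num_submodels + 1)]
    xs = [sorted_sample[r] for r in ranks]
    ys = [r / (n - 1) for r in ranks]
    return xs, ys


def predict_cdf(key, xs, ys) -> float:
    """Pick the segment (sub-model) containing the key and interpolate linearly."""
    if key <= xs[0]:
        return 0.0
    if key >= xs[-1]:
        return 1.0
    j = bisect.bisect_right(xs, key) - 1  # guarantees xs[j] <= key < xs[j + 1]
    return ys[j] + (ys[j + 1] - ys[j]) * (key - xs[j]) / (xs[j + 1] - xs[j])


def count_colliding_keys(keys, xs, ys) -> int:
    """Number of keys that share a slot with another key."""
    num_slots = len(keys)
    slots = [min(num_slots - 1, int(predict_cdf(k, xs, ys) * num_slots)) for k in keys]
    counts = {}
    for slot in slots:
        counts[slot] = counts.get(slot, 0) + 1
    return sum(1 for slot in slots if counts[slot] > 1)


# Hypothetical keys with steadily growing gaps: 0, 1, 4, 9, ...
keys = [i * i for i in range(199)]
sample = keys[::3]  # sorted sample that includes the first and last key
collisions = {
    k: count_colliding_keys(keys, *fit_piecewise_cdf(sample, k))
    for k in (1, 2, 8, 32)
}

if __name__ == "__main__":
    for k, colliding in collisions.items():
        print(f"{k:>2} sub-models: {colliding} colliding keys")
```

With a single segment, many of the closely spaced small keys pile into the same slots; with more segments the fit follows the curve more closely, at the price of a lookup to select the segment for every key.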

“At a certain threshold of sub-models, you get enough information to build the approximation that you need for the hash function. But after that, it won’t lead to more improvement in collision reduction,” Sabek says.

Building off this analysis, the researchers want to use learned models to design hash functions for other types of data. They also plan to explore learned hashing for databases in which data can be inserted or deleted. When data are updated in this way, the model needs to change accordingly, but changing the model while maintaining accuracy is a difficult problem.

“We want to encourage the community to use machine learning inside more fundamental data structures and operations. Any kind of core data structure presents us with an opportunity to use machine learning to capture data properties and improve performance. There is still a lot we can explore,” Sabek says.

This work was supported, in part, by Google, Intel, Microsoft, the National Science Foundation, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.
