17.05.2023

AI helps speed up online database processing

German and US researchers are finding ways to improve hash functions and testing a new approach to speeding up databases.

Together with colleagues from MIT and Harvard, they showed how such functions can speed up searches in large databases. This could help with DNA analysis, chemical sequences, and other biological data.

Large companies such as Google and Microsoft also take part in the research, and the findings will be showcased at the Conference on Very Large Databases.

Hashing is vital for systems ranging from library catalogs to e-commerce websites. A hash function generates a code for each data entry, making retrieval easier, since the codes are shorter and of fixed length. Collisions do occur, however, when different pieces of information are assigned the same code. They slow down searches and hurt online performance.
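The mechanics can be sketched in a few lines of Python (a generic illustration, not the systems discussed in the article): a hash function folds each key into a fixed range of slot indices, and a collision appears as soon as two keys land in the same slot.

```python
def bucket(key: str, num_slots: int) -> int:
    # Python's built-in hash() maps a key to an integer; the modulo
    # folds it into a fixed range of slot indices.
    return hash(key) % num_slots

keys = ["apple", "banana", "cherry", "date", "elderberry", "fig"]
slots = {}
for k in keys:
    slots.setdefault(bucket(k, 4), []).append(k)

# Six keys into four slots: by the pigeonhole principle, at least
# two keys must share a slot -- that sharing is a collision.
collisions = sum(len(v) - 1 for v in slots.values())
```

With more keys than slots, some collisions are unavoidable; real hash tables resolve them with chaining or probing, which is exactly the extra work that slows lookups down.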

To prevent this, perfect hash functions can be used, but they take more time to compute and must be constructed anew for each dataset. Because hashing is used in so many settings, such as database indexing, data compression, and cryptography, efficient hash functions matter. The MIT researchers set out to improve hash functions using machine learning.

Their results indicate that machine learning-based hash functions cause half as many collisions in some situations. Those learned models were also more efficient to compute than perfect hash functions.

They found that there is sometimes a trade-off between the cost of computing a hash function and the number of collisions it produces. By navigating that trade-off, researchers may be able to substantially speed up hash computation while also reducing collisions.

This could help, for instance, with DNA analysis and other work on biological data.

Perfect hash functions offer collision-free operation

The problem with a traditional hash function can be pictured with 10 apples and 10 boxes: in theory, each apple gets its own box. In practice, though, two apples may end up in the same one.

An ideal, or perfect, hash function avoids collisions entirely. But it requires additional information, such as the number of slots available and a computed slot for each and every key. That extra data slows the system down and makes it less efficient.
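A toy perfect hash can be sketched for a fixed, fully known key set (an illustrative construction, not the researchers' method): assigning each key its rank in sorted order guarantees zero collisions, but the mapping must be rebuilt from scratch whenever the key set changes.

```python
def build_perfect_hash(keys):
    # Every key gets a unique slot: its rank in sorted order.
    # Collision-free by construction, but only for exactly this key set.
    return {k: i for i, k in enumerate(sorted(keys))}

keys = ["apple", "banana", "cherry", "date"]
table = build_perfect_hash(keys)

# No two keys share a slot.
assert len(set(table.values())) == len(keys)
```

The precomputed table is the "additional data" the article mentions: it must be stored alongside the dataset and reconstructed whenever keys are added or removed.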

The researchers wondered: if they know more about the data coming from a particular source, can they use learned models to build a hash function that cuts collisions?

A data distribution describes every value in a dataset and how often each occurs. It can be used to compute the probability that a particular value appears in a data sample.

Using a small sample of the dataset, the algorithm estimated the shape of the data distribution. The learned model then predicts where a key belongs in the dataset.
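A minimal sketch of this idea, assuming a CDF-based placement scheme (the paper's actual model may differ): estimate the cumulative distribution from a small sorted sample, then place each key at roughly CDF(key) × number-of-slots, so keys from that distribution spread close to uniformly across the table.

```python
import bisect
import random

def learned_slot(key, sorted_sample, num_slots):
    # Empirical CDF from the sample: the fraction of sampled
    # values <= key, scaled up to a slot index.
    cdf = bisect.bisect_right(sorted_sample, key) / len(sorted_sample)
    return min(int(cdf * num_slots), num_slots - 1)

random.seed(1)
data = [random.gauss(0, 1) for _ in range(10_000)]
sample = sorted(random.sample(data, 200))

# Keys drawn from the modeled distribution fill the table evenly,
# which is what keeps collisions low.
slots = [learned_slot(x, sample, 100) for x in data]
```

Note that this only works well when incoming keys actually follow the distribution the sample was drawn from; a mismatched distribution would pile keys into a few slots.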

The researchers found that the learned model is faster to run and easier to construct than a perfect hash function, and it leads to fewer collisions than a traditional one.

More sub-models mean more accuracy

The learned models reduced the share of colliding keys in a dataset to 15% compared with traditional hash functions, and throughput improved as well.

The number of sub-models also mattered: more sub-models improve the accuracy of the learned model's approximation, but they take more time.
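That trade-off can be illustrated with the same CDF-based sketch (an assumption for illustration, not the authors' code): a model built from more pieces, here simply a larger sorted sample, approximates the distribution more closely and yields fewer collisions, at the cost of more memory and slower lookups.

```python
import bisect
import random

def learned_slot(key, sorted_sample, num_slots):
    # Place key at estimated_CDF(key) * num_slots.
    cdf = bisect.bisect_right(sorted_sample, key) / len(sorted_sample)
    return min(int(cdf * num_slots), num_slots - 1)

def count_collisions(keys, sample, num_slots):
    counts = {}
    for k in keys:
        s = learned_slot(k, sample, num_slots)
        counts[s] = counts.get(s, 0) + 1
    # One "collision" per extra key sharing a slot.
    return sum(c - 1 for c in counts.values())

random.seed(2)
data = [random.gauss(0, 1) for _ in range(5_000)]
coarse = sorted(random.sample(data, 10))     # small, fast model
fine = sorted(random.sample(data, 1_000))    # bigger, slower model

c_coarse = count_collisions(data, coarse, 5_000)
c_fine = count_collisions(data, fine, 5_000)
# The finer model produces fewer collisions than the coarse one.
```

The coarse model can only distinguish about a dozen slot values, so almost every key collides; the fine model spreads keys far more evenly while needing 100x the storage.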

This research could help in designing hash functions for other kinds of data, including cases where data is inserted or deleted.

Yasmin Anderson

AI Catalog's chief editor

