“Google for DNA”: Scientists create the world’s first and fastest genetic search engine |


“Google for DNA”: Scientists create the world’s first and fastest genetic search engine

A team of researchers at ETH Zurich has developed MetaGraph, a groundbreaking tool that allows scientists to search through vast public DNA and RNA databases in seconds, earning it the nickname “Google for DNA”. With global repositories now holding nearly 100 petabytes of genetic data, equivalent to the total text on the internet, traditional methods of downloading and analysing sequences have become slow and resource-intensive. MetaGraph compresses this enormous volume of data into a searchable, full-text index, enabling rapid identification of sequences across millions of datasets. This innovation could accelerate research into pathogens, antibiotic resistance and rare genetic conditions.

How “Google for DNA” transforms genetic research

DNA sequencing has revolutionised biomedical research, enabling scientists to identify hereditary disorders, track tumour mutations and monitor emerging pathogens such as SARS-CoV-2. However, the exponential growth of publicly shared sequencing data in repositories like the American Sequence Read Archive (SRA) and European Nucleotide Archive (ENA) has created a major computational challenge. Until now, searching for specific sequences required downloading massive datasets, which was time-consuming, expensive and often incomplete. MetaGraph changes this by allowing researchers to perform near-instant searches across millions of sequences, making genetic exploration faster, more efficient and far more comprehensive than ever before.

How MetaGraph works

MetaGraph introduces a full-text search system for genetic sequences, allowing researchers to input a DNA or RNA sequence and instantly find where it appears across public datasets. By creating a compressed, indexed representation of the data, the tool reduces storage needs by a factor of 300 while retaining essential information. Complex mathematical graphs structure the data efficiently, enabling scalable searches. As the dataset grows, the tool requires minimal additional computing resources. According to ETH researchers, this approach is both precise and cost-effective, with queries costing as little as $0.74 per megabase.

Applications in research and medicine

The speed and precision of MetaGraph could transform genetic research. It can help scientists identify resistance genes, explore bacteriophages that combat harmful bacteria and accelerate the study of little-researched pathogens. In the future, the tool may also aid in understanding rare genetic conditions or supporting rapid responses to emerging infectious diseases. Half of the world’s publicly available sequence datasets are already indexed, with the remainder expected to be added by the end of the year. Its open-source nature also makes it valuable for pharmaceutical companies with large internal databases.

The future of DNA search engines

ETH researchers believe MetaGraph could eventually be used beyond scientific laboratories. Dr André Kahles notes that, just as Google evolved in unexpected ways, the ability to search genetic data could become commonplace for broader applications, such as identifying plant species at home. By turning enormous, complex genetic archives into a searchable resource, MetaGraph represents a major leap in bioinformatics, offering scientists a tool to explore the code of life faster and more efficiently than ever before.





Source link

Leave a Reply

Your email address will not be published. Required fields are marked *