Scientists Create the Ultimate Drug Database


Chemists at Duke University have put together a database covering every possible combination of molecules that could be created in a laboratory. The idea is to keep track of what work has already been done and to settle disputes about who created what.

The chemists reasoned that, as drug synthesis involves humans putting together molecules into compounds, there must be a limit to the possible combinations. They’ve dubbed this the Small Molecule Universe.

The possible combinations covered by the project are subject to two restrictions. Firstly, they must contain at least some carbon. Secondly, only molecules with a molecular mass of 500 daltons or less are included, the logic being that larger molecules are harder to manipulate in synthesis.

Creating the database involved the Algorithm for Chemical Space Exploration with Stochastic Search, which simply meant a computer starting with a single atom and randomly adding or removing atoms of different elements to come up with the possible combinations.
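To give a rough feel for that add-or-remove loop, here is a toy sketch in Python. It is not the Duke team's algorithm: it tracks only how many atoms of each element a candidate contains, ignores bonds and structure entirely, and uses a small hand-picked set of elements with approximate atomic masses. The function names (`mutate`, `is_allowed`, `explore`) are made up for this illustration. The only parts taken from the article are the starting point of a single atom, the random add/remove step, and the two restrictions (some carbon, 500 daltons or less).

```python
import random
from collections import Counter

# Approximate standard atomic masses in daltons (illustrative element set,
# chosen for this sketch and not taken from the project).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
MAX_MASS = 500.0  # mass cap mentioned in the article


def mass(formula):
    """Total mass of a formula given as an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def is_allowed(formula):
    """The article's two restrictions: some carbon, and at most 500 daltons."""
    return formula.get("C", 0) > 0 and mass(formula) <= MAX_MASS


def mutate(formula, rng):
    """Return a copy with one randomly chosen atom added or removed."""
    new = Counter(formula)
    if rng.random() < 0.5 or sum(new.values()) <= 1:
        # Add one atom of a random element.
        new[rng.choice(sorted(ATOMIC_MASS))] += 1
    else:
        # Remove one atom of a random element that is present.
        el = rng.choice(sorted(el for el, n in new.items() if n > 0))
        new[el] -= 1
        if new[el] == 0:
            del new[el]
    return new


def explore(steps, seed=0):
    """Random walk from a single carbon atom, collecting allowed formulas."""
    rng = random.Random(seed)
    current = Counter({"C": 1})
    seen = {frozenset(current.items())}
    for _ in range(steps):
        candidate = mutate(current, rng)
        if is_allowed(candidate):  # rejected candidates are simply skipped
            current = candidate
            seen.add(frozenset(current.items()))
    return seen
```

Calling `explore(1000)` returns the set of distinct formulas the walk passed through. A real chemical search would have to work with actual molecular structures, which is far harder than counting atoms.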

Chemists then reviewed the combinations as the project went on to highlight any suggestions that humans couldn’t actually create. Their explanations for why a combination wouldn’t work were then turned into systematic rules which refined the algorithm’s future suggestions.
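One simple way to picture that feedback loop is as a growing list of rejection rules that every new suggestion must pass. The Python sketch below is an assumption about how such a filter could be organised, not a description of the project's code. The single rule shown (`too_many_oxygens`) is a placeholder invented for this example and is not real chemistry.

```python
def make_filter(rules):
    """Combine rule predicates: a candidate passes only if no rule rejects it."""
    def passes(candidate):
        return not any(rule(candidate) for rule in rules)
    return passes


# Placeholder rule, invented purely for illustration (not a real chemical rule).
def too_many_oxygens(candidate):
    return candidate.get("O", 0) > candidate.get("C", 0)


# Candidates are element -> count mappings, again ignoring structure.
candidates = [{"C": 2, "O": 1}, {"C": 1, "O": 3}]

rules = []
before = [c for c in candidates if make_filter(rules)(c)]

# A reviewer's objection is written down once and applied to all later suggestions.
rules.append(too_many_oxygens)
after = [c for c in candidates if make_filter(rules)(c)]
```

With no rules both candidates pass; once the placeholder rule is added, only the first one does.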

Terms like “limit” and “restrictions” are somewhat relative in this situation: the final database allows for one novemdecillion different feasible combinations, which is a 1 followed by sixty zeroes. Let’s just say this wouldn’t work as a notebook.

To make things a little easier, the chemists have broken the possibilities down into around nine million categories of combinations that share broad similarities.

The next step was to update what is effectively a blank sheet with details of which compounds humans have already created: around 100 million in total. The researchers have found a way to display this data as a two-dimensional “map” representation.
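For a sense of scale, the two figures quoted in this article (roughly 100 million known compounds against one novemdecillion feasible ones) can be compared directly. This is plain arithmetic on the article's round numbers, not a figure reported by the researchers.

```python
from fractions import Fraction

total_feasible = 10**60      # one novemdecillion, per the article
already_made = 100 * 10**6   # "around 100 million" compounds already created

# Exact share of the feasible combinations that has been made so far.
fraction_made = Fraction(already_made, total_feasible)

print(fraction_made == Fraction(1, 10**52))  # True
print(f"{float(fraction_made):.0e}")         # 1e-52
```

In other words, on these numbers about one in every 10^52 feasible combinations has been made.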

As the image above (credit Virshup et al., JACS, 2013) demonstrates, chemists have so far only covered a small portion of the possibilities. The GDB-13 term refers to an existing database covering the 977 million possible combinations when you restrict synthesis to 13 atoms and five specific elements.

The source code for the algorithm has been published online. The idea now is that chemists working on new projects can quickly check the database for what combinations should be possible. They can also confirm whether a “new” creation genuinely is original and may be eligible for patent protection.