Mining the Medical Literature Explosion

Computational biologists at Baylor College of Medicine and analytics experts at IBM Research are developing a powerful new tool called the Knowledge Integration Toolkit (KnIT), which promises to help research scientists cope with the more than 50 million scientific papers available in public databases.

A new paper is published nearly every thirty seconds. KnIT's goal is to let researchers pursuing new scientific studies mine all of the available medical literature and formulate hypotheses that could lead to cures for disease, KurzweilAI reports.

In a case study using KnIT, researchers predicted the existence of proteins that modify p53, an important tumor suppressor protein.

"Even if a scientist reads five papers a day, it could take nearly thirty-eight years to completely understand all of the research already available today on this protein," said Olivier Lichtarge, director of the Center of Computational and Integrative Biomedical Research at Baylor and the principal investigator on the study.

"On average, a scientist might read between one to five research papers on a good day," said Lichtarge, also a professor of molecular and human genetics, biochemistry and molecular biology at Baylor. "But to put this in perspective with p53, there are over 70,000 papers published on this protein."
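The thirty-eight-year figure follows directly from the two numbers Lichtarge cites, assuming a scientist reads five papers every single day of the year:

```latex
\frac{70{,}000\ \text{papers}}{5\ \text{papers/day}} = 14{,}000\ \text{days},
\qquad
\frac{14{,}000\ \text{days}}{365\ \text{days/year}} \approx 38.4\ \text{years}
```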

Working with IBM colleagues led by Scott Spangler, a principal data scientist at the company, the team drew on existing text-mining capabilities such as IBM's Watson.

"Our hope is that scientists and researchers will be able to use Watson's cognitive capabilities to accelerate the understanding of biology underlying diseases," said Spangler. "Better understanding the biology of diseases can eventually lead to better treatments for some of the most complex and challenging diseases like cancer."

KnIT represents this knowledge explicitly as a network that researchers can query, and then uses that network to generate new, plausible and testable hypotheses that can help direct laboratory studies.
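To make the "query a network, then propose hypotheses" idea concrete, here is a minimal sketch. It is not KnIT's actual method, which the article does not describe. It assumes each paper has already been reduced to the set of entities it mentions, links entities that appear in the same paper, and then ranks entities never linked to a target (here p53) by the Jaccard overlap of their neighbors, a standard link-prediction heuristic swapped in for illustration. The toy corpus and the function names `build_network`, `neighbors` and `hypotheses` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each "paper" is reduced to the set of entities it mentions.
# In a real pipeline these sets would come from a text-mining step (not shown here).
PAPERS = [
    {"p53", "MDM2", "ubiquitination"},
    {"p53", "ATM", "phosphorylation"},
    {"ATM", "CHK2", "phosphorylation"},
    {"CHK2", "DNA damage"},
    {"p53", "DNA damage"},
    {"ATR", "phosphorylation", "DNA damage"},
]


def build_network(papers):
    """Link two entities whenever they are mentioned in the same paper."""
    graph = defaultdict(set)
    for entities in papers:
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbors(graph, entity):
    """Query: which entities are already linked to this one in the literature?"""
    return sorted(graph.get(entity, set()))


def hypotheses(graph, target, top_k=3):
    """Rank entities NOT yet linked to `target` by Jaccard overlap of their neighbors.

    A high score means the entity shares much of its context with `target`
    without ever co-occurring with it: a candidate relationship to test in the lab.
    """
    known = graph.get(target, set())
    scores = []
    for candidate, cand_nbrs in graph.items():
        if candidate == target or candidate in known:
            continue
        union = known | cand_nbrs
        if not union:
            continue
        score = len(known & cand_nbrs) / len(union)
        if score > 0:
            scores.append((score, candidate))
    # Highest score first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 3)) for score, name in scores[:top_k]]


if __name__ == "__main__":
    net = build_network(PAPERS)
    print(neighbors(net, "p53"))   # ['ATM', 'DNA damage', 'MDM2', 'phosphorylation', 'ubiquitination']
    print(hypotheses(net, "p53"))  # [('CHK2', 0.6), ('ATR', 0.4)]
```

In this toy network, CHK2 and ATR never appear alongside p53, but they share much of its context, so they surface as candidate relationships. Whether such a candidate is real is exactly what the laboratory validation Lichtarge describes would have to settle.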

"This study showed that, in a very narrow field of study regarding p53, we can suggest new relationships and new functions associated with p53, which can later be directly validated in the laboratory," said Lichtarge.

Details of the study were published online in the Association for Computing Machinery's digital library.