Medical researchers turn to crowdsourcing in the coding community to crack DNA data challenges.

In 2014, a team of Pfizer scientists faced a data bottleneck. While conducting a study to find genetic variations that could increase a person’s risk of developing Chronic Obstructive Pulmonary Disease (COPD) — a group of lung diseases that includes emphysema and chronic bronchitis — they collected millions of data points.

With the best software available at the time, it took up to ten hours to analyze the genetic markers associated with just a single trait, such as lung function, smoking history, or whether a subject had COPD. For the study to be useful, the researchers hoped to look at multiple traits at once across a pool of nearly 10,000 subjects. With the existing algorithm, analyzing all that data would have been nearly impossible.

Turning To The Crowd

To find a faster algorithm, the team of researchers reached outside the walls of Pfizer to the wisdom of the crowd. That spring they posed the challenge to an online community that hosts coding contests. Competitors were incentivized with cash prizes and the bragging rights of solving a complex coding problem.

“We’re not a software development company,” says Pfizer’s Scott Jelinsky, a computational biologist in the Inflammation and Immunology unit at Kendall Square, who led the project with respiratory lead Iain Kilty. “One of the benefits of going out to the crowd is the on-demand resources. You’re more likely to find people with both the skills and the time to help us out.”

Another advantage to crowdsourcing is the diversity of minds who tackle the challenge, from engineers to physicists. “It’s a real advantage to remove biases from specific fields,” adds Jelinsky. 

For the coding contest, researchers converted their data, originally represented by the letters T, G, A, and C (the basic building blocks of DNA), into numbers. “We wanted to make the contest applicable to people with no knowledge of genetics, but very good at math,” says Jelinsky.
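
As a rough illustration of what turning DNA letters into numbers can look like, here is a minimal Python sketch. The article does not say which encoding the Pfizer team used, so the mapping and the function name `encode_bases` are hypothetical and chosen only for this example.

```python
# Hypothetical mapping: the article does not state which numbers
# the researchers actually assigned to each letter.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_bases(sequence: str) -> list[int]:
    """Convert a string of DNA letters into a list of integers."""
    # Upper-case first so "tgac" and "TGAC" encode identically.
    return [BASE_TO_INT[base] for base in sequence.upper()]


print(encode_bases("TGAC"))  # [3, 2, 0, 1]
```

Once the data is purely numeric, a contestant can treat the problem as a math and performance exercise without needing to know what the letters mean.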

And The Winner Is…

The first contest spanned ten days, with over 400 entries from around the globe. The top prize went to a coder from France, who went by the handle Doudouille. A computer engineer for a digital marketing company by day, Doudouille developed an algorithm that reduced the processing time from ten hours to twenty minutes. His trick was a formula that identified the one percent of genetic data worth analyzing, so the rest could be left out. Jelinsky eventually contracted the winner to help implement the software solution.

Over the following weeks, later phases of the contest cut the computation time to thirty seconds.

Novel Approaches to Science

In early 2017, Jelinsky and Kilty were invited to the White House to speak at a meeting for The Cancer Moonshot program about Pfizer’s successes in using crowdsourcing to find faster and better approaches to analyzing clinical data.

Crowdsourcing can be an important tool in keeping drug innovation moving forward, according to Jelinsky. “It’s accelerating the way we do things,” he says. “But it’s also creating novel approaches to the way we do science. It’s allowing us to think in ways we haven’t thought before.”