Predictive Toxicology and Data Science: Predicting “Low Effect Levels” For Otherwise Harmful Chemicals

October 2, 2014 Ben Kerschberg

Recent advances in chemical safety research offer innovative solutions to persistent and pervasive problems in risk assessment and in policy decisions about the safety of chemicals. U.S. Environmental Protection Agency (EPA) researchers have been using advances in predictive toxicology to begin addressing one of these problems: the significant lack of health data on thousands of chemicals in the products we use every day.

In 2005, EPA embarked on an innovative research venture called ToxCast to revolutionize the current approach to evaluating the safety of chemicals. ToxCast uses hundreds of high-throughput screening assays (analytic screening procedures used across a wide variety of scientific disciplines) to simultaneously expose living cells or proteins to thousands of chemicals. The cells or proteins are then screened for changes in biological activity that may suggest potential toxic effects and, eventually, potential adverse health effects.

More than 500 assays covering a wide range of health effects have been used to screen almost 2,000 chemicals so far. Anyone with an interest in using this data may analyze, interpret, and determine ways it can be used to evaluate chemicals. As part of EPA’s commitment to gather and share its chemical data openly, all ToxCast chemical data is publicly available.

Appirio Services, Data Science, and ToxCast Come Together

In 2013, EPA approached Appirio about leveraging its crowdsourcing platform (topcoder) to explore uses for this new chemical screening data. The goal of the resulting challenge was to develop a model based on EPA data to quantitatively predict a chemical’s systemic lowest observable effect level (LEL) in traditional animal toxicity studies. In traditional tests, regulators conservatively adjust the LEL in different ways to derive a value that EPA can use to set exposure limits expected to be tolerated by the majority of the population. Ideally, every chemical to which we are exposed would have a well-defined LEL. However, the full battery of studies required to estimate the LEL costs millions of dollars and takes many months to complete. As a result, thousands of chemicals lack the data needed to estimate an LEL.

It is important to put the gravity of the challenge into perspective. The systemic LEL is the lowest dose that shows adverse effects in these animal toxicity tests, so the values derived from it have long-standing consequences for both humans and animals. The challenge bears on a wide array of chemicals to which we are exposed, including food ingredients and additives, pesticides, cosmetics, medicines, cleaners, and solvents. Put otherwise, a predicted LEL establishes a dose threshold at or above which a chemical will be recognized as harmful. Additional testing will be required below those thresholds.
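To make the definition of an LEL concrete, here is a minimal Python sketch that picks the lowest dose at which an adverse effect was recorded. The function name, the record layout, and the dose values are hypothetical and chosen only for illustration. They are not EPA data, and this is not the model developed in the challenge.

```python
from typing import Iterable, Optional, Tuple


def lowest_effect_level(observations: Iterable[Tuple[float, bool]]) -> Optional[float]:
    """Return the lowest dose with an observed adverse effect, or None if there is none.

    Each observation is a (dose, adverse_effect_seen) pair.
    """
    effect_doses = [dose for dose, adverse in observations if adverse]
    return min(effect_doses) if effect_doses else None


# Hypothetical dose groups (mg/kg/day) from a single made-up study.
study = [(1.0, False), (10.0, False), (50.0, True), (250.0, True)]

print(lowest_effect_level(study))
```

A chemical with no adverse observations returns `None`, which mirrors the situation described above: without the underlying studies, there is no LEL to report.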

Appirio Services and World-Class Community

Appirio’s challenges have two important traits: atomization (often called decomposition in light of the scientific process known as systems decomposition) and abstraction from the domain. These processes allowed hundreds of thousands of the world’s best problem-solvers to analyze EPA’s chemical screening data and determine how it could be used, along with other publicly available data, to develop a model that could predict an LEL.

By atomizing a challenge, Appirio’s crowdsourcing community breaks it down into small component parts. This reaps a number of benefits for clients such as EPA. First, community members self-select to compete in those sub-challenges where they feel they have a comparative advantage and can win. Second, atomization allows for parallel development. With so many community members choosing to compete in specific challenges, you may have 143 contestants on challenge (A), 110 on (B), and 79 on (C). When each of those challenges finishes, its result can be resynthesized with the others to bring together a whole, whether that’s (A)–(C) or (A)–(X). The alternative is sequential development, which requires that (A) be finished before (B), and (B) before (C). This leaves development vulnerable to every weak link in the development chain.
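The difference between parallel and sequential development can be sketched with a little arithmetic. The Python below is only an illustration under stated assumptions: the durations are invented, the sub-challenges are treated as fully independent, and the time needed to resynthesize the pieces is ignored.

```python
from typing import Dict


def sequential_days(durations: Dict[str, int]) -> int:
    """Each sub-challenge waits for the previous one, so the durations add up."""
    return sum(durations.values())


def parallel_days(durations: Dict[str, int]) -> int:
    """All sub-challenges run at once, so the slowest one sets the finish date."""
    return max(durations.values())


# Hypothetical durations in days for sub-challenges (A), (B), and (C).
durations = {"A": 10, "B": 14, "C": 7}

# The same plan with a hypothetical 3-day slip on (A).
delayed = {**durations, "A": 13}

print(sequential_days(durations), parallel_days(durations))
print(sequential_days(delayed), parallel_days(delayed))
```

In this toy plan, the slip on (A) pushes the sequential schedule out by three days, while the parallel schedule is unchanged because (B) is still the longest piece.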

Appirio also abstracts a problem from its domain (e.g., the metabolic pathway of a toxic compound) into the common denominator that unifies the Appirio community: mathematics. Rather than restrict the competition to scientists who specialize in a single field, we encourage people with new perspectives to join the effort. This can lead to extreme-value results, meaning paradigm-shifting ways of conceiving problems, submitted by competitors who know nothing about the original domain.


ToxCast is considered to be a success. The ability to determine the LELs of different chemicals constitutes a potentially tremendous advance in how we evaluate the safety of chemicals. The process is not yet perfect, but the models developed to predict LELs will be used by scientists to improve EPA’s predictive models.
