The Impact of Bad Data

Feb 08, 2016

One of the most valuable assets belonging to a company, yet one that is sometimes underappreciated, is data. Although this concept applies to all life science laboratories, we’ll focus on how lab data quality has a significant impact on the bottom line in drug discovery. Since the onset of high throughput screening in the 1990s, there has been a lot of discussion on improving data quality, accessibility, comparability, and repeatability. Companies have invested considerable resources over the past decade to improve data mining by developing sophisticated data analysis tools. One theme that has emerged is that the quality of the data analysis is only as good as the assay itself, and this has driven companies to implement various quality control strategies throughout the drug development process.

If we take a look at a simple overview of a drug’s progression from discovery through manufacturing, as illustrated in Figure 1, we notice that the process can be divided into regulated and non-regulated portions. For the regulated portions of this process, quality is a regulatory requirement. However, for the discovery portion, the meaning of “quality” has been largely left up to the individual organizations. Of course, data quality has different implications at each point in the process.

Figure 1. Impact of Bad Data Throughout the Drug Development Process. [1,2]



During the drug discovery phase, where disease targets are identified and screened against tens of thousands to millions of compounds across multiple chemical scaffolds, the cost per assay data point is relatively low, less than $0.10. If an error becomes noticeable during a screening campaign, compounds are simply rescreened. However, the actual cost to the organization is significantly higher, as it also includes lost resources (labor and equipment), consumption of valuable reagents or screening compounds, and lost time. An extreme but common example is chasing the wrong chemical scaffold due to unnoticed aberrant screening data caused by false positives or compound impurities. The longer an inappropriate chemical scaffold persists through lead optimization and structure-activity relationship studies, the more costly the mistake.
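
To make the gap between the assay cost and the actual cost concrete, here is a minimal Python sketch of a rescreen. Only the $0.10 per data point ceiling comes from the text above; the function names, the labor and instrument rates, the hours, and the compound cost are hypothetical placeholders that each lab would replace with its own figures.

```python
def assay_only_cost(n_points: int, cost_per_point: float) -> float:
    """Direct assay cost: number of data points times cost per point."""
    return n_points * cost_per_point


def total_rescreen_cost(
    n_points: int,
    cost_per_point: float,
    labor_hours: float,
    labor_rate: float,
    instrument_hours: float,
    instrument_rate: float,
    compound_cost_per_point: float,
) -> float:
    """Assay cost plus the lost resources named in the text:
    labor, equipment time, and consumed screening compounds."""
    return (
        assay_only_cost(n_points, cost_per_point)
        + labor_hours * labor_rate
        + instrument_hours * instrument_rate
        + n_points * compound_cost_per_point
    )


# Illustrative inputs. Everything except the $0.10 ceiling is an assumption.
n_points = 10_000
direct = assay_only_cost(n_points, 0.10)
total = total_rescreen_cost(
    n_points,
    cost_per_point=0.10,           # upper bound quoted in the article
    labor_hours=16,                # hypothetical
    labor_rate=75.0,               # hypothetical, dollars per hour
    instrument_hours=8,            # hypothetical
    instrument_rate=50.0,          # hypothetical, dollars per hour
    compound_cost_per_point=0.05,  # hypothetical
)
print(f"assay only: ${direct:,.2f}")
print(f"with lost resources: ${total:,.2f} ({total / direct:.1f}x the assay cost)")
```

The sketch leaves out lost calendar time and the downstream cost of pursuing a wrong scaffold, which are harder to price and are likely to dominate in the extreme case described above.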

An additional consideration when evaluating the cost of an error: assays are involved in many transfers – from HTS to Lead Generation/Lead Optimization to Candidate Selection – and must be robust to support this transfer. Otherwise, assay transfers usually require troubleshooting at the receiving lab, and may need additional development.  Both scenarios require time and resources, ultimately driving up costs.


When a candidate molecule proceeds to the development and preclinical stages, animals are often involved in order to assess:

  • how the therapeutic is absorbed, distributed, metabolized, and excreted;
  • the potential mechanisms of action;
  • the best dosage;
  • how to administer the drug (e.g., orally or by injection); and
  • potential side effects.

Not only do costs increase dramatically for each assay in this stage, but the samples are more variable and limited, precluding multiple retests. Also, the samples typically come in the form of blood, tissue or other biological fluid, so shelf life and accessibility can be problematic.

Once a therapeutic has been characterized and determined to have the desired properties during the preclinical phase, manufacturing process development typically begins. Samples from process development can be challenging in that they can come in a variety of matrices. Using a biotherapeutic as an example, matrices can include cell lysate, clarified cell harvest, chromatographic elution buffers, etc. During process development, samples and their analytes are widely varied. Decisions are constantly being made in order to optimize the process. Aberrant data within this phase can cause significant delays.


Clinical testing involves human samples, and thus precludes multiple or repeated sampling. Also, since human subjects are involved, sample size is somewhat limited, so repeat tests may not be practical. At this stage of the drug development process, sample costs (for testing, not clinical enrollment) can rise to tens or hundreds of dollars for a single sample, depending on which assay is employed. The nature of these biological samples may also affect shelf life, and they may be subjected to repeated freeze-thaw cycles.


QC testing is a primary operation supporting the manufacture of pharmaceuticals. Typically, one sample will be subdivided and distributed across multiple tests in support of release. For example, a final fill sample might be tested for endotoxin, identity, purity, concentration/potency, and sterility, as well as undergoing specific testing for residual DNA and host cell proteins. While the assays themselves can cost tens to hundreds of dollars per sample, process delays can be much more costly.

A single failed test early in the manufacturing process can delay pharmaceutical bulk from advancing to the next step. If this happens, losses of tens to hundreds of thousands of dollars or more per day can be incurred. Additionally, by this time documentation and processes have already been approved by a regulatory agency. Regulations typically do not allow a simple retest as a result of bad data, but would likely require an investigation, consuming additional labor resources. Assay failure at this stage becomes very expensive.
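
As a rough illustration of how delay dwarfs the assay cost, the sketch below turns the ranges quoted above into a low and high estimate. The function name, the default daily loss bounds of $10,000 and $100,000 (taken as order-of-magnitude stand-ins for the article's range), and the investigation hours and labor rate are all assumptions for illustration.

```python
def delay_cost_range(
    days_delayed: int,
    daily_loss_low: int = 10_000,    # assumed lower bound, dollars per day
    daily_loss_high: int = 100_000,  # assumed upper bound, dollars per day
    investigation_hours: int = 0,    # hypothetical labor spent on the investigation
    labor_rate: int = 0,             # hypothetical, dollars per hour
) -> tuple[int, int]:
    """Return (low, high) estimates in dollars for a manufacturing delay
    triggered by a failed QC test, including investigation labor."""
    investigation = investigation_hours * labor_rate
    low = days_delayed * daily_loss_low + investigation
    high = days_delayed * daily_loss_high + investigation
    return low, high


# Hypothetical scenario: a 3-day hold plus a 40-hour investigation at $100/hour.
low, high = delay_cost_range(3, investigation_hours=40, labor_rate=100)
assay_cost_high = 100  # upper end of the per-sample assay cost quoted above
print(f"delay cost: ${low:,} to ${high:,}")
print(f"that is {low // assay_cost_high:,}x to {high // assay_cost_high:,}x a $100 assay")
```

Under these assumed inputs, even the low estimate is hundreds of times the cost of the failed assay itself.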


Bad data has many different impacts depending on where in the drug development process it occurs. The cost is easily quantified when considering only the assay itself, but it can be much larger in magnitude once labor, downtime, scheduling, and loss of precious samples or reagents are considered.

About the Author

Dr. Nathaniel Hentz is the Associate Director of the BTEC Analytical Lab at North Carolina State University. Dr. Hentz has served as an independent consultant working with Artel, and his tenure in the HTS industry includes nearly two years as Sr. Research Investigator at Bristol-Myers Squibb and seven years at Eli Lilly RTP Laboratories in North Carolina. Dr. Hentz received his Ph.D. in analytical chemistry from the University of Kentucky in 1996 and joined Lilly as a postdoctoral scientist the same year.


1. MM Hayward (Ed), Lead-Seeking Approaches (Springer, Heidelberg, Germany, 2010), Chapter 1.
2. KE Avis and VL Wu (Eds), Biotechnology and Biopharmaceutical Manufacturing, Processing, and Preservation (CRC Press, Boca Raton, FL, 1996), Chapter 6.