
Mining Imperfect Data

Presented: Tuesday, April 18th, 2006

Speakers: Ronald Pearson, ProSanos Corporation and Michael O'Connell, Insightful Corporation

This presentation considers the origins, consequences, and treatments of a variety of important data anomalies, including missing data, outliers, misalignment errors (both record and field misalignments), disguised missing data, and various local anomalies (e.g., time-series outliers and patchy record anomalies). All of these problems are illustrated with either real datasets or published accounts to demonstrate their practical importance, character, and severity. Many of the examples involve clinical datasets, but the problems described here can arise in any application area, with potentially severe consequences if they are not handled correctly. In particular, all of these data anomalies can introduce large biases into analysis results, especially the more subtle ones, such as misalignment errors and disguised missing data, which can easily go undetected.
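As a rough illustration of two of the anomaly types named above, the sketch below flags disguised missing data and outliers in a small numeric column. It is not taken from the presentation: the readings are made up, the sentinel codes (-999 and 9999) are hypothetical, and the outlier check is a common median/MAD rule (sometimes called the Hampel identifier) chosen here only as an example of a robust screen.

```python
from statistics import median

def disguised_missing(values, sentinels=(-999, 9999)):
    """Return indices of values equal to a sentinel code.

    The sentinel codes are hypothetical; real datasets use many
    different conventions (0, 99, -1, 1900-01-01, and so on).
    """
    return [i for i, v in enumerate(values) if v in sentinels]

def mad_outliers(values, threshold=3.0):
    """Return indices of values far from the median, measured in scaled MADs."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    # 1.4826 rescales the MAD to match the standard deviation for Gaussian data.
    scale = 1.4826 * mad
    if scale == 0:
        # More than half the values are identical; the rule cannot rank deviations.
        return []
    return [i for i, v in enumerate(values) if abs(v - center) > threshold * scale]

# Made-up body temperature readings with one sentinel code and one high value.
readings = [98.6, 99.1, 98.4, -999, 98.9, 104.8, 98.7, 98.5]

print(disguised_missing(readings))  # indices of sentinel-coded entries
print(mad_outliers(readings))       # indices flagged by the median/MAD rule
```

Note that the sentinel value is flagged by both checks. If it were left in place and treated as a real measurement, it would badly distort the mean of the column, which is one way such anomalies can bias analysis results.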



Ron Pearson, ProSanos Corporation

Ronald K. Pearson is a Senior Scientist with ProSanos Corporation and holds an adjunct faculty position with the Department of Pathology, Anatomy and Cell Biology at Jefferson Medical College, Thomas Jefferson University in Philadelphia. His primary research interests are exploratory data analysis, particularly for large observational clinical datasets; the analysis and identification of nonlinear discrete-time dynamical models; and nonlinear digital signal processing, particularly as it relates to cleaning time-series data. He is the author of the books Discrete-Time Dynamic Models (Oxford University Press, 1999) and Mining Imperfect Data (SIAM, 2005), and coauthor of the book Identification and Control Using Volterra Models (Springer, 2001) with F.J. Doyle, III and B.A. Ogunnaike. He has published three encyclopedia articles and approximately 100 journal and conference papers.


Michael O'Connell, Director of Life Sciences, Insightful Corporation

Michael O'Connell has been working in the medical device, diagnostics, pharmaceutical and biotech arena for the past 15 years. Dr. O'Connell's background and graduate work were in applied statistics, and he has published more than 40 papers on statistical methods and life science applications, including calibration, mixed models, and nonparametric regression. He has also written several statistical software packages and libraries using S-PLUS, R and SAS. Most recently he has been active in bioinformatics and the statistical analysis of microarray data, and in the development of tools for analysis and reporting of clinical and safety data from S-PLUS.

Dr. O'Connell holds a Bachelor's degree in Science from the University of Sydney, a Master's degree in Statistics from the University of New South Wales and a Ph.D. in Statistics from North Carolina State University.