Mining Imperfect Data
Presented: Tuesday, April 18th, 2006
Speakers: Ronald Pearson, ProSanos Corporation
and Michael O'Connell, Insightful Corporation
This presentation considers the origins, consequences, and treatments
of a variety of important data anomalies, including missing data, outliers,
misalignment errors (both record and field misalignments), disguised missing
data, and various local anomalies (e.g., time-series outliers and patchy
record anomalies). All of these problems are illustrated either with real
datasets or published accounts to demonstrate their practical importance,
character, and severity. Many of the examples discussed involve clinical
datasets, but it is important to emphasize that the problems described
here can arise in any application area, with potentially severe consequences
if they are not handled correctly. In particular, all of these data anomalies
can introduce large biases into analysis results. This is especially true of
the more subtle ones, such as misalignment errors or disguised missing data,
which can easily go undetected.

Ron Pearson, ProSanos Corporation
Ronald K. Pearson is Senior Scientist with ProSanos Corporation
and holds an adjunct faculty position with the
Department of Pathology, Anatomy and Cell Biology
at Jefferson Medical College, Thomas Jefferson
University in Philadelphia. His primary research
interests are in the areas of exploratory data
analysis, particularly for large observational
clinical datasets, the analysis and identification
of nonlinear discrete-time dynamical models, and
nonlinear digital signal processing, particularly
as it relates to cleaning time-series data. He
is author of the books Discrete-Time Dynamic Models
(Oxford University Press, 1999) and Mining Imperfect
Data (SIAM, 2005), and coauthor of the book Identification
and Control Using Volterra Models (Springer, 2001)
with F.J. Doyle, III and B.A. Ogunnaike. He has
published three encyclopedia articles and approximately
100 journal and conference papers.

Michael O'Connell, Director
of Life Sciences,
Insightful Corporation
Michael O'Connell has been working in the medical device,
diagnostics, pharmaceutical and biotech arena
for the past 15 years. Dr. O'Connell's background
and graduate work were in applied statistics and
he has published more than 40 papers on statistical
methods and life science applications including
calibration, mixed models, and nonparametric regression.
He has also written several statistical software
packages and libraries using S-PLUS, R and SAS.
Most recently, he has been active in bioinformatics
and the statistical analysis of microarray data,
and in the development of tools for analysis and
reporting of clinical and safety data from
S-PLUS.
Dr. O'Connell holds a Bachelor's degree in Science
from the University of Sydney, a Master's degree
in Statistics from the University of New South
Wales and a Ph.D. in Statistics from North Carolina
State University.