The Data Explosion and the Scientific Method

by Eric Drexler on 2008/10/25

Scientists in an increasing number of fields are doing science in new ways, exploiting powerful new data-collection technologies with the aid of computational methods and a little humility.

Tradition demands that science always be hypothesis-driven: First, try to guess the truth, and only afterward collect experimental data to test whether the guess predicts the results. Indeed, this has been termed “The Scientific Method”. The new data-driven approach suggests that we collect data first, then see what it tells us. This becomes practical when experimental methods can amass enormous amounts of data, enough data to test more hypotheses than any mortal scientist could conceivably imagine.

The adoption of data-driven approaches has been surprisingly controversial: In “The Human Genome Project: Lessons from Large-Scale Biology”, a viewpoint article in Science magazine, Collins, Morgan, and Patrinos observe that

Some of the most significant lessons date to the HGP’s formative days in the mid-1980s, when a handful of visionaries dared to break ranks with the prevailing view that biological research must always be conducted as a hypothesis-driven enterprise.

The basic idea is that if we can collect enough data to form a large, rich picture — as in modern genomics, but not in old-style gene-by-gene investigation — then we are likely to learn something by looking at it. This can be seen as a hypothesis, but a very humble one. There is no pretense here that every possibility can be guessed beforehand.

But what does it mean to “look at it”? For these methods to work, we must know enough about patterns (repetition, correlation, difference, functional correspondence…) that we can recognize some of them and separate the real patterns from the statistical illusions. This too is a hypothesis, but there is no pretense of vast insight.
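To make the difference between real patterns and statistical illusions concrete, here is a minimal Python sketch (not part of the original argument). It screens many purely random "features" against a random target, so every apparent correlation is an illusion by construction. The function names (`pearson_r`, `p_value`, `screen`), the sample sizes, and the use of a Bonferroni correction are all assumptions chosen for illustration; the correction is one common safeguard among several, and the p-value is an approximation based on the Fisher z-transform.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def p_value(r, n):
    """Approximate two-sided p-value for r under the null hypothesis
    of no correlation, using the Fisher z-transform (needs |r| < 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def screen(n_samples=100, n_features=1000, alpha=0.05, seed=0):
    """Test many random features against a random target.

    Returns (naive_hits, corrected_hits): the number of features that
    look 'significant' at level alpha, and the number that survive a
    Bonferroni correction (alpha divided by the number of tests).
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    p_values = []
    for _ in range(n_features):
        # Each feature is pure noise, unrelated to the target.
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        p_values.append(p_value(pearson_r(feature, target), n_samples))
    naive_hits = sum(p < alpha for p in p_values)
    corrected_hits = sum(p < alpha / n_features for p in p_values)
    return naive_hits, corrected_hits

if __name__ == "__main__":
    naive_hits, corrected_hits = screen()
    print("apparent patterns at p < 0.05:", naive_hits)
    print("surviving Bonferroni correction:", corrected_hits)
```

With a threshold of 0.05, about one noise feature in twenty is expected to pass the naive test, so a large screen produces many illusory "discoveries"; the corrected threshold should remove nearly all of them. The price is reduced sensitivity to weak but real effects, which is one reason practitioners often prefer less conservative procedures.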

Stepping back for a broader view of science makes it obvious that the “new” approach is, in some fields, very old. Astronomers and microscopists, for example, did data-driven science centuries ago. They gathered optical data (images on retinas or photographic film), then made discoveries by applying the powerful pattern-recognizers in the human visual system.

Whether they look at their data literally or metaphorically, scientists have used data-driven approaches in many fields, including biology. Data-driven methodologies in biology were controversial, but necessary to make genomics possible. As the engines of data collection and automated pattern recognition grow more powerful, more fields of biology are following that lead.
