
Once around the Sun
Metamodern is one year old today, and I wish I’d started a blog years earlier. I have some notes on popular posts in the last year — there are some interesting patterns that I’ve been pleased to see — but here, today, it seems fitting to revisit the first.
My 25 October 2008 post, “The Data Explosion and the Scientific Method,” addressed a question that was discussed in Science earlier this month: What is the relationship between traditional hypothesis-driven science and the new data-driven science of [bio-prefix]-omics, full-sky astronomy, and so on? This has been controversial.
The Science article, “The Coordinates of Truth,” has a nice, brief characterization of a central aspect of data-driven science: it can be viewed as “hypothesis generating” science, and thus as a straightforward complement to traditional “hypothesis testing” science. This perspective should damp down some of the controversy.
The real problem with data-driven science
However, data-driven science becomes messier, methodologically and conceptually, when generation and testing of hypotheses are both based on the same enormous data sets, and when the hypotheses to be tested are products of an automated search for patterns. Thousand-to-one odds in favor of a hypothesis (based on the usual kind of analysis) don’t mean much when a million hypotheses were screened to find it. Yet the evidence is the same, so what is the problem?
In other words, what is so special about starting with a human-generated hypothesis? Bayesian methods suggest what I think is the right answer: To get from probabilistic evidence to the probability of something requires combining the evidence with a prior expectation, a “prior probability”, and human hypothesis generation enables this requirement to be ignored with considerable practical success.
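To make the million-hypothesis problem concrete, here is a minimal Python sketch of Bayes’ rule in odds form. It treats “thousand-to-one odds” as a likelihood ratio of 1000, and it assumes, purely for illustration, a prior of 1/2 for a human-proposed hypothesis and a prior of one in a million for a hypothesis picked out by automated screening. None of these numbers comes from the Science article.

```python
from fractions import Fraction


def posterior_probability(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Assumption: "thousand-to-one odds" is read as a likelihood ratio of 1000.
LIKELIHOOD_RATIO = Fraction(1000)

# Assumed priors, chosen only for illustration.
human_prior = Fraction(1, 2)             # a hypothesis someone thought worth proposing
screened_prior = Fraction(1, 1_000_000)  # one of a million machine-screened candidates

human_posterior = posterior_probability(human_prior, LIKELIHOOD_RATIO)
screened_posterior = posterior_probability(screened_prior, LIKELIHOOD_RATIO)

print(f"human-proposed: {float(human_posterior):.6f}")
print(f"screened:       {float(screened_posterior):.6f}")
```

Under these assumed priors, identical evidence leaves the human-proposed hypothesis at about 99.9% and the screened hypothesis at about 0.1%.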
Evidence is not enough
To see why evidence is not enough, suppose that I pull an ordinary coin from my pocket, flip it 10 times, and get heads each time. I will still think that the probability of tails is about 1/2. Nonetheless, if I were watching you flipping what is allegedly an ordinary coin that you pulled from your pocket, I’d seriously consider other possibilities. The evidence for bias in each coin is the same, but my prior expectations are not. I’m biased against the idea that my coin is biased.
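The coin example can be worked through numerically. The sketch below compares just two hypotheses, a fair coin and a two-headed one, which is a simplification of “bias”. The two priors (one in a million for a coin from one’s own pocket, one in a hundred for someone else’s) are illustrative assumptions, not figures from the post.

```python
from fractions import Fraction


def posterior_trick_coin(prior_trick: Fraction, heads_in_a_row: int) -> Fraction:
    """Probability that the coin is two-headed after seeing only heads."""
    likelihood_fair = Fraction(1, 2) ** heads_in_a_row  # fair coin: (1/2)^n
    likelihood_trick = Fraction(1)                      # two-headed coin: always heads
    numerator = prior_trick * likelihood_trick
    denominator = numerator + (1 - prior_trick) * likelihood_fair
    return numerator / denominator


# Assumed priors, for illustration only.
my_coin = posterior_trick_coin(Fraction(1, 1_000_000), heads_in_a_row=10)
your_coin = posterior_trick_coin(Fraction(1, 100), heads_in_a_row=10)

print(f"my coin:   {float(my_coin):.4f}")
print(f"your coin: {float(your_coin):.4f}")
```

With these priors, ten heads in a row leave the first coin almost certainly fair (about a 0.1% chance of being two-headed) while the second becomes a strong suspect (about 91%), even though the evidence is identical.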
In science, however, a fully Bayesian approach has often met resistance (declining, these days) because it necessarily begins with explicit prior-to-evidence expectations that might seem to tarnish scientific objectivity.
The virtue of human-proposed hypotheses
The act of putting forward a hypothesis, however, tacitly assigns a significant probability of truth to it, and this has been sufficient for science to work while using methods of data analysis that pretend that evidence more-or-less implies probabilities, methods that include no (explicit) prior expectations that might tarnish scientific objectivity. The million-hypothesis problem helps to show that the validity of these methods is a fiction, although it has served well for many years.
Thus, I think that some of the discomfort with data-driven science is closely linked to discomfort with Bayesian methods. Recognizing this may help in thinking more effectively about scientific methods in the new era.



{ 8 comments }
edeast 10.26.09 at 2:51 am UTC
Thanks for blogging.
Benj 10.26.09 at 4:14 am UTC
Hi Mr Drexler,
what is the “correct” approach to an unprovable hypothesis, one that can’t be tested by experiment?
For example, what if someone declares that he believes there is a tea cup on Saturn’s moon Titan?
Since this can’t be absolutely proved or disproved, what is the healthiest approach?
There certainly isn’t a 1/2 chance of a tea cup on Titan, but how does one get there? How does one establish the probabilities needed to build a correct approach to such an absurd hypothesis?
Thanks,
Benj
Sam Ghandchi 10.28.09 at 8:03 am UTC
Dear Dr. Drexler,
I read your post last year, and I think you have touched on a very critical issue. As I remember it, Popper’s take on objective knowledge was that a scientific theory, when proposed, entails its own conditions for rejection, and all scientists will try to falsify it. True, not everyone agreed with him, and to this day people like Martin Gardner make sure to dispute not only falsification but Popper’s rejection of induction as a whole. But whether one is talking about verification and testing or about falsification, one is talking about seeking data to contradict a theory, whereas in our times, as you have brilliantly noted, it is the other way around: a fast stream of data arrives continuously, maybe even before a theory is proposed!
Now, if one concludes that some form of probabilistic model run on supercomputers can automatically postulate theories to fit the data, that would be a return to *induction*, although done very powerfully by computers. But that is not the main outcome of this situation. It seems to me that what is even more interesting is the way falsification of theories, proposed or not, is happening without anyone even working for it, without the targeted effort Popper would have expected. The Internet is definitely helping this pattern a lot, right before our eyes, and this is itself a powerful pattern, if we can call it a pattern! Regardless, it is definitely discarding theories when corroborating evidence turns out to support the opposite assumption, and thus ideas that had lasted for centuries are dying out at a speed never thought of before. Some kind of survival of the fittest unprecedented in the realm of human knowledge itself. Thank you for taking the time to share your broad knowledge for all to benefit from.
Best Regards
Sam
Michael G.R. 10.28.09 at 10:28 am UTC
“Thanks for blogging.”
Seconded.
Eric Drexler 10.29.09 at 9:06 pm UTC
@ Benj: The Bayesian approach (for personal decisions, at least) would recommend that you start with a “prior” probability estimate that incorporates everything that you know, whether this is based on formal evidence or not. Mathematical principles then define a Bayesian update procedure that tells how to modify your estimate in response to fresh evidence. Provided that one has not (arrogantly) assigned an actual zero probability to anything, any and all prior estimates will eventually converge toward the same result, given sufficient evidence.
So, in the case of the teacup on Titan, my prior probability estimate would be extremely low — so low that, if an image said to be from the Huygens lander had shown a teacup, I’d be almost certain that it isn’t real (expecting it to be a joke or a fraud of some sort).
In scientific analysis (in contrast to personal decisions) it is important to minimize the subjective element, and so the standard methods begin with a so-called “uninformative prior”. In the case of the coin flip, a natural choice would assign an equal probability to all degrees of bias, with P(heads) = 0.5 ± epsilon being treated as no more likely than P(heads) = 0.17 ± epsilon. There is, however, no natural, mathematically based way to apply this approach to the teacup-on-Titan problem. You’re right in thinking that there’s a conceptual difficulty in choosing a supposedly objective prior in a case like this.
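As a rough illustration of both points in this reply (the flat prior for a coin, and the convergence of different priors given enough evidence), here is a small Python sketch. It assumes the standard Beta-distribution treatment of a coin’s bias, in which a flat prior is Beta(1, 1) and the posterior mean after observing some heads and tails has a simple closed form. The “skeptic” prior Beta(50, 50) and the flip counts are made-up values for illustration.

```python
from fractions import Fraction


def posterior_mean_heads(prior_heads: int, prior_tails: int,
                         heads: int, tails: int) -> Fraction:
    """Posterior mean of P(heads) under a Beta(prior_heads, prior_tails) prior."""
    return Fraction(prior_heads + heads,
                    prior_heads + prior_tails + heads + tails)


UNIFORM = (1, 1)    # flat prior: every degree of bias equally likely
SKEPTIC = (50, 50)  # assumed prior that strongly expects a fair coin

# Hypothetical data: a short run of heads, then a long run at 70% heads.
for heads, tails in [(10, 0), (7_000, 3_000)]:
    uniform_estimate = posterior_mean_heads(*UNIFORM, heads, tails)
    skeptic_estimate = posterior_mean_heads(*SKEPTIC, heads, tails)
    print(f"{heads + tails:>6} flips: "
          f"uniform {float(uniform_estimate):.4f}, "
          f"skeptic {float(skeptic_estimate):.4f}")
```

After 10 flips the two estimates disagree sharply (about 0.92 versus 0.55), but after 10,000 flips both sit within a fraction of a percent of 0.70. Nothing comparable is available for the teacup, because there is no obvious parameter over which to spread a flat prior.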
Eric Drexler 10.29.09 at 9:48 pm UTC
@ Sam Ghandchi — You speak of Popperian falsification, and of knowledge evolving, a “…kind of survival of the fittest…in the realm of human knowledge itself”.
The two are closely linked conceptually, and William W. Bartley III, who studied under Popper, was a founder of what is called “evolutionary epistemology”. I think that evolutionary epistemology is the right approach to the fundamental question of how knowledge emerged in a world that began as accreted rock and progressed to worms with eyes long before the first ape, or the first spoken word.
Dogs acquire knowledge through life experience, and plant species acquire knowledge — in the sense of an alignment of implicit expectations and behaviors with reality — through what might be called “evolutionary experience”. Any philosophy that cannot see these as kinds of knowledge seems to me to be radically defective, and prone to losing itself in word games, or to plunging into another fruitless search for a deep, solid, indisputable “foundation of knowledge”.
Human knowledge is a great web of densely connected parts, not a stack of blocks with a bottom that must rest on something else. It has central and peripheral regions, perhaps, but not a top or bottom.
Eric Drexler 10.29.09 at 10:27 pm UTC
@ Michael G. R. — I see that you’ve been reading Judea Pearl’s book, Causality. It’s a landmark, and for a reason that I find shocking: Before Pearl’s recent work, there had been no correct, formal understanding of the relationships that link actions, evidence, and inferences of causality. Boggling.
(The memorable example for me: If I observe that the sidewalk is wet, I regard this as evidence that it rained last night. If I wash the sidewalk with a hose, and then observe it to be wet…. I find that many formalisms cannot represent the situation. Oops.)
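The sidewalk example can be sketched in a few lines of Python. This is a toy two-variable model (rain causes wetness), not code from Pearl’s book, and the probabilities are invented for illustration. Observation is handled by ordinary conditioning. Intervention is handled by replacing the mechanism that sets “wet” with a constant, which is the basic idea behind Pearl’s do-operator.

```python
from fractions import Fraction

# Assumed numbers, for illustration only.
P_RAIN = Fraction(3, 10)
P_WET_GIVEN_RAIN = {True: Fraction(9, 10), False: Fraction(1, 10)}


def joint(rain: bool, wet: bool, forced_wet=None) -> Fraction:
    """Joint probability of (rain, wet); forced_wet overrides the wet mechanism."""
    p_rain = P_RAIN if rain else 1 - P_RAIN
    if forced_wet is None:
        p_wet_true = P_WET_GIVEN_RAIN[rain]  # wetness depends on rain
    else:
        # Intervention: the hose sets the state, so rain no longer matters.
        p_wet_true = Fraction(1) if forced_wet else Fraction(0)
    p_wet = p_wet_true if wet else 1 - p_wet_true
    return p_rain * p_wet


def prob_rain_given_wet(forced_wet=None) -> Fraction:
    """P(rain | wet), by enumeration over the two values of rain."""
    rain_and_wet = joint(True, True, forced_wet)
    wet = rain_and_wet + joint(False, True, forced_wet)
    return rain_and_wet / wet


observed = prob_rain_given_wet()               # merely saw a wet sidewalk
hosed = prob_rain_given_wet(forced_wet=True)   # wet it myself, then looked

print(f"P(rain | saw wet)  = {float(observed):.3f}")
print(f"P(rain | made wet) = {float(hosed):.3f}")
```

In this toy model, seeing a wet sidewalk raises the probability of rain from 0.30 to about 0.79, while hosing the sidewalk and then seeing it wet leaves the probability at 0.30. A formalism with only conditioning has no way to express the difference between the two cases.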
Benj 10.31.09 at 4:03 am UTC
Thanks for your answer Dr. Keep blogging !