In a decade, the global Human Genome Project sequenced 3 billion DNA base pairs. Today, a single machine (the Illumina HiSeq™ 2000) can sequence 25 billion base pairs per day, and BGI (the Shenzhen company formerly known as the Beijing Genomics Institute) has purchased 128 of them. This puts BGI “on track to surpass the entire sequencing output of the United States”.
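To put those figures side by side, here is a rough back-of-envelope sketch that uses only the numbers quoted above. The constant and function names are illustrative. It counts raw base pairs, not finished genomes: real sequencing reads each position many times over, so the number of completed genomes per day would be considerably smaller.

```python
# Back-of-envelope comparison using only the figures quoted in the text.
# All names here are illustrative; the counts are raw base pairs, not
# assembled genomes (coverage requirements are ignored).

HGP_BASE_PAIRS = 3_000_000_000       # Human Genome Project: ~3 billion base pairs
HGP_DAYS = 10 * 365                  # "a decade", ignoring leap days
HISEQ_BP_PER_DAY = 25_000_000_000    # one HiSeq 2000, as quoted
MACHINES = 128                       # machines purchased by BGI, as quoted


def fleet_bp_per_day(machines=MACHINES, per_machine=HISEQ_BP_PER_DAY):
    """Raw base pairs per day for the whole fleet."""
    return machines * per_machine


def hgp_equivalents_per_day():
    """How many HGP-sized outputs (3 billion bp) the fleet produces daily."""
    return fleet_bp_per_day() / HGP_BASE_PAIRS


def speedup_over_hgp():
    """Ratio of the fleet's daily rate to the HGP's average daily rate."""
    hgp_bp_per_day = HGP_BASE_PAIRS / HGP_DAYS
    return fleet_bp_per_day() / hgp_bp_per_day


if __name__ == "__main__":
    print(f"Fleet output: {fleet_bp_per_day():.2e} bp/day")
    print(f"HGP-sized outputs per day: {hgp_equivalents_per_day():.0f}")
    print(f"Speedup over HGP average rate: {speedup_over_hgp():.2e}x")
```

On these assumptions the fleet produces on the order of a thousand times the Human Genome Project's total base-pair count every day.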
These statistics are from a news article in Nature, “The Sequence Factory”, that also discusses controversies about the scientific status of the work, and mentions “the charge that the BGI has reduced science to brute mechanization”.
This is absurd, because this aspect of BGI’s work isn’t science to begin with: BGI is (merely!) providing data of enormous scientific value, enabled by state-of-the-art instruments. This requires smart problem-solving by scientifically trained staff, some of whom participate in the science side of BGI’s work, but these connections are beside the point. Having science-based input and science-enabling output doesn’t change the nature of the task itself.
Wang Jun, executive director of BGI, jokes that “We are the muscle, we have no brain”. If engineering and production were brainless, and if that were all BGI did, this might be an accurate metaphor.
Comprehensive data is nature on display
Data-intensive science includes what looks like traditional scientific work writ large (collecting gigabytes of genome sequence, petabytes of sky-wide telescopic images), but collecting these comprehensive datasets lacks the hypothesis-testing aspect of science.
Rather than trying to force comprehensive data-collection projects into the mold of science, I think it’s better to view them as providing instruments that make nature more visible.
Genomic data can replace slow and costly gene-reading with fast and cheap database-reading. Within instrumental limits, synoptic sky surveys can replace scarce telescope time with unlimited data access. If reading from a database isn’t a way to read nature, then reading any instrument, or viewing any image, fails the same test.
Making new instruments isn’t science. It is merely a process of technology development that sets the pace of scientific progress and makes new sciences possible.
By the way, Wang Jun was indeed joking about “having no brain”. The same issue of Nature includes a landmark paper that tests hypotheses in addition to delivering massive amounts of genetic data that open a new window on human biology. Most of the authors of “A human gut microbial gene catalogue established by metagenomic sequencing” are with BGI, including both the first author and the last author, Wang Jun himself.
See also:
- Science and Engineering: A Layer-Cake of Inquiry and Design
- The Data Explosion and the Scientific Method
- Learning Bioinformatics



{ 3 comments }
If anyone reading this knows how to suggest something for BGI, talk them into running the naked mole rat.
They live at least 10 times longer than other rodents of their size. It would be interesting to see how their genes differ from other mammals.
Bats would also be interesting.
“Rather than trying to force comprehensive data-collection projects into the mold of science, I think it’s better to view them as providing instruments that make nature more visible.” Are almost all of the basic principles of science already known? Is the main challenge of science to build the ultimate instrument, namely, superhuman intelligence?
The paradigm-shift rate is doubling every decade. — Ray Kurzweil
Ray Kurzweil is the best person I know at predicting the future of artificial intelligence. — Bill Gates
YouTube Ray Kurzweil: How technology’s accelerating power will transform us
A. The spatial and temporal resolution of brain scanning is doubling every year.
B. We will succeed in reverse-engineering the human brain in the 2020s.
C. If you go to the year 2029 we will have the full maturity of these trends.
D. Progress in technology is exponential — not linear.
Is comprehensive data collection with rapidly evolving technology now the fundamental paradigm of science?
Is proving the validity of Chapter 9 of Wolfram’s “A New Kind of Science” merely a matter of detecting paradigm-breaking photons on a huge scale and then using such data to develop M-theory?
Nice.
When I first read EoC, I realized that eventually we were going to create methods to rapidly sequence every kind of DNA on the planet, build a massive DNA database, and use it to completely decode the programming language of biology. Then, once we knew exactly how to express ANYTHING in DNA programming, that was going to be merged with a simplified computer interface which would allow high level design of DNA strands, rapid creation of bio-organic tissue, and create a Genesis Machine. Ask it for a Unicorn, and it builds you one.
Between BGI’s Sequencing farm, and Tinkercell, that day is looking a lot closer.