A key factor driving these collaborations is the power of big data. Today, the term "big data" is used to describe information on the scale of petabytes. How much is a petabyte? A large hard disk on a personal computer can store a terabyte. A petabyte is 1,000 times larger.
A few years ago, I attended a meeting on the Cancer Genome Atlas. (The Cancer Genome Atlas was an effort to catalog the genetic changes and the RNA and protein expression patterns of major human malignancies.) The project was collecting 15 petabytes of data a day. Storing a single day's output would take roughly 15,000 one-terabyte hard disks. Clearly, just storing that much information in a way that allows it to be retrieved and analyzed is a technical challenge. But understanding cancer biology requires information on this scale.
Why even try to do this for cancer? We know that a cancer cell’s behavior is determined by a complex array of molecular changes. We hope that if we understand what drives a man’s prostate cancer, we might learn how best to kill his cancer, or at least block its growth.
How do we go about examining data on this scale? Fortunately for those working in bioinformatics, others have already tackled these problems. Google, Facebook, Amazon, and other technology leaders have spent years analyzing Internet data at this scale and larger, and they have made many of their insights and tools publicly available.
One of the most promising approaches to analyzing complex information is machine learning: computer programs that learn from data rather than following fixed, hand-written rules. For this reason, companies like Google, Amazon, Facebook, and Microsoft have major programs aimed at improving machine learning. One particularly successful approach is based on deep neural networks.
Google’s subsidiary DeepMind has developed deep neural networks that can defeat the best human players at the game of Go. Go is too complex for even the most powerful computer to analyze the consequences of every possible move. Instead, the program must develop an analog of human intuition. These deep neural networks handle a level of complexity approaching the scale we encounter when we try to understand cancer cell biology.
What steps do we need to take to understand prostate cancer biology? First, we need to collect detailed information about the molecular changes that determine this cancer’s biology. This month’s conversations feature individuals and organizations that are either collecting and analyzing these data or funding those efforts.