Mountain of data



Many scientists believe that breakthroughs in the near future will come from our ability to generate huge databases and, more importantly, our ability to mine them.

GlaxoSmithKline has carried out a huge and expensive study on cancer and, as of June 20, 2008, has released a mountain of data for free. The company is not releasing all of the data, but the vast majority of it.

The data is mostly in the form of microarray results, which the company has given to the Cancer Biomedical Informatics Grid (caBIG), part of the National Cancer Institute, to house.

Here is a link to where the data is located. It contains the genomic profiles of 300 cancer cell lines. The profiles include results from both SNP microarrays and microarrays that measure mRNA transcript expression.

Now this is potentially a fantastic resource for the hundreds of thousands of cancer researchers. I am sure the data in this bank could be the start of hundreds of PhD degrees.

I really hope that the new generation of smart open source scientists can harness this data and spin straw into gold (one example would be Shirley Wu).

Taking a slightly cynical route, do you think GlaxoSmithKline, or its shareholders, would be happy with the company giving away information that contains ‘value’? Or is it more likely that they used massive computing power and all their bioinformatics personnel to strip out all the gold before releasing this mountain of data freely to the public? Yes, this move will build some ‘good will’ for GlaxoSmithKline with the public, which is of some worth, especially given the current state of the public's trust in big pharmaceutical companies. However, it seems unlikely that big pharma is really in the habit of giving away valuable information. Could it be that, even after sifting through this mountain of data on cancer cell lines, they found only a few nuggets, and that overall the mountain was barren? That would be a scary thought for future cancer research.