News & Views item - February 2013

The Name of the Game is BIG DATA. (February 11, 2013)

Chris Mattmann* in his January 23, 2013 comment piece for Nature discusses the challenges and utility of dealing with what has come to be known as Big Data. His one-line take-home message? To get the best out of big data, funding agencies should develop shared tools for optimizing discovery and train a new breed of researchers. And we might add that unless Australia joins in and contributes to meeting those challenges alongside its international cohort, it will be left punching well below its weight.

As Dr Mattmann points out: "Funding agencies, such as the National Science Foundation and the National Institutes of Health in the United States, have created million-dollar programmes around the challenges of storing and handling vast data streams. Although these are important, I believe that agencies should focus on developing shared tools for optimizing discovery."

And big data, like Caesar's Gaul, is divided into three parts: "the volume of information that systems must ingest, process and disseminate; the number and complexity of the types of information handled; and the rate at which information streams in or out."

Dr Mattmann maintains that: "Rather than finding one system that can 'do it all' for any data set, my team aims to define a set of architectural patterns and collaboration models that can be adapted to a range of projects [and] four advancements are necessary to achieve that aim. [1] Methods for integrating diverse algorithms seamlessly into big-data architectures need to be found. [2] Software development and archiving should be brought together under one roof. [3] Data reading must become automated among formats. [4] Ultimately, the interpretation of vast streams of scientific data will require a new breed of researcher equally familiar with science and advanced computing."
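
To make the first of those four points a little more concrete, here is a minimal sketch, in Java, of the kind of plug-in seam that lets diverse algorithms be slotted into a data-processing pipeline without rewriting it. The sketch is purely illustrative: the interface and class names are ours, not Mattmann's, and no particular big-data framework is implied.

    import java.util.Arrays;
    import java.util.List;

    public class PluggablePipeline {

        // A deliberately small "architectural seam": any analysis algorithm
        // that implements this interface can be dropped into the pipeline.
        interface DataAlgorithm {
            String name();
            double[] process(double[] record);
        }

        // The pipeline knows nothing about individual algorithms; it simply
        // runs whatever stages it has been configured with, in order.
        static class Pipeline {
            private final List<DataAlgorithm> stages;
            Pipeline(List<DataAlgorithm> stages) { this.stages = stages; }

            double[] run(double[] record) {
                double[] current = record;
                for (DataAlgorithm stage : stages) {
                    System.out.println("running stage: " + stage.name());
                    current = stage.process(current);
                }
                return current;
            }
        }

        public static void main(String[] args) {
            // Two toy algorithms standing in for "diverse" scientific codes.
            DataAlgorithm removeMean = new DataAlgorithm() {
                public String name() { return "remove-mean"; }
                public double[] process(double[] r) {
                    double mean = Arrays.stream(r).average().orElse(0.0);
                    return Arrays.stream(r).map(v -> v - mean).toArray();
                }
            };
            DataAlgorithm square = new DataAlgorithm() {
                public String name() { return "square"; }
                public double[] process(double[] r) {
                    return Arrays.stream(r).map(v -> v * v).toArray();
                }
            };

            Pipeline pipeline = new Pipeline(List.of(removeMean, square));
            System.out.println(Arrays.toString(pipeline.run(new double[]{1, 2, 3, 4})));
        }
    }

The value of such a pattern is that the pipeline depends only on the small DataAlgorithm contract, so an astronomer's calibration routine and a bioinformatician's filter can be swapped in without touching the plumbing.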

He goes on to discuss the approaches he believes are required to overcome the problems inherent in achieving each of these four "advancements".

For example, there is the matter of dealing with "Many Formats":

Big-data systems must deal with thousands of file types and conventions. The communities that have formed around information modelling, ontology and semantic web software address this complexity of data and metadata (descriptive terms attached to files) to some extent. But they have so far relied on human intervention. None has delivered the silver bullet: automatic solutions that identify file types and extract meaningful data from them.
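
As a taste of what partial automation looks like in practice, the short sketch below uses the open-source Apache Tika toolkit to sniff a file's type and print whatever metadata the matching parser can recover. The choice of Tika is ours, not something named in the piece, and the sketch assumes the Tika libraries are on the classpath.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;

    public class SniffAndExtract {
        public static void main(String[] args) throws Exception {
            AutoDetectParser parser = new AutoDetectParser();

            for (String arg : args) {
                Metadata metadata = new Metadata();
                // -1 lifts the default cap on how much text the handler keeps.
                BodyContentHandler text = new BodyContentHandler(-1);

                try (InputStream stream = Files.newInputStream(Path.of(arg))) {
                    // Tika sniffs the file type (magic bytes, name hints) and
                    // hands the stream to whichever format parser matches,
                    // which fills in the metadata fields it understands.
                    parser.parse(stream, text, metadata, new ParseContext());
                }

                System.out.println(arg + " -> " + metadata.get("Content-Type"));
                for (String field : metadata.names()) {
                    System.out.println("  " + field + " = " + metadata.get(field));
                }
            }
        }
    }

Detection of this sort covers the common document and science formats the toolkit already knows about; as Dr Mattmann notes, it is nowhere near a silver bullet, and obscure or undocumented formats still need human-written parsers.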

And then there is the matter of competent individuals:

Because big-data fields stretch across national as well as disciplinary boundaries, such facilities and panels must be international. In centres of excellence around the world, such as the JPL, data scientists will help astronomers and Earth scientists to share their approaches with bioinformaticians, and vice versa.

For the specialism to emerge and grow, data scientists will have to overcome barriers that are common to multidisciplinary research. As well as acquiring understanding of a range of science subjects, they must gain academic recognition.

And he concludes: "Empowering students with knowledge of big-data infrastructures and open-source systems now will allow them to make steps towards addressing the major challenges that big data pose."

It was just over thirty years ago that Barry Jones published Sleepers, Wake! Technology and the Future of Work, to which we might add "and Science".

_______________________________________________________

*Chris A. Mattmann is a senior computer scientist at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, and an adjunct assistant professor in computer science at the University of Southern California, Los Angeles, California.