High-throughput data processing for ocean plankton diversity mapping and modelling in the age of climate change
Although next-generation sequencing has removed the ceiling on the amount of data that can be collected in metagenomic and metatranscriptomic studies of environmental microbial diversity, the sheer volume of sequences now makes it difficult to interpret and predict the underlying patterns that structure the data. In the framework of BioComp, we have united the expertise of scientists from different disciplines (microbial ecology, algorithmics, network and graph theory, mathematical modelling) to analyse taxonomic marker genes from microbial eukaryotic plankton (ciliated protists) collected at ocean sampling sites around the world. These data were collected in the framework of the TARA Oceans expeditions (http://oceans.taraexpeditions.org), which sampled the world's oceans on a three-year circumglobal voyage to study the impact of climate change and the ecological crisis facing them. For eukaryotic plankton, TARA Oceans has thus far produced a data set of taxonomic marker genes consisting of 1.3 billion sequences with 4,122,916,564 nucleotides. The first step in the computational analysis of such a large data set requires a strategy (algorithm) for data pre-processing (e.g. quality filtering) and processing (e.g. clustering of sequences). This strategy will be developed and applied in the framework of BioComp. In the second step, the underlying patterns that structure the data will be revealed using network approaches (graph theory). The results will then be incorporated into predictive models to analyse the potential effects of climate change on ocean plankton dynamics. To complete the systems biology approach, the predictive models will be tested and evaluated experimentally by subjecting artificial plankton communities to the environmental changes that the models predict will have a significant impact on ocean plankton community structure.