Metagenomics, the study of genetic material recovered directly from environmental samples without isolating and culturing organisms, has become one of the principal tools of “meta-omic” analysis. It can be used to explore the diversity, function, and ecology of whole microbial ecosystems. The broad field may also be referred to as environmental genomics, ecogenomics, or community genomics. While traditional microbiology and genomics rely on cultivated clonal cultures, early environmental gene sequencing cloned specific marker genes (often the 16S/18S rRNA gene) to produce a profile of diversity in a natural sample. Such work revealed that the vast majority of microbial diversity had been missed by cultivation-based methods. Recent studies use either shotgun (WGS) or amplicon (16S/18S) sequencing to obtain largely unbiased samples from all the members of the sampled communities. Shotgun metagenomics (also known as quantitative metagenomics) is more expensive but offers much higher resolution. This course will cover all the steps from sampling to data analysis.

    • Definition of Metagenomics
    • Technologies: 16S rRNA motivation and WGS motivation
    • NGS data preprocessing (QC, assembly, alignment, abundance counting)
    • Whole community analysis
    • Assembly, Binning, Analytical Pipelines
    • Exploratory data analysis, Clustering and Multiple testing
    • Differential abundance testing, sequencing depth, rarefaction curves
    • Longitudinal analyses
    • Annotation and Functional analysis
    • Analyzing a metagenome and assessing metagenomics sequence quality
    • Comparing metagenomes: Big data analytics for metagenomics

    Jean-Daniel Zucker

    Jean-Daniel Zucker graduated from the ENSAE National Higher School of Aeronautics and Space in 1995. He then graduated in artificial intelligence in 1992. He received his PhD in Machine Learning from Paris 6 University in 1996, where he became an associate professor focusing on Relational Machine Learning. In 2002, he became Full Professor of Computer Science at Paris 13 University, where he started a laboratory on Medical Informatics and Bioinformatics (LIM&BIO) and headed a team on Prediction Analysis for Transcriptomics Data. In 2008, he became a Senior Researcher at the National Institute of Research for Development (IRD), working on Data Mining and Decentralized AI for Complex Systems modeling. He is now the director of the Mathematical and Computer Modeling of Complex Systems Laboratory UMMISCO (IRD & University Paris 6), which has 67 permanent staff in France, Vietnam, Morocco, Senegal and Cameroon. He also heads INTEGROMICS, the Bioinformatics department of the Institute of Cardiometabolism and Nutrition (ICAN). His research in AI focuses on approaches for the automatic construction of predictive models (supervised learning) or characteristic models (unsupervised learning, or "clustering"). His main field of application today is metagenomics of the gut microbiota, and he has contributed to several European networks in genetics and functional genomics (Diogenes, METAHIT, METACARDIS, ...). His research is developed through international collaborations with Vietnam, China, Taiwan, the USA and Italy. He was posted in Vietnam for five years (2011-2015).

    Eugeni Belda

    Eugeni Belda is a native of Spain, where he graduated in Biological Sciences (2006) and obtained a Master's degree in Biodiversity and Evolutionary Biology (2008) and a PhD in Bioinformatics (2010) from the University of Valencia. He then moved to France, where he has worked as a postdoctoral researcher at the Laboratory of Bioinformatic Analysis for Genomics and Metabolism (LABGeM) at Genoscope (2010-2013) and at the Genetics and Genomics of Insect Vector Unit of Institut Pasteur (2014-present). His research focuses on three main topics: (i) the study of the dynamics of prokaryotic genome evolution through the structural and functional annotation of genomes and metagenomes, (ii) the reconstruction of genome-scale metabolic networks and models from genome and metagenome sequence data, and (iii) the taxonomic and functional profiling of viral and microbial communities in natural mosquito populations.

    Ho Bich Hai

    Ho Bich Hai studied computer science at the University of East Anglia (2006) and completed her PhD in Computational Biology at the Japan Advanced Institute of Science and Technology (2012). Since 2007, she has been a researcher at the Institute of Information Technology, Vietnam Academy of Science and Technology. Her current research in environmental genomics focuses on human gut and plant rhizosphere microbial communities and their applications in health and agriculture, respectively. She works mainly on analytical pipelines for next-generation sequencing data and applies data mining and machine learning techniques to model microbial community diversity, dynamics and interactions with the host of interest.

    Joseph Paulson

    Joseph Paulson studied mathematics and completed a Ph.D. in Applied Mathematics, Statistics and Scientific Computation as a National Science Foundation Graduate Fellow at the University of Maryland, College Park in 2015. Under the guidance of Mihai Pop and Hector Corrada-Bravo, he began developing computational methods for the analysis of high-throughput sequencing data, in particular metagenomics. Afterwards, he moved to the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and the Department of Biostatistics at the Harvard School of Public Health, where, under the guidance of Professor John Quackenbush, he began working on large network-based approaches to understanding host-pathogen interactions computationally. The core of his interests involves accounting for sequencing artifacts, such as under-sampling, to reach robust biological and translational interpretations. He is committed to open-source software and has contributed to Bioconductor and popular metagenomic pipelines.

    Edi Prifti

    Edi Prifti graduated in biomedical informatics in 2007 and received his PhD in bioinformatics from the Pierre and Marie Curie University (Paris) in 2011. His research focused on integrative centrality measures in omics-derived networks applied to complex diseases such as obesity and diabetes. After joining the INRA/MetaGenoPolis lab in 2010, he focused extensively on developing methods and tools (the MetaOMineR package suite) for the analysis of very large quantitative metagenomics datasets and applied them to multiple medical conditions (obesity, liver cirrhosis, diabetes, HIV, etc.). In 2015, he joined the Institute of Cardiometabolism and Nutrition (ICAN) as a researcher and is at present the deputy director of the IntegrOmics department. He is particularly interested in exploring and understanding the microbial ecosystem that inhabits our guts and is tightly associated with health and disease.