Come join us

PhD position in Machine Learning for metagenome binning

Metagenomics uses shotgun sequencing of environmental samples (e.g. water, air, soil) to identify the biodiversity and functional roles of microbiota, many of which are still unknown to science. Metagenomic binning, the grouping of assembled contigs into bins corresponding to species-specific genomes (called MAGs, or metagenome-assembled genomes), is a key step towards understanding the biodiversity, genomic repertoire and functional roles of individual microbial species. Current methods assign contigs to bins based on base composition (GC%, k-mer frequency) and coverage (abundance) profiles. These tools have proven useful, but there is significant room for improvement, especially in assigning short contigs and separating closely related species.
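To illustrate the composition and coverage signals mentioned above, here is a minimal Python sketch that builds a feature vector for a single contig: its GC fraction, its normalized tetranucleotide (k = 4) frequencies, and per-sample coverage values supplied by the caller. The function names are hypothetical, and the sketch does not reproduce any particular binning tool. Real tools often merge reverse-complement k-mers and transform or weight the features, and computing coverage from read alignments is left out here.

```python
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized frequencies of all 4**k k-mers, in lexicographic order.

    Windows containing non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def contig_features(seq: str, coverages: list[float], k: int = 4) -> list[float]:
    """Hypothetical feature vector: GC fraction + k-mer profile + per-sample coverage."""
    return [gc_content(seq)] + kmer_profile(seq, k) + list(coverages)


features = contig_features("ACGTACGTGGCC", [12.5, 3.1])
print(len(features))  # 1 GC value + 256 tetranucleotide frequencies + 2 coverages
```

Vectors like these can then be passed to a clustering algorithm that groups contigs into bins. Composition alone tends to blur closely related genomes, which is one reason coverage across samples is usually combined with it.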

This project aims to develop sophisticated metagenome binning tools that combine multiple sources of evidence, pairing traditional approaches based on k-mer composition and coverage with several innovative elements, including phylogenetic affinities, assembly graphs and lineage-specific marker genes. You will design improved clustering and classification algorithms based on machine learning to generate MAGs with the finest resolution and highest purity. You will use supervised and unsupervised learning methods, including new algorithms developed in Halgamuge's lab, such as ENVirT (Jayasundara et al. 2019), which does not rely on prior knowledge of data composition.

The successful candidate must have sound programming skills and show strong potential to acquire new skills and knowledge in an interdisciplinary research context. The candidate will work in a group of data scientists, computer scientists, algorithm designers, environmental microbiologists and evolutionary biologists, with opportunities to build skills and knowledge across all of these research areas.

For Australian and New Zealand applicants, this project is eligible for a strategic EMRI scholarship. Other scholarships are available for international applicants. Please send your application as soon as possible, and we will discuss the best scholarship option with the preferred applicant.

The supervisory team consists of Prof. Saman Halgamuge in the Melbourne School of Engineering, and Dr. Kshitij Tandon and A/Prof. Heroen Verbruggen in the School of BioSciences. Saman is an expert in machine learning, with lab members who develop algorithms and apply them across a range of fields. Kshitij is a postdoctoral fellow in coral holobiont ecogenomics with a strong interest in developing metagenomics methods. Heroen has expertise in molecular phylogenetics and genomics, and his lab runs several projects on large-scale metagenomic data mining. The Google Scholar pages of Saman, Kshitij and Heroen give a better idea of our research.

Instructions to applicants:

Please send an expression of interest statement, your CV, transcripts, and samples of original writing (e.g. a thesis) and computer code (e.g. a link to GitHub) by email to Saman (saman@unimelb.edu.au), Kshitij (kshitijtandon1990@gmail.com) and Heroen (heroen@unimelb.edu.au). Applications are due by 5 pm Melbourne time on 14 October. Please bear in mind that Australia is nearly a day ahead of the Americas.

References:

  • Jayasundara D., Herath D., Senanayake D., Saeed I., Yang C.Y., Sun Y., Chang B.C., Tang S.-L. & Halgamuge S.K. (2019) ENVirT: inference of ecological characteristics of viruses from metagenomic data. BMC Bioinformatics 19: 377.
  • Senanayake D., Wang W., Naik S.H. & Halgamuge S.K. (2020) Self Organizing Nebulous Growths for Robust and Incremental Data Visualization. IEEE Transactions on Neural Networks and Learning Systems.
  • Herath D., Tang S.-L., Tandon K., Ackland D. & Halgamuge S.K. (2017) CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision. BMC Bioinformatics 18: 571.