Previous Collaborations


  • The creative industries in days of isolation - a fast pace shift to making, learning and living in a crisis

    Chief Investigator: Dr Kathryn Coleman (MGSE)
    MDAP Collaboration Lead: Dr Kristal Spreadborough

    We are witnessing a cultural and pedagogical evolution. The current global crisis triggered by COVID-19 has caused an unprecedented societal shift. This has been starkly epitomised within the arts, creative industries, and education sectors – our first responders to societal change. These sectors have seen rapid shifts in their norms, practices, and pedagogies and have had to creatively adapt to ensure their survival and longevity in an environment mediated by physical isolation, social disruption, and health and economic crises. Drawing on a diverse range of big data sources, this research takes a data-led approach to examine the interplay between data as practice and practice as data. It maps the evolution of norms, practices, and pedagogies within the creative industries in preparation for the “new normal”.

  • The megalithic jar sites of Laos

    Chief Investigator: Dr Louise Shewan (Science)
    MDAP Collaboration Lead: Dr Aleks Michalewicz

    This interdisciplinary collaboration involves the integration of data generated from the Plain of Jars Archaeological Research Project: UAV-acquired imagery, large point clouds of archaeological sites, photographs of more than 1000 megalithic jars, photogrammetry models of burials, analyses of human remains (isotopic, ancient DNA, osteological, synchrotron Micro-CT), artefact conservation and analysis (ceramic, beads, metals), 3D models of material culture, dating analyses, and the geological characterisation of jar and quarry samples. Work with MDAP involves data management and curation, virtual/augmented reality and machine learning capability to create educational/research offerings and to facilitate ongoing archaeological research in inaccessible environments contaminated with unexploded ordnance (UXO), visualisation proficiency to share project outcomes, image and video processing, data analysis, and innovative web development to ensure open access to dynamic and unique datasets.

  • Causality in Complex Dynamical Systems: Implementing Core Algorithms in a Distributed Computing Environment

    Chief Investigator: A/Prof Michael J. Zyphur (FBE)
    MDAP Collaboration Lead: Dr Edoardo Tescari

    This project will revolutionize the ability of social and physical scientists to characterize and assess causal effects in complex dynamical systems. These systems are difficult to study using traditional methods including experiments and model-based regressions because their assumptions are easily violated by nonlinear system evolution and causal effects. The broad aim of this project is to implement distributed computing methods for analyzing data from complex dynamical systems. The project will implement two core empirical dynamic modeling (EDM) algorithms on a distributed computing platform with a manifold learning method for data imputation and denoising.

  • Automating image segmentation and morphometric analysis for accelerating neuroscience at the nanoscale

    Chief Investigator: Dr Calvin Eiber (MDHS)
    MDAP Collaboration Lead: Zaher Joukhadar

    The aim of this project is to develop automated image processing tools to identify and measure nerve cell axons in electron microscope (EM) images. We aim to do this in three stages: by implementing existing algorithms for the automatic segmentation and analysis of myelinated axons from conventional transmission EM images, by developing new algorithms to handle unmyelinated axons, and by developing algorithms and data handling processes for the automatic processing of 3D tissue volumes generated by cutting-edge EM techniques like serial block-face scanning electron microscopy (SBF-SEM). The broader goals are to improve our understanding of how the brain communicates with the organs of the body, as well as how different brain regions communicate with one another, in order to support the growing field of bioelectronic medicine.

  • An Empirical Study of How the US Termination Right Operates (part of the Author’s Interest Project)

    Chief Investigator: A/Prof Rebecca Giblin (Law)
    MDAP Collaboration Lead: Dr Daniel Russo-Batterham

    The Author’s Interest Project is investigating how changes to copyright law could help increase the opportunities authors have to profit from their work, whilst improving public access to knowledge and culture. We’re particularly interested in reversion rights, which see copyrights returned to authors after a certain amount of time or when they are no longer being exploited. More than half the world’s nations have statutory reversion rights, but their effects are woefully under-researched. This sub-project investigates the operation of a US reversion law that allows authors to terminate copyright grants 35 years after they are made. There have been no in-depth studies investigating how often this scheme is used, by whom, and with what effects. Using the publicly available US Copyright Office registration database, the Project intends to fill this gap, contributing rigorous new evidence to debates about what kind of rights can help get authors paid and books read.

  • Mapping our emotional lives: Building a searchable database of experience sampling data on emotional processes

    Chief Investigator: Dr Elise Kalokerinos (MDHS)
    MDAP Collaboration Lead: Dr Daniel Russo-Batterham

    Experience sampling involves following participants into their everyday lives using smartphones, taking multiple measures in real-time to get a more meaningful picture of psychological processes. Experience sampling methods are central to understanding both normal and disordered emotional responding: they allow us to capture strong and meaningful emotional events that can’t be recreated in the lab, to map dynamic fluctuations of emotions across time, and to study how these processes contribute to psychological well-being. However, conducting experience sampling studies is costly, time-intensive, and requires expertise. In addition, research using these methods generates large, rich datasets, which are often not used to their fullest potential. To reduce the barriers to using experience sampling data, and to harness the reanalysis potential of these datasets, we are building an open-access, searchable, and cumulative database of our existing experience sampling datasets.

  • Natural Language Historical Mapping

    Chief Investigator: Dr Mitchell Harrop (Arts)
    MDAP Collaboration Lead: Dr Emily Fitzgerald

    The Melbourne History Workshop in the School of Historical and Philosophical Studies is working on an Australian Research Council Linkage Infrastructure, Equipment and Facilities grant on the challenge of Time-Layered Cultural Mapping (TLCMap). TLCMap is an open online facility to create maps and timelines and combine new Australian cultural and historical datasets with existing ones. The project is developing infrastructure around the Gazetteer of Historical Australian Placenames (GHAP) API and Recogito. This includes Named Entity Recognition, time recognition, Sentiment Analysis, and other common text processing techniques. We will be working with MDAP to apply these techniques to existing datasets of Encyclopedia of Melbourne entries, historical news stories from the National Library’s Trove, podcasts, tweets and three photographic collections. The application of these techniques will result in the improvement of the original datasets for use by amateur and professional researchers.
    Our methods will be documented in a cookbook-style publication, leveraging the strong background of the MDAP workforce in pedagogy and in translating highly technical research methods for an audience that includes both graduate researchers and amateur historians.

  • Early Modern Women Translators in Europe (15th-18th c.)

    Chief Investigator: Prof Véronique Duché (Arts)
    MDAP Collaboration Lead: Bobbie Shaban and Dr Emily Fitzgerald

    Building on a previous research project (Histoire des Traductions en Langue Française, XVe-XVIe siècles, ANR project – equivalent to ARC – finished in 2015), chapters published (Duché, 2015, 2016, 2017) and papers given at international conferences (RSA, SRS, 2018), this project will further investigate early modern women translators in Europe. It seeks to promote cultural awareness by charting the contours and transnational connections of early modern women translators through the use of a sophisticated, interactive visualisation tool. In particular, the project aims to:

    • identify the early modern women translators (not only in France, but also in other European countries);
    • map the intellectual as well as personal and cultural networks to which these women belong (author, publisher, reader etc.);
    • create a repository for metadata on early-modern scholarship.

  • Unlocking published metagenomes as a source of information for microbial eukaryotes

    Chief Investigator: A/Prof Heroen Verbruggen (Science)
    MDAP Collaboration Lead: Dr Mar Quiroga and Bobbie Shaban

    Microbial eukaryotes play key roles in Earth’s ecosystem and human lives, but knowledge of their biodiversity and evolution is very incomplete. This collaborative project between MDAP, the Faculty of Science, Faculty of Medicine, CSIRO and Royal Botanic Gardens Melbourne aims to bring big data to this field of research, advancing our knowledge of microbial eukaryotic biodiversity and evolution by extracting their organelle genomes (mitochondria & chloroplasts) from publicly available metagenomes. Through this project, we aim to achieve: (1) a fully functioning pipeline for automated download of metagenome reads, their assembly and calculation of contig statistics, (2) a large library of assembled metagenomes generated with this pipeline, permitting testing of alternative strategies for organelle genome binning, and (3) advanced proof of concept for externally funded project proposals based on these ideas.

  • Development of the PolyMuse Database, to map conservation issues with polymer based plastics in museum collections

    Chief Investigator: Dr Petronella Nel (Arts)
    MDAP Collaboration Lead: Karen Thompson

    PolyMuse is a collaborative Australian Research Council funded project between four universities and six heritage institutions, aiming to develop methods for extending the lifespan of vulnerable polymer-based plastics in heritage collections. Data is being collected from different locations and collated into a relational database, with the ultimate aim of aiding analysis and providing access to heritage professionals. This MDAP collaboration will create a "universal-adapter" for bringing data already collected into the database, and build a streamlined process for future data collection.

  • Machine Learning and Patterns of Primary Care Utilisation in Cancer Diagnosis and Outcomes

    Chief Investigator: Professor Jon Emery (MDHS)
    MDAP Collaboration Lead: Zaher Joukhadar and Dr Mar Quiroga

    The Victorian Comprehensive Cancer Centre-funded Data-Driven Research Program was the first in Australia to link large-scale primary care and hospital data for the purpose of enabling increased capabilities in health services research. Until now, traditional analytical and statistical methods have been utilised to examine and map patterns in primary care attendances and how these are associated with cancer diagnoses, treatment and outcomes. These are generally based on the existing evidence base to test prevailing hypotheses and apply these to the local context. This project seeks to apply novel analytical and machine learning methods to the large-scale linked data sources in order to generate new hypotheses and further characterise how people with cancer engage with primary care services. Specifically, this project would aim to identify patterns in various aspects of primary care attendances prior to a definitive cancer diagnosis (as identified in linked data sources). These could include prescribing, test requests/results, semi-coded fields such as ‘reason for encounter’ and co-morbidities/other conditions captured in primary care management systems.
    The project would identify a specific cancer type (e.g. upper gastrointestinal) with sufficient linkages between hospital diagnosis and primary care data and apply tools and algorithms to detect patterns present within specific timeframes of diagnosis or treatment.

  • Evaluation of automated deidentification of hospital and general practice health records

    Chief Investigator: Professor Karin Verspoor (MSE) and A/Prof Douglas Boyle (MDHS)
    MDAP Collaboration Lead: Dr Simon Mutch and Dr Noel Faux

    Text analysis is an emerging field in health data research. Use of textual health data for research purposes has been limited in Australia to date, due to concerns around the personal identifying information they may contain. The aim of this project is to develop, test and evaluate methods to deidentify textual data sets from electronic medical records in both hospital and general practice settings so these data can be securely extracted for further analysis using natural language processing. The project is a collaboration between the School of Computing and Information Systems (Engineering), the Department of General Practice, the School of Population and Global Health, and the Centre for Digital Transformation of Health (MDHS). The results of this project will facilitate larger scale use of clinical textual data for research purposes in Australia and will provide opportunities for future research with world-class collaborators within the University and externally.

  • Using novel analytic methods to predict and prevent falls and fall-related injuries in older persons with dementia.

    Chief Investigator: Professor Gustavo Duque (MDHS)
    MDAP Collaboration Lead: Dr Noel Faux

    Six million Australians will be diagnosed with dementia in the next 20 years, at a cost of more than $1 trillion. One third of them will die due to traumatic falls, which are, in the majority, predictable and preventable. Although most of the larger longitudinal studies on ageing have included musculoskeletal and cognitive variables that could help to predict and prevent falls in people with dementia, no studies have developed accurate predictive tools to identify those at risk of falling. We propose the development of transdisciplinary links between several major databases including the Canadian Longitudinal Study on Aging (across Canada), the Quebec Longitudinal Study on Nutrition and Aging (Quebec, Canada), the Women’s Healthy Ageing Project (Metropolitan Melbourne), and the Geelong Osteoporosis Study (Geelong, VIC) to combine and compare cognitive and musculoskeletal data from 35,000 participants, and characterise and model the risk for incident falls and fall-related injuries in cognitively impaired and healthy older adults participating in these studies. Our preliminary analyses have demonstrated that Artificial Neural Networks and Factor Mixture Models may improve the performance criteria of fall prediction in older persons compared to classical linear models. We hypothesise that these models could be helpful in the identification of patterns of risk factors for falls and fall-related injuries in older persons with dementia. The results of this strong international collaboration (facilitated by MDAP) will pave the way for the development of new care pathways for people with dementia, including a future application that can be used in clinical practice to screen for fall risk in this population.

  • Melbourne Pollen: Using Machine Learning to predict pollen counts based on vegetation, weather, land use and geographical data

    Chief Investigator: A/Prof Ed Newbigin (Science)
    MDAP Collaboration Lead: Usha Nattala

    Our vision is to help the millions of Australians affected by allergies to airborne pollen and other allergens by discovering, translating and delivering research solutions to help better manage their condition. Our immediate project goal is to use advanced machine learning algorithms to produce a novel forecast model based on our extensive pollen databases, citizen science responses and various environmental inputs including satellite weather, land use and vegetation data. Machine learning techniques make it possible for us to deal with large multivariate datasets and to set up forecasting processes that can identify and extract patterns in data without explicit instructions, continuously learning and improving as they go along without the need for excessive human intervention.

  • Rural land use classification algorithms for emergency animal disease responses

    Chief Investigator: Dr Simon Firestone (FVAS)
    MDAP Collaboration Lead: Jonathan Garber

    In an emergency animal disease situation, such as an outbreak of African Swine Fever in Australia, accurate classification of current land uses for different types of farming is needed to facilitate an effective response. The aim of this project is to leverage high-resolution aerial imagery with state-of-the-art image classification algorithms in order to identify the types of livestock being raised on a farm. Our goal is to build an artificial intelligence pipeline using high-resolution imagery stored in Mediaflux, and the GPU compute power of the Spartan HPC, in order to predict the land use of a given farm image. Additional GIS layers will be added to the predictive model to improve the accuracy. The output of this collaboration will be used by stakeholders in order to automatically map the geospatial risks and aid in the response to an animal epidemic.

  • Introducing new analysis modalities to the Stemformatics Stem Cell Atlas

    Chief Investigator: Professor Christine Wells and Dr Jarny Choi (MDHS)
    MDAP Collaboration Lead: Priyanka Pillai

    Stemformatics services a local and international community of stem cell researchers, who use the resource to share and interrogate stem cell studies. It is an online portal associated with a database of curated stem cell experiments. The collaboration with MDAP will enhance the capability of the Stemformatics atlas by creating new modalities for analysis and visualisation of data. This involves mining the data to find biologically meaningful features in sample space, developing new visualisation prototypes and query wizards pertinent to these, as well as improving the sample annotations in our databases to enable more powerful data analyses.

  • Sustainable Energy Advocacy Coalitions

    Chief Investigator: Dr Alfonso Martinez Arranz (MSE)
    MDAP Collaboration Lead: Kim Doyle

    Climate and energy policy is a highly divisive issue in Australian politics; however, beyond partisan caricatures, little is known about the exact nature of the divisions or their boundaries. This project addresses this gap in the scientific understanding of socio-political divisions around energy policy. The project adopts a data-intensive methodology structured in five parts: online data mining, quantitative surveys, semi-structured interviews, a Delphi-method survey, and community meetings.


  • Forecasting global diversity in a changing world, Dr Payal Bal, Science

    Although the effects of threats to biodiversity can be characterised at local scales, linking this highly resolved, local knowledge to emergent patterns at the global scale is a massive conceptual and computational challenge. This collaboration aims to build an integrated data and analysis pipeline that will set a gold standard for the integration of computationally intensive analyses and data in ecology, including alternative climate and socio-economic scenarios at annual, decadal and millennial timescales, thereby identifying current and future hotspots of nature-land use conflict.

  • Using 6 billion social media posts to understand gendered hate speech, domestic violence and mental illness, Dr Khandis Blake, MDHS

    Having already established that social media posts can be used to test predictions about the effects of city, state, and country-level variables (e.g. income inequality) on gendered behavior, this collaboration will interrogate an eight-year dataset of ~6 billion geolocated tweets to understand how sentiment on social media can predict and reveal real-world problems. It will do so by creating a database that can be queried by tweet text, username, location, and date.

  • Sixth coupled model inter-comparison project climate model data analysis and visualisation, A/Prof Malte Meinshausen, Science

    The world's most comprehensive climate models are currently making projections of 21st-century climate as part of the Sixth Coupled Model Inter-comparison Project (CMIP6). These projections will inform climate science and decision making over the next decade and beyond. This collaboration aims to reformat these 10 PB of specialised binary format data so that they can be easily used with widespread data analysis tools.

  • Ready, set, go LawTech: unleashing the power of information extraction in the legal domain, Prof Jeannie Paterson, Law

    Whilst freely available digital legal information (e.g. most Australian court judgments since the 1980s) exists, it is generally not in a machine-readable form. Legal judgments are also often very lengthy, and it is time-consuming to extract the pertinent sections. This forms a barrier to the bulk analysis of judgments that would enable the study of overarching trends and patterns. This collaboration aims to use Natural Language Processing to train a machine learning model to automatically extract information from legal judgments.

  • Vote compass: understanding campaign dynamics and political representation using data from a voter literacy platform, Dr Aaron Martin, Arts

    The voter literacy platform Vote Compass collected data during the 2013, 2016 and 2019 federal elections, with in excess of 1 million responses from each election campaign. This collaboration will use this data to conduct in-depth research on campaign dynamics, political representation and political geography.

  • Machine learning for clinical and preclinical diagnosis of neurodegenerative disease using speech data, Dr Benjamin Schultz, MDHS

    Speech changes with altered brain function. Acoustic analysis of speech provides objective data on these changes. Acoustic features differ between healthy speakers and individuals with disease and evolve over the course of disease. On this basis, sophisticated speech biometrics can act as a proxy for brain integrity and may assist in optimising diagnostic pathways or identifying symptom onset in neurodegenerative diseases. The role of the Melbourne Data Analytics Platform is to liaise with the Neuroscience of Speech team and facilitate the development of machine learning algorithms using open source software.

  • Supporting Indigenous governance of Indigenous data: The Indigenous Data Network and the Kaiela Institute, Dr James Rose, MDHS

    The Indigenous Data Network (IDN) at the Indigenous Studies Unit, Melbourne School of Population and Global Health, is an unprecedented national initiative aimed at providing support to Indigenous-controlled Research Organizations (ICROs) around Australia. The IDN is assisting ICROs to consolidate, organise and leverage the data assets already in their possession, towards better evidence-based service delivery for the communities that they represent. Among numerous ICROs partnered with the IDN, the Kaiela Institute in Shepparton, Victoria is the first in that state to successfully coordinate a practical governance strategy that brings these support streams to an implementation stage. Together with the Kaiela Institute and the IDN, MDAP data stewards and researchers will contribute to the development of a data governance platform for the Kaiela Institute that will allow its Indigenous researchers to hold, control, and process data in the interests of better service delivery to their communities.