Previous Collaborations


  • Longitudinal Mapping of Residential Building Projects in Victoria

    Chief Investigator: Dr Vidal Paton-Cole (ABP)
    MDAP Collaboration Lead: Robert Turnbull

    The proposed research intends to analyse trends in building construction projects in the State of Victoria from 1996 to 2019. The researchers have sourced historical building permit data from the Victorian Building Authority (VBA), which includes a broad range of information relating to over 1.4 million building projects issued with building permits between 1996 and 2019 (in Excel format). This includes information on project location, cost, type, builder, architect, materials used, etc. In collaboration with MDAP, this data will be visualised within an interactive location-based tool and the data analysed to identify trends over time, including material use, construction type, and the prevalence of solar hot water systems and rainwater tanks. This will help to understand the nature of housing stock across different municipalities in the State of Victoria and also identify relevant correlations between building project characteristics, such as the prevalence of rainwater tanks by site area or local government area, or solar hot water systems by number of storeys.

  • Sustainable Development Goals for City of Melbourne

    Chief Investigator: Dr Alexei Trundle (ABP)
    MDAP Collaboration Lead: Geordie Zhang

    In partnership with the City of Melbourne, the Connected Cities Lab is developing an evidence-based framework for localising and embedding the UN Sustainable Development Goals (SDGs) in the City’s long-term strategic planning processes. Consisting of 17 Goals, 169 Targets and 231 unique indicators, the SDGs are an ambitious, overarching, globally agreed upon framework that aims to achieve worldwide sustainable development by the year 2030. The primary research question for the project is “how can the UN SDGs be used to guide a city’s strategic planning while maintaining connectivity and comparability to other cities as well as the broader global sustainable development agenda?”.
    The research methodology draws on global best practice in translating SDG targets and indicators for the local context, based on an extensive literature review and consultation led by the research team with leading global cities and peak multilateral bodies, with a focus on the Asia-Pacific region. Phase one of the project included a 6-month internal SDG strategic environment assessment process that mapped existing City of Melbourne plans against the targets to ascertain current organisational SDG alignment. Procedures were developed iteratively through parallel project teams within both the University and the City.
    Through 2021 the project will focus on 'localising' SDG targets and associated indicators, aligning the existing city database of more than 600 indicators with the SDG framework, which will form the basis of reporting and prioritisation within the City's overarching integrated planning framework going forward. This will include city-to-city comparability and sub-municipal precinct disaggregation.

  • Linking song collections and communities: Open release and interoperability of a song database and linking tool.

    Chief Investigator: A/Prof Sally Treloyn (FFAM)
    MDAP Collaboration Lead: Dr Aleks Michalewicz

    There has been an exponential rise in Indigenous community use of archival song data to support the revival of song practice and knowledges, particularly where lineages of intergenerational knowledge transmission have been ruptured by colonisation. Research is needed to: 1. remove barriers to access, including incomplete metadata and dispersed metadata and collections; and 2. ensure newly created data produced by ethnomusicologists and others, and Indigenous community members, are linked to legacy data. A further barrier is the non-interoperability of the database and content management systems used by archives with those used in Indigenous communities, such as Mukurtu CMS, Keeping Culture KMS, and other systems.
    The interdisciplinary team has previously developed a linking interface and database tool to ingest, record and link metadata associated with archival and newly created records of Indigenous song using FileMaker Pro: the Discovery Database Tool (DDT) 0.9.2 Beta. The resulting metadata and surrogate source-data aggregates are used to support community song revitalisation initiatives that serve to repair ruptures in intergenerational knowledge transmission, and serve to enrich the archival record by linking collections at the level of recorded song item. The DDT seeks to consolidate and optimise data for use in communities and by researchers.
    Collaborative research with data management specialists is now needed, to:
    AIM 1. Critically review and revise the design and documentation of the DDT 0.9 Beta for public, open access release.
    AIM 2. Explore and test the tool so it can be used to populate commonly used community content management systems such as Keeping Culture KMS and Mukurtu CMS.
    AIM 3. Explore how the tool can be used to communicate linked metadata back to archival databases.
    These serve future priorities of identifying a fair/open platform for the DDT and scoping correspondence of the design thinking behind the system with other data environments in the university.

  • A Fair Day's Work: Detecting Wage Theft with Data

    Chief Investigator: Professor John Howe (Law)
    MDAP Collaboration Lead: Dr Kristal Spreadborough

    The research team seek support to develop a set of data science driven tools to prevent the underpayment and exploitation of young workers (frequently referred to as 'wage theft'). The team are specifically looking for MDAP support in accessing and managing relevant datasets, and to prepare for the development of a new dataset, which will involve the use of natural language processing.
    Young people (~15-24 years old) are especially vulnerable to wage theft, due in part to issues such as: a culture of wage theft in industries where young people make up the majority of employees; a lack of awareness among young employees of their workplace rights; reluctance to complain about exploitation because of fear of retribution, combined with lack of resourcing for proactive detection of non-compliance by the regulator. This last point, in turn, makes it difficult for regulators, unions, and other organisations to detect wage theft, let alone address it.
    To address the disadvantage wage theft causes, the team proposes a multi-pronged approach that aims, first and foremost, to support young people at risk of wage theft, while also providing data for regulators, policymakers and business to drive system change. The project will draw upon cross-disciplinary expertise in labour law and regulation, digital design, information science, UX design, data analysis and data ethics to design and develop three interlinked components: the Fair Day's Work portal, a Wage Theft Database, and a Wage Theft Prediction Tool. At the core of these three components is the development of a wage theft database from public and private datasets, and from a new dataset collected from employees and unions via the Fair Day's Work portal.

  • Constraining the thermal evolution of Earth's crust through machine learning: Development of fully automated digital fission track analysis

    Chief Investigator: Dr Samuel C Boone (Science)
    MDAP Collaboration Leads: Dr Noel Faux & Usha Nattala

    Fission track thermochronology is a temperature-sensitive geological dating technique that provides unparalleled insights into the thermal and tectonic evolution of the Earth’s crust with widespread applications. The method is based on the formation of radiation damage zones, called fission tracks, from the decay of ²³⁸U in natural minerals. The retention and length of fission tracks are sensitive to temperatures common in the shallow parts of the Earth’s crust (up to 5 km depth). Thus, by quantifying the number and length distribution of fission tracks through digital microscopy and determining the ²³⁸U content via mass spectrometry, the detailed thermal history of a rock can be determined.
    Since 2003, the Melbourne Thermochronology Research Group has developed Fission Track Studio, an image analysis software suite that has brought significantly increased automation to the laborious collection and analysis of fission track data. However, persistent difficulties remain in identifying certain classes of tracks and a considerable degree of analyst review is still required to correct for deficiencies in the present algorithms.
    This team wishes to collaborate with MDAP to develop a radically new approach to digital fission track analysis based on machine learning. Using an existing database of more than 30,000 photomicrograph stacks with corresponding expert-reviewed image sets, an artificial neural network approach will enable fully automated fission track image analysis to be achieved. Such an advance will revolutionise the methodology by significantly reducing analytical time, removing the influence of observer bias and allowing the wider, non-specialist geoscience community to utilise this powerful technique.

  • Forecasting Bushfire Risk: Integrating a new ground-based sensor network, remote sensing, and weather data to forecast forest fuel dryness

    Chief Investigator: A/Prof Gary Sheridan (Science)
    MDAP Collaboration Lead: Usha Nattala

    Bushfires in southern Australia have resulted in more than 825 deaths and destroyed more than 7400 houses since 1901. The economic cost of the 2020 bushfire season alone was estimated to be $100 billion. The “dryness” of leaves and litter (called fuel) within the forest is a strong determinant of the risk of bushfire; however, this dryness is very difficult to adequately predict. This project aims to develop new tools for Australian bushfire managers to forecast and map fuel dryness and, in doing so, better anticipate where, when, and how bushfires and prescribed burns will burn. This will save lives, protect property, and improve the allocation of limited firefighting resources.
    This team's novel approach will develop machine learning algorithms to integrate a world-first network of 34 real-time “fuel dryness” sensors in forests, with spatial and temporal remote sensing and meteorological predictor variables, to forecast fuel dryness at high resolution across vast forested landscapes. The research group is an international leader in this research area, developing both biophysical and remote sensing-based models of fuel dryness. The team seeks to collaborate with MDAP (who will provide the technical knowledge of ML model development, and the computational resources to develop, train, and interact with new models) in the exploration and development of data-based models for landscape fuel dryness.

  • Machine learning approaches for delineation of bankfull stream channel dimensions from LiDAR data

    Chief Investigator: Dr Kathryn Russell (Science)
    MDAP Collaboration Lead: Jonathan Garber

    Catchment urbanisation has profound effects on the physical form and functioning of stream channels, with far-reaching economic, ecological and social implications for our cities and suburbs. In this overarching project, being undertaken in partnership with Melbourne Water, the aim is to develop statistical models for the expected extent and severity of stream channel change relative to the level of catchment urbanisation across the Greater Melbourne region. The outcomes will assist stream managers to plan for geomorphic change, and to develop new ways of protecting streams from catchment urbanisation.
    A key input to this model is stream channel dimension data, which is challenging to gather across broad scales despite good coverage of LiDAR topographic data. Existing datasets are incomplete, low-quality and non-reproducible, and current methods to extend and improve them are labour-intensive, severely limiting our modelling. The team proposes a collaboration with MDAP to explore machine learning methods to identify bankfull channel extents from LiDAR digital elevation models.
    If a machine-learning method can perform comparably to current methods, the potential research impact is considerable, both on this project and on river research globally. This collaboration may lead to improved coverage and quality of channel dimension data, and hence improved models of pressure-response relationships in stream geomorphology (both here in Melbourne and worldwide). Ultimately, these advances are expected to lead to better stream protection, management and planning.

  • Visualising networks and mobilities in the architecture profession

    Chief Investigator: Professor Julie Willis (ABP)
    MDAP Collaboration Lead: Dr Emily Fitzgerald

    This project seeks to visualise the movements and migrations of architects within and across the British Empire from the mid-nineteenth to the mid-twentieth century. At the heart of this project are the questions: ‘Where did architects come from and go to?’ and ‘Where did they work?’
    To date, architectural histories have largely been grounded in a single place, disregarding architects’ movements across jurisdictions. But architects have long been highly mobile professionals. Their careers, then as now, could span the globe. Nor were their journeys simply a trip from centre to periphery. Indeed, a good number of colonial architects had careers which spanned Australasia, East and Southeast Asia, Africa, and the United Kingdom. And in these places, they could work for multiple entities – themselves, other firms, public agencies – making their careers complex journeys.
    The project uses disparate, primarily textual, sources – trade directories, newspapers, and other archival material – to trace the movements of hundreds of architects through various architectural firms, and through the colonies and concessions of the British Empire over the course of a century. By using database and visualisation tools, the project will enable new ways of understanding architectural history, new methods for synthesis and analysis of large and disparate data sets in design histories and new interfaces for the presentation of complex historical datasets that involve different geographical locations and movements, over long time frames in varying professional configurations.

  • Community-driven data initiatives for preventing HIV in Indonesia

    Chief Investigator: Dr Benjamin Hegarty (Arts)
    MDAP Collaboration Leads: Dr Kristal Spreadborough & Priyanka Pillai

    This project aims to investigate the processes and practices of data collection and use in efforts to prevent and treat HIV in Indonesia among vulnerable populations, with a focus on 'community-driven data initiatives'. Drawing on the interdisciplinary expertise of anthropologists, data scientists and epidemiologists/public health experts in Indonesia and Australia, this project hopes to inform emergent paradigms of governing HIV through data. The outcome is anticipated to be enhanced capacity to understand the social and cultural impacts of data-driven forms of health governance. Benefits include new forms of data visualisation, and more collaborative ethical protocols for collecting and using health data collected in the course of HIV prevention and treatment activities.
    The larger project of which this collaboration forms part will be based on two case studies that investigate data collection as it relates to two common categories: “men who have sex with men” (MSM) and “housewives” (ibu rumah tangga). Tracing how “MSM” and “housewives” are quantified and visualised through data collection – from the community to the epidemiological level – provides the opportunity to understand the visibilities and invisibilities that data entails. The team will work with Indonesian counterparts to look at data in two ways:
    1) Processes of data collection and analysis. They will work with epidemiologists and other program workers at a range of sites, including civil society organisations, regional departments of HIV, and the national department of health.
    2) The effects of data as it is used in politics and policy. They will investigate the role of data in shaping responses to HIV, including data visualisation.
    The MDAP collaboration, which forms one part of this larger project, focuses only on studying and improving community-driven data initiatives developed by/for MSM communities. The project builds on ongoing collaborations with these communities by CI Hegarty, Davies and Praptoraharjo.

  • Tackling the canine microbiome in chronic enteropathy: characterising the functionally significant changes that occur with remission of disease

    Chief Investigator: Professor Caroline Mansfield (FVAS)
    MDAP Collaboration Lead: Dr Noel Faux

    Dogs are the most popular companion animal in Australia, with over 4.2 million pet dogs. Gastrointestinal diseases are commonly diagnosed in dogs; among them, chronic enteropathy (CE) is a group of disorders that cause gastrointestinal tract inflammation. Although the exact cause is not known, dysregulation of the resident gut microbiota has been implicated in development and/or exacerbation of CE.
    Microbiomic community profiling using universal biological markers (primarily the 16S rRNA gene) coupled with high throughput sequencing has been used to assess bacterial phyla in a wide variety of vertebrates, including dogs with CE. Although this approach is powerful, it has to date demonstrated few consistent biological differences associated with canine CE, other than a general loss in species richness and diversity. However, CE is largely an inflammatory condition and a major limitation with current research is that it examines all bacteria rather than only those bacteria recognised by the host immune response. Additionally, it is the functional capacity of these organisms (determined by their gene repertoires) and not their taxonomic identity that defines their niche within the microbiome. It may well be that, despite variation in the taxonomic structure of the microbiome among individuals, the functional niches occupied by these various species are static and consistent. This longitudinal clinical study will functionally characterise the faecal microbiome through an integrative ‘omics approach covering metagenomics, transcriptomics, metabolomics, and bacteria coated with immunoglobulins. This will provide a better understanding of the significance of the changes of the microbiota and potentially identify therapeutic targets.

  • Machine Learning and Patterns of Primary Care Utilisation in Cancer Diagnosis and Outcomes

    Chief Investigator: Professor Jon Emery (MDHS)
    MDAP Collaboration Leads: Dr Mar Quiroga and Zaher Joukhadar

    The Victorian Comprehensive Cancer Centre-funded Data-Driven Research Program was the first in Australia to link large-scale primary care and hospital data for the purpose of enabling increased capabilities in health services research. Until now, traditional analytical and statistical methods have been utilised to examine and map patterns in primary care attendances and how these are associated with cancer diagnoses, treatment and outcomes. These are generally based on the existing evidence base to test prevailing hypotheses and apply these to the local context.
    This project seeks to apply novel analytical and machine learning methods to the large-scale linked data sources in order to generate new hypotheses and further characterise how people with cancer engage with primary care services. Specifically, this project would aim to identify patterns in various aspects of primary care attendances prior to a definitive cancer diagnosis (as identified in linked data sources). These could include prescribing, test requests/results, semi-coded fields such as ‘reason for encounter’ and co-morbidities/other conditions captured in primary care management systems.
    The project would identify a specific cancer type (e.g. upper gastrointestinal) with sufficient linkages between hospital diagnosis and primary care data and apply tools and algorithms to detect patterns present within specific timeframes of diagnosis or treatment.

  • Social Media Disinformation and the Papua Conflict: an Indonesian Language Investigation

    Chief Investigator: Dr Dave McRae (Arts)
    MDAP Collaboration Leads: Dr Daniel Russo-Batterham & Kim Doyle

    Online discussion of the conflict for independence in Indonesia's two easternmost provinces – hereafter the Papua conflict – is highly fractious. The Indonesian-language online space is especially contested. Actors posting online in Indonesian occupy a complex spectrum of positions ranging from full support for the Indonesian government to full support for Papuan independence. Increasingly, contention between these actors includes disinformation tactics, harassment of those criticising and scrutinising the Indonesian government, and – on occasion – internet shutdowns to obstruct the free flow of information.
    The increase in disinformation and pro-government interference in online discussion of the Papua conflict accords with what scholars have identified as a broader regional illiberal turn in the conduct of contentious politics via social media. Existing analyses have mapped the distribution of pro-government material pertaining to the Papua conflict by inauthentic accounts and coordinated campaigns in English and in Dutch. The team intends to extend these analyses by examining which accounts are posting pro-government messages in Indonesian on social media, and the nature and content of these posts. By focusing on Indonesian language materials, the team seeks to understand how pro-government actors shape debate on the Papua conflict within Indonesian society – where the outcome of the Papua conflict will ultimately be decided – rather than scrutinising attempts to shape international perceptions of Papua.

  • Associate: Encoding manuscripts as primary research objects

    Chief Investigator: A/Prof Nick Thieberger (Arts)
    MDAP Collaboration Leads: Dr Daniel Russo-Batterham & Robert Turnbull

    Professor Thieberger is working on several Text Encoding Projects and would benefit from extending work begun with Robert Turnbull and Daniel Russo-Batterham. The pages at are an example, and he now has another project (with Simon Musgrave at Monash) working to prepare Heath's grammar, texts, dictionary, and media of the Aboriginal language Nunggubuyu and put it online. In SCIP/Digital Studio he is working with Birgit Lang (German) on a manuscript that she has transcribed, a diary of a Freudian analyst that has not been published before. With a DS intern the transcript has been encoded as a first draft TEI document and it is now ready for final editing. However, each of these projects needs a IIIF server for the images, and some assistance with the TEI, that Robert and Daniel can provide.

  • Digitally benchmarking public attitudes to secondary use of health data, with NLP data extraction from general practices as a case study

    Chief Investigator: A/Prof Mark Taylor (Law)
    MDAP Collaboration Lead: Priyanka Pillai

    Significant benefits can flow from re-using people's health data for research, including gaining new insights that can be used to improve preventive healthcare, diagnosis and treatment. Such data is sensitive, however, and individuals may have concerns about its secondary use even when data has been de-identified. Secondary use without appropriate consultation and justification has sometimes created public scandal that damages trust. We need better ways to understand people's views about different types of secondary use of health data so that the public interest can be better served and trust protected.
    This project has two related aims:
    The first is to establish a robust electronic method to gather evidence of public attitudes regarding secondary uses of health data. The team will scope and evaluate alternative electronic means of gathering and benchmarking public attitudes toward specific secondary uses of health data. Alternatives will be evaluated for cost-effectiveness, issues of inclusivity, representation and equity, and their ability to provide useful insight into public attitudes toward secondary uses of data. The most promising method will be tested in practice in relation to the second aim.
    The second aim is to gain insight into public attitudes to data extraction from medical records for primary care research using different natural language processing (NLP) tools. Tools will be tested according to: (1) location, (2) robustness of de-identification, and (3) types of health data. This will provide insight into public acceptability of working with industry-leading medical annotation companies that require medical text to be sent to online services.

  • Accelerating large dimensional stochastic simulation models

    Chief Investigator: Dr Aaron Dodd (Science)
    MDAP Collaboration Leads: Dr Edoardo Tescari & Dr Mar Quiroga

    Bioeconomic simulation models are a critical tool used to estimate both the impacts that might be caused by pests or diseases and the relative value of the various interventions that we deploy to manage them. Typically, these models describe the impacts of a single species on a single asset at relatively small spatial and temporal scales. Management agencies, however, are required to take an ‘all hazards’ approach when determining how they might best protect assets from the negative impacts of pests and disease at state and national scales. The ‘value model,’ developed at the University of Melbourne, is the only model globally that is capable of simultaneously modelling the arrival, spread and impact of multiple biological hazards on multiple assets over time. However, in order to be effectively deployed within management agencies, or expanded for additional research use, the core architecture of the model needs to be faster. This project will explore a range of options for improving the performance of the model spanning: how the dispersal of organisms is modelled, how the model is encoded, how compute resources are utilised and how the result data is stored. Improvements in any (or all) of these areas will enable uptake of the model into real-world decision-making contexts ultimately delivering improved biosecurity outcomes for society.

  • The Heart

    Chief Investigator: Dr Robert Walton (FFAM/MSE)
    MDAP Collaboration Lead: Zaher Joukhadar

    Can a building have a heart? Does a building feel and can it dream? Imagine if the life of Melbourne Connect (MC) as a building and a community could be revealed through the collection and visualisation of computational data. We are creating a major permanent digital artwork called ‘The Heart’ for the foyer of MC that visualises what the building is ‘feeling’ through its thousands of live data gathering sensors. The Heart will beat for the duration of the building’s life – at least 42 years until the end of the current lease. Its pulse is extrapolated from the sensations of its body: the ‘smartest’ building in Parkville. The project is well underway and is being developed through the collaboration of a broad coalition of university researchers, professional staff from FFAM and MSE, students, leading external craftspeople, and the MC architects and builders. The artwork reveals building functions that are normally hidden ‘offstage’ to bring to mind the volume of data and work supporting the life and optimal functioning of the University community. The pulse of MC is derived from Building Management System data combined with electricity generation (solar and geothermal), external temperature and wind direction, zoned data and power usage, human movement and behaviour. We envision MDAP helping us complete the data pipeline for the project, using Machine Learning/AI to extrapolate an evolving, live ‘pulse’ from real-time building data including human heart rate monitors in the foyer.


  • The creative industries in days of isolation - a fast pace shift to making, learning and living in a crisis

    Chief Investigator: Dr Kathryn Coleman (MGSE)
    MDAP Collaboration Lead: Dr Kristal Spreadborough

    We are witnessing a cultural and pedagogical evolution. The current global crisis triggered by COVID-19 has caused an unprecedented societal shift. This has been starkly epitomised within the arts, creative industries, and education sectors – our first responders to societal change. These sectors have seen rapid shifts in their norms, practices, and pedagogies and have had to creatively adapt to ensure their survival and longevity in an environment mediated by physical isolation, social disruption, and health and economic crises. Drawing on a diverse range of big data sources, this research takes a data-led approach to examine the interplay between data as practice and practice as data. It maps the evolution of norms, practices, and pedagogies within the creative industries in preparation for the “new normal”.

  • The megalithic jar sites of Laos

    Chief Investigator: Dr Louise Shewan (Science)
    MDAP Collaboration Lead: Dr Aleks Michalewicz

    This interdisciplinary collaboration involves the integration of data generated from the Plain of Jars Archaeological Research Project: UAV acquired imagery, large point clouds of archaeological sites, photographs of more than 1000 megalithic jars, photogrammetry models of burials, analyses of human remains (isotopic, ancient DNA, osteological, synchrotron Micro-CT), artefact conservation and analysis (ceramic, beads, metals), 3D models of material culture, dating analyses, and the geological characterisation of jar and quarry samples. Work with MDAP involves data management and curation, virtual/augmented reality and machine learning capability to create educational/research offerings and to facilitate ongoing archaeological research in inaccessible environments contaminated with unexploded ordnance (UXO), visualisation proficiency to share project outcomes, image and video processing, data analysis, and innovative web development to ensure open access to dynamic and unique datasets.

  • Causality in Complex Dynamical Systems: Implementing Core Algorithms in a Distributed Computing Environment

    Chief Investigator: A/Prof Michael J. Zyphur (FBE)
    MDAP Collaboration Lead: Dr Edoardo Tescari

    This project will revolutionize the ability of social and physical scientists to characterize and assess causal effects in complex dynamical systems. These systems are difficult to study using traditional methods including experiments and model-based regressions because their assumptions are easily violated by nonlinear system evolution and causal effects. The broad aim of this project is to implement distributed computing methods for analyzing data from complex dynamical systems. The project will implement two core empirical dynamic modeling (EDM) algorithms on a distributed computing platform with a manifold learning method for data imputation and denoising.

  • Automating image segmentation and morphometric analysis for accelerating neuroscience at the nanoscale

    Chief Investigator: Dr Calvin Eiber (MDHS)
    MDAP Collaboration Lead: Zaher Joukhadar

    The aim of this project is to develop automated image processing tools to identify and measure nerve cell axons in electron microscope (EM) images. We aim to do this in three stages: by implementing existing algorithms for the automatic segmentation and analysis of myelinated axons from conventional transmission EM images, by developing new algorithms to handle unmyelinated axons, and by developing algorithms and data handling processes for the automatic processing of 3D tissue volumes generated by cutting-edge EM techniques like serial block-face scanning electron microscopy (SBF-SEM). The broader goals are to improve our understanding of how the brain communicates with the organs of the body, as well as how different brain regions communicate with one another, in order to support the growing field of bioelectronic medicine.

  • An Empirical Study of How the US Termination Right Operates (part of the Author’s Interest Project)

    Chief Investigator: A/Prof Rebecca Giblin (Law)
    MDAP Collaboration Lead: Dr Daniel Russo-Batterham

    The Author’s Interest Project is investigating how changes to copyright law could help increase the opportunities authors have to profit from their work, whilst improving public access to knowledge and culture. We’re particularly interested in reversion rights, which see copyrights returned to authors after a certain amount of time or when they are no longer being exploited. More than half the world’s nations have statutory reversion rights, but their effects are woefully under-researched. This sub-project is to investigate the operation of a US reversion law that allows authors to terminate copyright grants 35 years after they are made. There have been no in-depth studies investigating how often this scheme is used, by whom, and with what effects. Using the publicly available US Copyright Office registration database, the Project intends to fill this gap, contributing rigorous new evidence to debates about what kind of rights can help get authors paid and books read.

  • Mapping our emotional lives: Building a searchable database of experience sampling data on emotional processes

    Chief Investigator: Dr Elise Kalokerinos (MDHS)
    MDAP Collaboration Lead: Dr Daniel Russo-Batterham

    Experience sampling involves following participants into their everyday lives using smartphones, taking multiple measures in real-time to get a more meaningful picture of psychological processes. Experience sampling methods are central to understanding both normal and disordered emotional responding: they allow us to capture strong and meaningful emotional events that can’t be recreated in the lab, to map dynamic fluctuations of emotions across time, and to study how these processes contribute to psychological well-being. However, conducting experience sampling studies is costly, time-intensive, and requires expertise. In addition, research using these methods generates large, rich datasets, which are often not used to their fullest potential. To reduce the barriers to using experience sampling data, and to harness the reanalysis potential of these datasets, we are building an open-access, searchable, and cumulative database of our existing experience sampling datasets.

  • Natural Language Historical Mapping

    Chief Investigator: Dr Mitchell Harrop (Arts)
    MDAP Collaboration Lead: Dr Emily Fitzgerald

    The Melbourne History Workshop in the School of Historical and Philosophical Studies is working on an Australian Research Council Linkage Infrastructure, Equipment and Facilities grant on the challenge of Time-Layered Cultural Mapping (TLCMap). TLCMap is an open online facility to create maps and timelines and combine new Australian cultural and historical datasets with existing ones. The project is developing infrastructure around the Gazetteer of Historical Australian Placenames (GHAP) API and Recogito. This includes Named Entity Recognition, time recognition, Sentiment Analysis, and other common text processing techniques. We will be working with MDAP to apply these techniques to existing datasets of Encyclopedia of Melbourne entries, historical news stories from the National Library’s Trove, podcasts, tweets and three photographic collections. The application of these techniques will result in the improvement of the original datasets for use by amateur and professional researchers.
    Our methods will be documented in a cookbook-style publication, leveraging the strong background of the MDAP workforce in pedagogy and in translating highly technical research methods for an audience that includes both Graduate Researchers and amateur historians.

  • Early Modern Women Translators in Europe (15th-18th c.)

    Chief Investigator: Prof Véronique Duché (Arts)
    MDAP Collaboration Lead: Bobbie Shaban and Dr Emily Fitzgerald

    Building on a previous research project (Histoire des Traductions en Langue Française, XVe-XVIe siècles, ANR project – equivalent to ARC – finished in 2015), chapters published (Duché, 2015, 2016, 2017) and papers given at international conferences (RSA, SRS, 2018), this project will further investigate early modern women translators in Europe. It seeks to promote cultural awareness by charting the contours and transnational connections of early modern women translators through the use of a sophisticated, interactive visualisation tool. In particular, the project aims to:

    • identify the early modern women translators (not only in France, but also in other European countries);
    • map the intellectual as well as personal and cultural networks to which these women belong (author, publisher, reader etc.);
    • create a repository for metadata on early-modern scholarship.

  • Unlocking published metagenomes as a source of information for microbial eukaryotes

    Chief Investigator: A/Prof Heroen Verbruggen (Science)
    MDAP Collaboration Lead: Dr Mar Quiroga and Bobbie Shaban

    Microbial eukaryotes play key roles in Earth’s ecosystem and human lives, but knowledge of their biodiversity and evolution is very incomplete. This collaborative project between MDAP, the Faculty of Science, Faculty of Medicine, CSIRO and Royal Botanic Gardens Melbourne aims to bring big data to this field of research, advancing our knowledge of microbial eukaryotic biodiversity and evolution by extracting their organelle genomes (mitochondria & chloroplasts) from publicly available metagenomes. Through this project, we aim to achieve: (1) A fully functioning pipeline for automated download of metagenome reads, their assembly and calculation of contig statistics, (2) A large library of assembled metagenomes generated with this pipeline, permitting testing of alternative strategies for organelle genome binning, (3) Advanced proof of concept for externally funded project proposals based on these ideas.

  • Development of the PolyMuse Database, to map conservation issues with polymer based plastics in museum collections

    Chief Investigator: Dr Petronella Nel (Arts)
    MDAP Collaboration Lead: Karen Thompson

    PolyMuse is a collaborative Australian Research Council funded project between four universities and six heritage institutions, aiming to develop methods for extending the lifespan of vulnerable polymer-based plastics in heritage collections. Data is being collected from different locations and collated into a relational database, with the ultimate aim of aiding analysis and providing access to heritage professionals. This MDAP collaboration will create a "universal-adapter" for bringing data already collected into the database, and build a streamlined process for future data collection.

  • Machine Learning and Patterns of Primary Care Utilisation in Cancer Diagnosis and Outcomes

    Chief Investigator: Professor Jon Emery (MDHS)
    MDAP Collaboration Lead: Zaher Joukhadar and Dr Mar Quiroga

    The Victorian Comprehensive Cancer Centre-funded Data-Driven Research Program was the first in Australia to link large-scale primary care and hospital data for the purpose of enabling increased capabilities in health services research. Until now, traditional analytical and statistical methods have been utilised to examine and map patterns in primary care attendances and how these are associated with cancer diagnoses, treatment and outcomes. These are generally based on the existing evidence base to test prevailing hypotheses and apply these to the local context. This project seeks to apply novel analytical and machine learning methods to the large-scale linked data sources in order to generate new hypotheses and further characterise how people with cancer engage with primary care services. Specifically, this project would aim to identify patterns in various aspects of primary care attendances prior to a definitive cancer diagnosis (as identified in linked data sources). These could include prescribing, test requests/results, semi-coded fields such as ‘reason for encounter’ and co-morbidities/other conditions captured in primary care management systems.
    The project would identify a specific cancer type (e.g. Upper Gastrointestinal) with sufficient linkages between hospital diagnosis and primary care data and apply tools and algorithms to detect patterns present within specific timeframes of diagnosis or treatment.

  • Evaluation of automated deidentification of hospital and general practice health records

    Chief Investigator: Professor Karin Verspoor (MSE) and A/Prof Douglas Boyle (MDHS)
    MDAP Collaboration Lead: Dr Simon Mutch and Dr Noel Faux

    Text analysis is an emerging field in health data research. Use of textual health data for research purposes has been limited in Australia to date, due to concerns around the personal identifying information they may contain. The aim of this project is to develop, test and evaluate methods to deidentify textual data sets from electronic medical records in both hospital and general practice settings so these data can be securely extracted for further analysis using natural language processing. The project is a collaboration between the School of Computing and Information Systems (Engineering), the Department of General Practice, the School of Population and Global Health, and the Centre for Digital Transformation of Health (MDHS). The results of this project will facilitate larger scale use of clinical textual data for research purposes in Australia and will provide opportunities for future research with world-class collaborators within the University and externally.

  • Using novel analytic methods to predict and prevent falls and fall-related injuries in older persons with dementia

    Chief Investigator: Professor Gustavo Duque (MDHS)
    MDAP Collaboration Lead: Dr Noel Faux

    Six million Australians will be diagnosed with dementia in the next 20 years, at a cost of more than $1 trillion. One third of them will die due to traumatic falls, which are, in the majority, predictable and preventable. Although most of the larger longitudinal studies on ageing have included musculoskeletal and cognitive variables that could help to predict and prevent falls in people with dementia, no studies have developed accurate predictive tools to identify those at risk of falling. We propose the development of transdisciplinary links between several major databases including the Canadian Longitudinal Study on Aging (across Canada), the Quebec Longitudinal Study on Nutrition and Aging (Quebec, Canada), the Women’s Healthy Ageing Project (Metropolitan Melbourne), and the Geelong Osteoporosis Study (Geelong, VIC) to combine and compare cognitive and musculoskeletal data from 35,000 participants, and characterise and model the risk for incident falls and fall-related injuries in cognitively impaired and healthy older adults participating in these studies. Our preliminary analyses have demonstrated that Artificial Neural Networks and Factor Mixture Models may improve the performance criteria of fall prediction in older persons compared to classical linear models. We hypothesise that these models could be helpful in the identification of patterns of risk factors for falls and fall-related injuries in older persons with dementia. The results of this strong international collaboration (facilitated by MDAP) will pave the way for the development of new care pathways for people with dementia, including a future application that can be used in clinical practice to screen for fall risk in this population.

  • Melbourne Pollen: Using Machine Learning to predict pollen counts based on vegetation, weather, land use and geographical data

    Chief Investigator: A/Prof Ed Newbigin (Science)
    MDAP Collaboration Lead: Usha Nattala

    Our vision is to help the millions of Australians affected by allergies to airborne pollen and other allergens by discovering, translating and delivering research solutions to help better manage their condition. Our current immediate project goal is to use advanced machine learning algorithms to produce a novel forecast model based on our extensive pollen databases, citizen science responses and various environmental inputs including satellite weather, land use and vegetation data. Machine learning techniques make it possible for us to deal with large multivariate datasets and to set up forecasting processes that can identify and extract patterns in data without explicit instructions, continuously learning and improving as they go without the need for excessive human intervention.

  • Rural land use classification algorithms for emergency animal disease responses

    Chief Investigator: Dr Simon Firestone (FVAS)
    MDAP Collaboration Lead: Jonathan Garber

    In an emergency animal disease situation, such as an outbreak of African Swine Fever in Australia, accurate classification of current land uses for different types of farming is needed to facilitate an effective response. The aim of this project is to leverage high-resolution aerial imagery with state-of-the-art image classification algorithms in order to identify the types of livestock being raised on a farm. Our goal is to build an artificial intelligence pipeline using high-resolution imagery stored in Mediaflux, and the GPU compute power of the Spartan HPC, in order to predict the land use of a given farm image. Additional GIS layers will be added to the predictive model to improve the accuracy. The output of this collaboration will be used by stakeholders in order to automatically map the geospatial risks and aid in the response to an animal epidemic.

  • Introducing new analysis modalities to the Stemformatics Stem Cell Atlas

    Chief Investigator: Professor Christine Wells and Dr Jarny Choi (MDHS)
    MDAP Collaboration Lead: Priyanka Pillai

    Stemformatics services a local and international community of stem cell researchers, who use the resource to share and interrogate stem cell studies. It is an online portal associated with a database of curated stem cell experiments. The collaboration with MDAP will enhance the capability of the Stemformatics atlas by creating new modalities for analysis and visualisation of data. This involves mining the data to find biologically meaningful features in sample space, developing new visualisation prototypes and query wizards pertinent to these, as well as improving the sample annotations in our databases to enable more powerful data analyses.

  • Sustainable Energy Advocacy Coalitions

    Chief Investigator: Dr Alfonso Martinez Arranz (MSE)
    MDAP Collaboration Lead: Kim Doyle

    Climate and energy policy is a highly divisive issue in Australian politics; however, beyond partisan caricatures, little is known about the exact nature of the divisions or their boundaries. This project addresses this gap in the scientific understanding of socio-political divisions around energy policy. The project adopts a data-intensive methodology structured in five parts: online data mining, quantitative surveys, semi-structured interviews, Delphi-method survey, and community meetings.


  • Forecasting global diversity in a changing world, Dr Payal Bal, Science

    Although the effects of threats to biodiversity can be characterised at local scales, linking this highly resolved, local knowledge to emergent patterns at the global scale is a massive conceptual and computational challenge. This collaboration aims to build an integrated data and analysis pipeline that will set a gold standard for the integration of computationally intensive analyses and data in ecology, including alternative climate and socio-economic scenarios at annual, decadal and millennial timescales, thereby identifying current and future hotspots of nature-land use conflict.

  • Using 6 billion social media posts to understand gendered hate speech, domestic violence and mental illness, Dr Khandis Blake, MDHS

    Having already established that social media posts can be used to test predictions about the effects of city, state, and country-level variables (e.g. income inequality) on gendered behavior, this collaboration will interrogate an eight-year dataset of ~6 billion geolocated tweets to understand how sentiment on social media can predict and reveal real-world problems. It will do so by creating a database that can be queried by tweet text, username, location, and date.

  • Sixth coupled model inter-comparison project climate model data analysis and visualisation, A/Prof Malte Meinshausen, Science

    The world's most comprehensive climate models are currently making projections of 21st Century climate as part of the Sixth Coupled Model Inter-comparison Project (CMIP6). These projections will inform climate science and decision making over the next decade and beyond. This collaboration aims to reformat this 10PB of specialised binary format data so that it can be easily used with widespread data analysis tools.

  • Ready, set, go LawTech: unleashing the power of information extraction in the legal domain, Prof Jeannie Paterson, Law

    Whilst freely available digital legal information (e.g. most Australian court judgments since the 1980s) exists, it is generally not in a machine-readable form. Legal judgments are also often very lengthy, and it is time-consuming to extract the pertinent sections. This forms a barrier to the bulk analysis of judgments that would enable the study of overarching trends and patterns. This collaboration aims to use Natural Language Processing to train a machine learning model to automatically extract information from legal judgments.

  • Vote Compass: understanding campaign dynamics and political representation using data from a voter literacy platform, Dr Aaron Martin, Arts

    The voter literacy platform Vote Compass collected data during the 2013, 2016 and 2019 federal elections, with in excess of 1 million responses from each election campaign. This collaboration will use this data to conduct in-depth research on campaign dynamics, political representation and political geography.

  • Machine learning for clinical and preclinical diagnosis of neurodegenerative disease using speech data, Dr Benjamin Schultz, MDHS

    Speech changes with altered brain function. Acoustic analysis of speech provides objective data on these changes. Acoustic features differ between healthy speakers and individuals with disease and evolve over the course of disease. On this basis, sophisticated speech biometrics can act as a proxy for brain integrity and may assist in optimising diagnostic pathways or identifying symptom onset in neurodegenerative diseases. The role of the Melbourne Data Analytics Platform is to liaise with the Neuroscience of Speech team and facilitate the development of machine learning algorithms using open source software.

  • Supporting Indigenous governance of Indigenous data: The Indigenous Data Network and the Kaiela Institute, Dr James Rose, MDHS

    The Indigenous Data Network (IDN) at the Indigenous Studies Unit, Melbourne School of Population and Global Health, is an unprecedented national initiative aimed at providing support to Indigenous-controlled Research Organizations (ICROs) around Australia. The IDN is assisting ICROs to consolidate, organise and leverage the data assets already in their possession, towards better evidence-based service delivery for the communities that they represent. Among numerous ICROs partnered with the IDN, the Kaiela Institute in Shepparton, Victoria is the first in that state to successfully coordinate a practical governance strategy that brings these support streams to an implementation stage. Together with the Kaiela Institute and the IDN, MDAP data stewards and researchers will contribute to the development of a data governance platform for the Kaiela Institute that will allow its Indigenous researchers to hold, control, and process data in the interests of better service delivery to their communities.