Back in Ancient Greece, civic leaders could easily and directly consult the views of people in public spaces like the Agora. So too in Rome, where the Forum provided a place to canvass diverse perspectives on issues ranging from new ideas to laws and commercial propositions.

Surveys and focus groups replaced the forum, but with limitations

While the Agora and Forum worked well in their day, with the growth of much larger cities and the move to a written culture, finding out what a large number of people think has become more challenging. Two main strategies have been popular to date: you can find a small but hopefully representative and diverse subset of people (a focus group) and use their views to stand in for those of many others like them, or you can ask lots of people via a poll, using a simplified phone or online questionnaire that surveys views on specific questions.

Focus groups try to understand the nature of preferences and their causes, seeking unsolicited views and unstructured responses. Surveys and polling, by contrast, are often used in quantitative research to find answers to specific questions from lots of people and thus predict things like who will likely win the next election or what economy-wide consumer sentiment looks like. Both approaches have benefits, but insiders know each also has major flaws, and neither is particularly democratic. If you are influential but an outlier in your views, those views are unlikely to be well represented in a focus group; and if the issues are complex and the types of responses are not obvious in advance, important details may be missed by the crude instrument of the survey.

What if you could combine both approaches: seek input from large numbers of people in an open, unstructured way that suits them, yet still make sense of it all and ensure everyone's viewpoint is considered in the analysis?

A new approach offers the best of both worlds

New techniques in data science give governments, businesses and NGOs a way to analyze very large numbers of written public submissions on wide-ranging topics and to make better sense of them all in a structured and fair way.

Earlier this year Australia faced some of the worst floods in its recorded history, in Sydney and up and down the country's eastern coast. Thirteen people died and 4,000 homes were destroyed in New South Wales alone. In response, the state government established an independent inquiry to examine the causes, consequences and impacts of this devastating natural disaster, and how better to plan for likely future ones.

The New South Wales Flood Inquiry led by former Police Commissioner Michael Fuller and former Chief Scientist Mary O’Kane has now published its findings in full. As part of the process, the inquiry received over 1,400 detailed submissions from the public, businesses and emergency services organisations impacted by the flooding.

My colleagues and I at League of Scholars assisted the inquiry with independent analytics, using a variety of data science techniques including machine-learning-based topic modelling to “read” all the submissions. This revealed six key distinct themes within the 1,400 submissions. Each submission could be categorized into one of these six topics: Homes & Family (698); Water Engineering (414); Emergency Services (191); Planning in light of Climate Change (78); Recovery (54); and the Environment (15).

Further analysis of submissions by location revealed which themes were most relevant for different communities and towns. The process also identified the most representative submission for each theme, so you can get a good understanding of the key issues by reading half a dozen submissions that each stand in for many hundreds of others like them. We also tested the level of originality of each submission, looking for plagiarism or campaigning, where the same submission is sent by multiple people or cut and pasted from the internet. In this case, nearly all submissions were highly original and no plagiarism was found.
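Both of these checks can be sketched with TF-IDF cosine similarity: the most representative submission in a theme is the one closest to the theme's centroid vector, and near-duplicate submissions (possible copy-paste campaigning) show pairwise similarity close to 1.0. The texts and the 0.95 threshold below are illustrative assumptions, not the inquiry's actual parameters.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

theme_docs = [
    "Flood water entered our home and destroyed family belongings",
    "Our house flooded; we lost furniture and family photos",
    "The council must upgrade the town levee immediately",
    "The council must upgrade the town levee immediately",  # a duplicate
]

X = TfidfVectorizer(stop_words="english").fit_transform(theme_docs)

# (1) Representative document: highest similarity to the centroid vector.
centroid = np.asarray(X.mean(axis=0))
rep_idx = int(cosine_similarity(X, centroid).argmax())

# (2) Originality check: flag distinct pairs above a similarity threshold.
sim = cosine_similarity(X)
dupes = [(i, j) for i in range(len(theme_docs))
         for j in range(i + 1, len(theme_docs)) if sim[i, j] > 0.95]

print("most representative submission:", rep_idx)
print("near-duplicate pairs:", dupes)
```

Here the two identical "levee" submissions are flagged as a near-duplicate pair, the kind of signal that would indicate a coordinated campaign at scale.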

We also conducted detailed analytics of flood-related social media: over 50,000 posts made during the floods and the subsequent cleanup. These revealed the unfolding of events and, through link-sharing patterns, which news services people most turned to.
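The link-sharing part of such an analysis reduces to extracting URLs from post text and counting the domains they point to. A minimal sketch, with invented posts standing in for the real corpus:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Invented sample posts; a real analysis would use the 50,000-post corpus.
posts = [
    "Evacuation order issued https://www.abc.net.au/news/flood-update",
    "River peaks overnight https://www.abc.net.au/news/river-peak",
    "Cleanup volunteers needed https://www.smh.com.au/nsw/flood-cleanup",
]

url_pattern = re.compile(r"https?://\S+")

# Count the domain of every URL found across all posts.
domains = Counter(
    urlparse(url).netloc
    for post in posts
    for url in url_pattern.findall(post)
)
print(domains.most_common())
```

Ranking the resulting counts shows which news services were shared most often, which is how link-sharing patterns reveal where people turned for information.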

We also ran a comprehensive analysis of search traffic data and found more than one hundred flood-related search phrases. For example, seven days after the flooding many people were searching for information about landslides and mudslides, pointing to the risks of secondary hazards that follow major flooding.
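Spotting that kind of delayed spike amounts to bucketing flood-related queries by day relative to the flood onset and counting phrase matches. A toy sketch, with an invented query log:

```python
from collections import Counter

# (days_after_flood, query) pairs, as might come from a search-traffic feed.
query_log = [
    (0, "flood evacuation routes"),
    (1, "sandbags near me"),
    (7, "landslide risk after flood"),
    (7, "mudslide warning signs"),
    (8, "landslide insurance claim"),
]

# Count landslide/mudslide queries per day to surface the later spike.
landslide_by_day = Counter(
    day for day, q in query_log if "landslide" in q or "mudslide" in q
)
print(sorted(landslide_by_day.items()))
```

In this toy log the landslide-related queries cluster around day seven, mirroring the week-later pattern described above.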

A summary of our full findings on submissions, social media and search analytics has now been published as an appendix to the inquiry's impressive three-volume report.

These approaches combine many of the benefits of large-scale quantitative research (accuracy, representativeness, large sample sizes, support for predictive analysis) with those of qualitative research (responses can be unsolicited, unbiased and unlimited in scope, allowing long and subtle answers to complex issues), giving a more balanced way to understand many people's diverse views on complex or subtle topics.

New data science techniques like those illustrated above offer governments, NGOs and companies new ways to analyze large-scale collections of public submissions, inviting more input from more people through a more open-ended process whose results can still be distilled into actionable insights for improving policy, processes and products.
