Monitoring Social Media using Machine Learning

Over the summer, Professor Keith Vander Linden and Roy Adams from the Computer Science Department conducted research on the Social License to Operate (SLO). The SLO refers to a community’s level of approval of a company’s operations. That community might include the company’s stakeholders, the residents of the region where the company operates, and others. A company needs permission from the government to operate, but it also needs “social permission” to operate from the community. Vander Linden and Adams analyzed data from the social media platform Twitter to determine the SLO for different mining companies in Australia. They used a machine learning model to predict whether a particular tweet was in favor of, opposed to, or neutral toward a given mining company.

Most of the work involved managing datasets. In machine learning, two separate datasets are required: one for teaching a model to classify tweets (a training set) and another for measuring how well that model performs (a testing set). The team’s main job was to compile these two datasets.
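To make the two roles concrete, here is a minimal, generic sketch of a train/test workflow in Python. The example tweets, labels, and the TF-IDF-plus-logistic-regression classifier are placeholders for illustration, not the team’s actual data or model (their two sets were built in different ways, as described below).

```python
# Generic train/test illustration; all data and the model choice are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["Great news from the mine today!", "This project is destroying our water."]
train_labels = ["for", "against"]          # training set: labeled examples the model learns from
test_texts = ["Another spill reported near the site."]
test_labels = ["against"]                  # testing set: held-out examples used only for evaluation

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)       # learn from the training set
print(model.score(test_texts, test_labels))  # measure performance on the testing set
```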

Tweets do not arrive from Twitter labeled as for, against, or neutral. The team’s training set, however, required each tweet to be pre-labeled before it was fed into the model so that the model could learn how to classify new tweets.

According to the team, this was difficult precisely because training a computer to label tweets was the whole point of the project. The team could have hand-labeled the tweets, but that would have been expensive and time-consuming. Instead, Vander Linden and Adams tried to find tweets that they could automatically label as for or against a company, a process called auto-coding. They devised a set of rules that they expected would return accurate labels and applied those rules to their dataset to build the training set. Their working assumption was that even if some tweets were labeled inaccurately, the correctly labeled tweets would outnumber the incorrectly labeled ones.
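A minimal sketch of what rule-based auto-coding might look like follows; the marker phrases and hashtags are invented for illustration and are not the actual rules the team devised.

```python
# Hypothetical auto-coding rules: keyword/hashtag markers mapped to stance labels.
from typing import Optional

SUPPORT_MARKERS = ["#miningjobs", "proud to partner with"]
OPPOSE_MARKERS = ["#stopthemine", "water is life", "destroying our land"]

def auto_code(tweet_text: str) -> Optional[str]:
    """Return 'for', 'against', or None when no rule fires."""
    text = tweet_text.lower()
    if any(marker in text for marker in SUPPORT_MARKERS):
        return "for"
    if any(marker in text for marker in OPPOSE_MARKERS):
        return "against"
    return None  # unlabeled tweets are simply left out of the training set

tweets = [
    "#StopTheMine before it ruins the river",
    "Proud to partner with local miners on new jobs",
    "Saw a kangaroo on the way to work today",
]
training_set = [(t, auto_code(t)) for t in tweets if auto_code(t) is not None]
print(training_set)  # only the rule-matched tweets, with their automatic labels
```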

Because they treated the testing set as a “gold standard” against which to measure their model’s success, the testing set was more sensitive to labeling errors. So instead of using auto-coding to build the test set, they used hand-labeled tweets. They measured the accuracy of their predictions and how consistently different coders agreed in their labeling. To further improve the accuracy of their test set, they removed every tweet with an inter-coder agreement of zero (in other words, where no two coders chose the same label).
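Here is a small sketch of that agreement filter, assuming three hypothetical coders per tweet and a simple “keep the tweet only if at least two coders overlap” rule; the labels and threshold are illustrative assumptions rather than the team’s exact procedure.

```python
# Drop hand-labeled tweets where no two coders chose the same label.
from collections import Counter

def has_overlap(coder_labels):
    """True if at least two coders agreed on a label for this tweet."""
    return Counter(coder_labels).most_common(1)[0][1] > 1

hand_labeled = {
    "tweet_1": ["against", "against", "neutral"],  # two coders agree -> keep
    "tweet_2": ["for", "against", "neutral"],      # zero agreement -> remove
}
test_set = {tweet: labels for tweet, labels in hand_labeled.items() if has_overlap(labels)}
print(list(test_set))  # ['tweet_1']
```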

The scoring metric that they used on their models is called an F-score, which combines a model’s precision and recall into a single value between 0 and 1; better results land closer to 1.
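For example, using made-up counts, the F-score (here the common F1 variant, the harmonic mean of precision and recall) works out like this:

```python
# Worked F-score example with invented counts for illustration.
true_positives = 60   # e.g., tweets correctly labeled "against"
false_positives = 30  # tweets labeled "against" that should not have been
false_negatives = 40  # "against" tweets the model missed

precision = true_positives / (true_positives + false_positives)  # 60/90 ≈ 0.67
recall = true_positives / (true_positives + false_negatives)     # 60/100 = 0.60
f_score = 2 * precision * recall / (precision + recall)          # ≈ 0.63
print(round(f_score, 2))
```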

“Currently, we are hovering around the 0.6 mark, which we would like to get closer to 0.7. We have discovered that the text a user uses to describe him or herself can be beneficial to determine the stance of a user towards a target, making a difference of 0.1-0.3 in score,” said Adams. “For example, one tweet in our data set responds to another tweet by energy giant Santos but does not reference the company’s policies. It is difficult to tell what this person thinks about the company from the text alone. However, this user’s profile has ‘Can’t eat coal or drink gas. Water is life!’ in it. This text reveals that this person is likely against coal and gas production, suggesting this tweet is against Santos.”
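One simple way such profile text might be folded into a model is to append the author’s profile description to the tweet text before classification. The tweet text, field names, and concatenation strategy below are assumptions for illustration, not the team’s actual feature pipeline; only the profile description comes from the example in the quote.

```python
# Combine tweet text with the author's profile description as model input.
def build_model_input(tweet):
    """Concatenate the tweet text with the author's profile description."""
    return tweet["text"] + " [PROFILE] " + tweet.get("user_description", "")

example = {
    "text": "@SantosLtd sure about that?",  # hypothetical reply text
    "user_description": "Can't eat coal or drink gas. Water is life!",
}
print(build_model_input(example))  # the stance signal comes from the profile text
```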