Creating a sentiment analysis classifier requires a large amount of labelled training data. Labelling this data is an expensive and time-consuming process. Because
of this, reducing the amount of labelled data required leads to classifiers that are
cheaper to train and more accessible to all disciplines. Many different methods
can be used to reduce the amount of labelled data required. For this research, we
focused on combining active learning and lexical expansion techniques.
By combining these two techniques, this research examined an underutilized
area of study. Active learning focuses on letting the classifier select the data to
learn from, while lexical expansion creates more data for the classifier. While there
are a large number of different techniques in both fields, little work has been
done to combine them. We felt this was a natural progression for these techniques
as they complement each other well. The active learning technique selects the
data to be labelled, and the lexical expansion technique generates high-quality
artificial data from this hand-selected information. In addition to combining these
techniques, we examined how different neural network structures would interact
with our new technique.
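
The interaction described above can be illustrated with a minimal sketch of one training round. It is not the implementation used in this research. It assumes uncertainty sampling as the active learning strategy and synonym substitution as the lexical expansion strategy, and the names SYNONYMS, select_most_uncertain, expand, and combined_round are hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical synonym table (an assumption for this sketch); a real system
# might draw on a thesaurus or word embeddings instead.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
}


def select_most_uncertain(
    pool: List[str], predict_positive: Callable[[str], float], k: int
) -> List[str]:
    """Active learning step: pick the k texts whose predicted probability
    of the positive class is closest to 0.5 (uncertainty sampling)."""
    return sorted(pool, key=lambda text: abs(predict_positive(text) - 0.5))[:k]


def expand(text: str, label: int) -> List[Tuple[str, int]]:
    """Lexical expansion step: create one new example per synonym
    substitution, keeping the label of the original text."""
    words = text.split()
    examples = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            new_text = " ".join(words[:i] + [synonym] + words[i + 1:])
            examples.append((new_text, label))
    return examples


def combined_round(
    pool: List[str],
    predict_positive: Callable[[str], float],
    oracle: Callable[[str], int],
    k: int,
) -> List[Tuple[str, int]]:
    """One round: select texts, have the oracle (annotator) label them,
    then add artificial examples generated from the labelled texts."""
    selected = select_most_uncertain(pool, predict_positive, k)
    labelled = [(text, oracle(text)) for text in selected]
    artificial = [ex for text, label in labelled for ex in expand(text, label)]
    return labelled + artificial
```

In this sketch only the selected texts are sent to the annotator, so the artificial examples add training data without adding labelling cost.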
Our research found that the combination of active learning and lexical expansion improved the performance of our classifiers for very small amounts of data.
We found a significant difference between the performance of our two classifiers.
While there was an improvement at low levels of training data, at higher levels,
we found that the combined techniques did not offer any improvement over the
active learning technique alone. Overall, we found potential benefits to combining
the two techniques, and future research is required to understand how best to
leverage these improvements.