Combining active learning and data augmentation to reduce labelled training data for sentiment analysis

Colton Aarts

Summary

Abstract	Creating a sentiment analysis classifier requires a large amount of labelled training data. Labelling this data is an expensive and time-consuming process. Because of this, reducing the amount of labelled data required leads to classifiers that are cheaper to train and more accessible to all disciplines. Many different methods can be used to reduce the amount of labelled data. For this research, we focused on combining active learning and lexical expansion techniques. By combining these two techniques, this research examined an underutilized area of study. Active learning focuses on letting the classifier select the data to learn from, while lexical expansion creates more data for the classifier. While there are a large number of different techniques in both fields, little work has been done to combine them. We felt this was a natural progression for these techniques, as they complement each other well. The active learning technique selects the data to be labelled, and the lexical expansion technique generates high-quality artificial data from this hand-selected information. In addition to combining these techniques, we examined how different neural network structures would interact with our new technique. Our research found that the combination of active learning and lexical expansion improved the performance of our classifiers for very small amounts of data. We found a significant difference between the performance of our two classifiers. While there was an improvement at low levels of training data, at higher levels we found that the combined techniques did not offer any improvement over the active learning technique alone. Overall, we found potential benefits to combining the two techniques, and future research is required to understand how best to leverage these improvements.
Persons	Author (aut): Aarts, Colton; Thesis advisor (ths): Jiang, Fan; Degree committee member (dgc): Chen, Liang; Degree committee member (dgc): Monu, Kafui
Degree Name	Master of Science (MSc)
Department	Computer Science
DOI	https://doi.org/10.24124/2025/30511
Collection(s)	Dissertations and Theses

Origin Information

Date Created/Date Issued	2025-04-16
Publisher	University of Northern British Columbia
Issuance	monographic

Organizations

Degree granting institution (dgg): University of Northern British Columbia. Computer Science

Degree Level

Masters

Resource Description

Extent	1 online resource (ix, 87 pages)
Digital Origin	born digital
Content type	Digital Document
Resource Type	Text
Genre	thesis
Language	English

Access and Rights

Access Conditions	open access
Use and Reproduction	Author
Rights Statement	IN COPYRIGHT
Use License	Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Language	English
Name	Combining active learning and data augmentation to reduce labelled training data for sentiment analysis
MIME type	application/pdf
File size	635533
Media Use	Original File
Authored by	bwillmer
Authored on	2025-07-18
