Text Data Augmentations: Permutation, Antonyms and Negation

Haralabopoulos, Giannis; Torres, Mercedes Torres; Anagnostopoulos, Ioannis; McAuley, Derek

Authors

Giannis Haralabopoulos

Mercedes Torres Torres

Ioannis Anagnostopoulos

Derek McAuley



Abstract

Text has traditionally been used to train automated classifiers for a multitude of purposes, such as classification, topic modelling, and sentiment analysis. State-of-the-art LSTM classifiers require a large number of training examples to avoid biases and generalise successfully. Labelled data greatly improves classification results, but not all modern datasets include large numbers of labelled examples. Labelling is a complex task that can be expensive and time-consuming, and can introduce biases. Data augmentation methods create synthetic data based on existing labelled examples, with the goal of improving classification results. These methods have been used successfully in image classification tasks, and recent research has extended them to text classification. We propose a method that uses sentence permutations to augment an initial dataset while retaining key statistical properties of the dataset. We evaluate our method with eight different datasets and a baseline Deep Learning process. This permutation method significantly improves classification accuracy, by an average of 4.1%. We also propose two further text augmentations, antonym and negation, that reverse the classification of each augmented example. We test these two augmentations on three eligible datasets, and the results suggest an improvement in classification accuracy, averaged across all datasets, of 0.35% for antonym and 0.4% for negation when compared to our proposed permutation augmentation.
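As a rough illustration of the ideas summarised in the abstract, the Python sketch below reorders the sentences of a labelled example to create new examples with the same label, and applies a toy negation that flips a binary label. It is not the authors' implementation: the function names, the regex-based sentence splitter, the `max_new` cap, and the single-verb negation rule are all assumptions made for this example.

```python
import random
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def permutation_augment(text, label, max_new=3, seed=0):
    """Return up to max_new reordered copies of text, each keeping label."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return []  # nothing to permute
    rng = random.Random(seed)
    seen = {tuple(sentences)}  # exclude the original ordering
    augmented = []
    for _ in range(20 * max_new):  # bounded number of attempts
        if len(augmented) == max_new:
            break
        candidate = sentences[:]
        rng.shuffle(candidate)
        if tuple(candidate) not in seen:
            seen.add(tuple(candidate))
            augmented.append((" ".join(candidate), label))
    return augmented


def negation_augment(text, label):
    """Toy negation (assumption): negate the first is/was/are/were and
    flip a binary 0/1 label. Returns [] if no such verb is found."""
    pattern = r"\b(is|was|are|were)\b(?! not\b)"
    negated, count = re.subn(pattern, r"\1 not", text, count=1)
    return [(negated, 1 - label)] if count else []


if __name__ == "__main__":
    review = "I loved it. The cast was superb. Would watch again."
    for new_text, new_label in permutation_augment(review, 1):
        print(new_label, new_text)
    print(negation_augment("The film is great.", 1))
```

Because permutation only changes sentence order, word frequencies and text lengths in the augmented set match the original, which is one way to read the abstract's remark about retaining key statistical properties. The negation rule here is deliberately simplistic and assumes binary labels; the paper's own procedure may differ.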

Citation

Haralabopoulos, G., Torres, M. T., Anagnostopoulos, I., & McAuley, D. (2021). Text Data Augmentations: Permutation, Antonyms and Negation. Expert Systems with Applications, 177, Article 114769. https://doi.org/10.1016/j.eswa.2021.114769

Journal Article Type Article
Acceptance Date Feb 19, 2021
Online Publication Date Mar 11, 2021
Publication Date Sep 1, 2021
Deposit Date Mar 15, 2021
Publicly Available Date Mar 12, 2022
Journal Expert Systems with Applications
Print ISSN 0957-4174
Electronic ISSN 0957-4174
Publisher Elsevier
Peer Reviewed Peer Reviewed
Volume 177
Article Number 114769
DOI https://doi.org/10.1016/j.eswa.2021.114769
Keywords General Engineering; Artificial Intelligence; Computer Science Applications
Public URL https://nottingham-repository.worktribe.com/output/5396261
Publisher URL https://www.sciencedirect.com/science/article/abs/pii/S0957417421002104?via%3Dihub
