Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms
Al-Salemi, Bassam; Ayob, Masri; Kendall, Graham; Noah, Shahrul Azman Mohd
Authors
Bassam Al-Salemi
Masri Ayob
Graham Kendall
Shahrul Azman Mohd Noah
Abstract
© 2018 Elsevier Ltd
Multi-label text categorization is the problem of assigning each document to a subset of categories by means of multi-label learning algorithms. Unlike English and most other languages, Arabic has no benchmark datasets for this task, which prevents the evaluation of multi-label learning algorithms for Arabic text categorization. As a result, only a few recent studies have addressed multi-label Arabic text categorization, and they used non-benchmark, inaccessible datasets. This work therefore aims to promote multi-label Arabic text categorization by (a) introducing “RTAnews”, a new benchmark dataset of multi-label Arabic news articles for text categorization and other supervised learning tasks, which is publicly available in several formats compatible with existing multi-label learning tools such as MEKA and Mulan; and (b) conducting an extensive comparison of most of the well-known multi-label learning algorithms on RTAnews, in order to establish baseline results and show how effective these algorithms are for Arabic text categorization. The evaluation involves four transformation-based multi-label algorithms (Binary Relevance, Classifier Chains, Calibrated Ranking by Pairwise Comparison and Label Powerset) with three base learners (Support Vector Machine, k-Nearest-Neighbors and Random Forest), and four adaptation-based algorithms (Multi-label kNN, Instance-Based Learning by Logistic Regression Multi-label, Binary Relevance kNN and RFBoost). The reported baseline results show that both RFBoost and Label Powerset with Support Vector Machine as the base learner outperformed the other compared algorithms. The results also show that the adaptation-based algorithms are faster than the transformation-based algorithms.
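Two of the transformation-based methods named above, Binary Relevance and Label Powerset, differ in how they turn a multi-label dataset into ordinary single-label problems. The following is a minimal, hypothetical sketch in plain Python, not the authors' code or the MEKA/Mulan implementations; the toy documents, the category names and the function names `binary_relevance` and `label_powerset` are invented for illustration. It stops at the transformation step; a base learner such as an SVM would then be trained on each resulting dataset.

```python
"""Minimal sketch of two multi-label problem transformations (hypothetical toy data)."""
from typing import Dict, FrozenSet, List, Set, Tuple


def binary_relevance(
    docs: List[str], labels: List[Set[str]]
) -> Dict[str, List[Tuple[str, int]]]:
    """Binary Relevance: build one independent binary dataset per label.

    Target is 1 if the document carries that label, else 0. A base learner
    would then be trained separately on each of these datasets.
    """
    all_labels = sorted(set().union(*labels))
    return {
        label: [(doc, int(label in doc_labels)) for doc, doc_labels in zip(docs, labels)]
        for label in all_labels
    }


def label_powerset(
    docs: List[str], labels: List[Set[str]]
) -> Tuple[List[Tuple[str, int]], Dict[FrozenSet[str], int]]:
    """Label Powerset: map each distinct label subset to one multi-class target.

    Returns the single multi-class dataset plus the subset-to-class mapping,
    which is needed to decode a predicted class id back into a label set.
    """
    class_ids: Dict[FrozenSet[str], int] = {}
    dataset: List[Tuple[str, int]] = []
    for doc, doc_labels in zip(docs, labels):
        key = frozenset(doc_labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # assign the next unused class id
        dataset.append((doc, class_ids[key]))
    return dataset, class_ids


if __name__ == "__main__":
    # Hypothetical toy corpus; the category names are illustrative only.
    docs = ["doc1", "doc2", "doc3"]
    labels = [{"sport"}, {"politics", "economy"}, {"sport", "economy"}]

    for label, data in binary_relevance(docs, labels).items():
        print("BR", label, data)

    lp_data, lp_classes = label_powerset(docs, labels)
    print("LP", lp_data)
    # Decode class ids back into (sorted) label sets.
    print({cid: sorted(subset) for subset, cid in lp_classes.items()})
```

Binary Relevance treats every label independently and so ignores correlations between labels, whereas Label Powerset preserves them but can only predict label subsets that occurred in the training data.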
Citation
Al-Salemi, B., Ayob, M., Kendall, G., & Noah, S. A. M. (2019). Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms. Information Processing and Management, 56(1), 212-227. https://doi.org/10.1016/j.ipm.2018.09.008
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Sep 29, 2018 |
Online Publication Date | Oct 22, 2018 |
Publication Date | Jan 2019 |
Deposit Date | May 11, 2020 |
Journal | Information Processing & Management |
Print ISSN | 0306-4573 |
Publisher | Elsevier |
Peer Reviewed | Yes |
Volume | 56 |
Issue | 1 |
Pages | 212-227 |
DOI | https://doi.org/10.1016/j.ipm.2018.09.008 |
Public URL | https://nottingham-repository.worktribe.com/output/1854608 |
Publisher URL | https://www.sciencedirect.com/science/article/pii/S0306457318300736?via%3Dihub |