An improved system for sentence-level novelty detection in textual streams
Authors
Xinyu Fu
Eugene Ch'ng
Uwe Aickelin
Abstract
Novelty detection in news events has long been a difficult problem. A number of models perform well on specific data streams, but certain issues remain far from solved, particularly in large data streams from the WWW, where the unpredictability of new terms requires the vector space model to adapt. We present a novel event detection system based on incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting combined with Locality Sensitive Hashing (LSH). By continually updating the vector space model, our system adapts efficiently and effectively to new terms as they appear in the data stream. In terms of miss probability, our proposed novelty detection framework outperforms a recognised baseline system by approximately 16% when evaluated on a benchmark dataset from Google News.
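The general idea named in the abstract, incremental TF-IDF weighting paired with LSH, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `IncrementalNoveltyDetector`, the IDF formula `log(1 + N/df)`, the random-hyperplane form of LSH, the single hash table, and the `threshold` value are all assumptions chosen for illustration. Each incoming sentence updates the document frequencies before it is weighted, so unseen terms enter the vector space as they arrive.

```python
import math
import random
from collections import Counter, defaultdict


class IncrementalNoveltyDetector:
    """Illustrative sketch only; all names and parameters are hypothetical."""

    def __init__(self, n_bits=8, threshold=0.6, seed=0):
        self.n_bits = n_bits          # signature length (assumed value)
        self.threshold = threshold    # novelty cut-off (assumed value)
        self.seed = seed
        self.n_docs = 0               # sentences seen so far
        self.doc_freq = Counter()     # term -> number of sentences containing it
        self.planes = {}              # term -> per-bit hyperplane components
        self.buckets = defaultdict(list)  # signature -> stored sentence vectors

    def _term_planes(self, term):
        # Hyperplane components are created lazily, so a term never seen
        # before can be hashed without rebuilding anything.
        if term not in self.planes:
            rng = random.Random(f"{self.seed}:{term}")
            self.planes[term] = [rng.gauss(0.0, 1.0) for _ in range(self.n_bits)]
        return self.planes[term]

    def _vectorize(self, tokens):
        # Update the corpus statistics first, then weight the sentence.
        self.n_docs += 1
        counts = Counter(tokens)
        self.doc_freq.update(counts.keys())
        vec = {t: tf * math.log(1.0 + self.n_docs / self.doc_freq[t])
               for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def _signature(self, vec):
        # One bit per hyperplane: the sign of the projection onto it.
        sums = [0.0] * self.n_bits
        for term, weight in vec.items():
            for i, g in enumerate(self._term_planes(term)):
                sums[i] += weight * g
        return tuple(s >= 0.0 for s in sums)

    def process(self, sentence):
        """Return (novelty_score, is_novel) and store the sentence."""
        vec = self._vectorize(sentence.lower().split())
        sig = self._signature(vec)
        # Vectors are unit length, so the dot product is the cosine similarity.
        best = max(
            (sum(w * other.get(t, 0.0) for t, w in vec.items())
             for other in self.buckets[sig]),
            default=0.0,
        )
        self.buckets[sig].append(vec)
        score = 1.0 - best
        return score, score > self.threshold


detector = IncrementalNoveltyDetector()
for text in ["earthquake strikes coastal city",
             "earthquake strikes coastal city",
             "parliament passes budget"]:
    print(text, detector.process(text))
```

In this sketch, stored vectors keep the weights they had when they arrived, and only sentences that share a signature are compared, so a near-duplicate that lands in a different bucket would be missed. Practical LSH schemes usually reduce that risk with several hash tables; a single table is used here to keep the example short.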
Citation
Fu, X., Ch'ng, E., & Aickelin, U. An improved system for sentence-level novelty detection in textual streams. Presented at 3rd International Conference on Smart Sustainable City and Big Data (ICSSC)
Conference Name | 3rd International Conference on Smart Sustainable City and Big Data (ICSSC) |
---|---|
End Date | Jul 28, 2015 |
Acceptance Date | Jan 1, 2015 |
Online Publication Date | Apr 7, 2016 |
Deposit Date | Oct 14, 2015 |
Publicly Available Date | Apr 7, 2016 |
Peer Reviewed | Peer Reviewed |
DOI | https://doi.org/10.1049/cp.2015.0250 |
Keywords | first story detection, novelty detection, Locality Sensitive Hashing, text mining |
Public URL | https://nottingham-repository.worktribe.com/output/786061 |
Publisher URL | http://ieeexplore.ieee.org/document/7446433/ |
Additional Information | © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works |
Contract Date | Oct 14, 2015 |
Files
An-improved-system-for-Sentence-level-novelty-detection-in-textual-streams.pdf
(463 Kb)
PDF