Osman A. S. Ibrahim
A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections
Ibrahim, Osman A. S.; Landa-Silva, Dario
Abstract
This paper introduces a new weighting scheme for information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval because it enables good discrimination between documents. In the context of a machine learning application, namely unsupervised document indexing and retrieval, we compared the effectiveness of the proposed weighting scheme with that of Term Frequency-Inverse Document Frequency (TF-IDF), which is commonly used and considered one of the best existing weighting schemes. The paper shows how the document centroid is used to remove less significant weights from documents and how this helps to achieve better retrieval effectiveness. Most existing weighting schemes in information retrieval research assume that the whole document collection is static. The results presented in this paper show that the proposed weighting scheme can produce higher retrieval effectiveness than the TF-IDF weighting scheme in both static and dynamic document collections. The results also show how the retrieval effectiveness achieved with a specific weighting scheme varies between static and dynamic document collections. This type of comparison has not been presented in the literature before.
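The abstract does not give the formula of the proposed weighting scheme or the exact centroid rule, so the following is only a rough sketch of the general idea it describes: weight terms with the TF-IDF baseline, compute the collection centroid, and drop weights that fall below it. The function names (`tf_idf`, `centroid`, `prune_below_centroid`), the toy documents, and the per-term "keep only weights strictly above the centroid" rule are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Plain TF-IDF baseline: one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

def centroid(weights):
    """Mean weight of each term over all documents (absent terms count as 0)."""
    n = len(weights)
    total = Counter()
    for w in weights:
        for t, v in w.items():
            total[t] += v
    return {t: v / n for t, v in total.items()}

def prune_below_centroid(weights):
    """Assumed rule: keep a term weight only if it exceeds the centroid weight."""
    c = centroid(weights)
    return [{t: v for t, v in w.items() if v > c[t]} for w in weights]

# Hypothetical toy collection, already tokenized.
docs = [
    ["retrieval", "retrieval", "retrieval", "index"],
    ["retrieval", "ranking"],
    ["solar", "forecast"],
]
pruned = prune_below_centroid(tf_idf(docs))
```

In this toy collection, "retrieval" survives in the first document, where it is frequent, and is dropped from the second, where its weight falls below the collection average. For a dynamic collection, the document frequencies and the centroid would presumably need to be recomputed or updated as documents arrive; the sketch does not attempt that.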
Citation
Ibrahim, O. A. S., & Landa-Silva, D. (2014). A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections.
Field | Value |
---|---|
Conference Name | 14th UK Workshop on Computational Intelligence (UKCI2014) |
End Date | Sep 10, 2014 |
Publication Date | Sep 1, 2014 |
Deposit Date | Jan 22, 2016 |
Publicly Available Date | Jan 22, 2016 |
Peer Reviewed | Peer Reviewed |
Keywords | information retrieval |
Public URL | https://nottingham-repository.worktribe.com/output/994522 |
Publisher URL | http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6930160&filter%3DAND%28p_IS_Number%3A6930143%29 |
Additional Information | © 2014 IEEE. Published in: 2014 14th UK Workshop on Computational Intelligence (UKCI): Student Central Lecture Theatre SC0.51, University of Bradford, Bradford, West Yorkshire, UK, 8-10 September 2014 / editors Daniel Neagu, Mariam Kiran, Paul Trundle. [Piscataway, N.J.] :IEEE,c2014. ISBN 9781479955381. |
Files
dls_ukci2014.pdf (PDF, 342 KB)