Invariance and identifiability issues for word embeddings

Carrington, Rachel; Bharath, Karthik; Preston, Simon

Authors

Rachel Carrington

Simon Preston (simon.preston@nottingham.ac.uk)
Professor of Statistics and Applied Mathematics



Abstract

Word embeddings are commonly obtained as optimisers of a criterion function f of a text corpus, but assessed on word-task performance using a different evaluation function g of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave f and g invariant. In particular, word embeddings defined by f are not unique; they are defined only up to a class of transformations to which f is invariant, and this class is larger than the class to which g is invariant. One implication of this is that the apparent superiority of one word embedding over another, as measured by word task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions.
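
The mismatch described in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes, purely for illustration, that the criterion f depends on the word and context embeddings U and V only through the product U Vᵀ, and that the evaluation g is built from cosine similarities between rows of U. The names `f`, `cosine_matrix`, `U`, `V`, `C` and `Q` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 6, 3

# Hypothetical word (U) and context (V) embeddings, one row per word.
U = rng.normal(size=(n_words, dim))
V = rng.normal(size=(n_words, dim))


def f(U, V):
    # Illustrative stand-in for a training criterion: it sees the
    # embeddings only through the inner products U V^T (an assumption).
    return U @ V.T


def cosine_matrix(U):
    # Illustrative stand-in for an evaluation function: pairwise cosine
    # similarities between word embeddings.
    unit = U / np.linalg.norm(U, axis=1, keepdims=True)
    return unit @ unit.T


# An invertible but non-orthogonal transformation.
C = np.diag([1.0, 5.0, 0.2])
U_c = U @ C
V_c = V @ np.linalg.inv(C).T  # compensating transform, so U_c V_c^T = U V^T

f_unchanged = np.allclose(f(U, V), f(U_c, V_c))
g_unchanged_general = np.allclose(cosine_matrix(U), cosine_matrix(U_c))

# An orthogonal transformation, obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
g_unchanged_orthogonal = np.allclose(cosine_matrix(U), cosine_matrix(U @ Q))

print("f invariant under invertible C:", f_unchanged)
print("g invariant under invertible C:", g_unchanged_general)
print("g invariant under orthogonal Q:", g_unchanged_orthogonal)
```

Under these assumptions, both (U, V) and (U C, V C⁻ᵀ) give the same value of f, yet the cosine similarities differ unless the transformation is orthogonal, so two equally valid optimisers of f could score differently on a similarity-based task.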

Conference Name: NeurIPS 2019
Start Date: Dec 8, 2019
End Date: Dec 14, 2019
Acceptance Date: Sep 3, 2019
Online Publication Date: Dec 14, 2019
Publication Date: Dec 14, 2019
Deposit Date: Oct 16, 2019
Publicly Available Date: Feb 15, 2020
Book Title: Advances in Neural Information Processing Systems 32 (NIPS 2019)
Public URL: https://nottingham-repository.worktribe.com/output/2848777
Publisher URL: https://papers.nips.cc/paper/9650-invariance-and-identifiability-issues-for-word-embeddings
Related Public URLs: https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019
