Word Sense Disambiguation
-
Word sense disambiguation, also called lexical ambiguity resolution, is a
crucial part of many NLP systems.
-
What is a word sense? How do we know how many senses a word has?
-
These questions are easiest to answer when senses have different parts of speech,
or are more or less accidental homonyms (e.g. 'mole' can mean a small burrowing
insectivore, a spy within an organization, a skin blemish, a breakwater, ...).
Note the anomaly (marked '#') when one occurrence of the word has to cover two
senses: #Ken has two moles on his cheek and two in his backyard.
-
It gets harder when senses are systematically derived from one another (regular polysemy) and overlap
('book', 'newspaper', 'magazine', etc. can refer to the physical object or its
information content).
An "ambiguity test"; note:
?#The book is clearly written but weighs four pounds. (not so bad)
-
Are these separate senses of 'spill'?
- The milk spilled on the rug.
- The children spilled the milk on the rug.
What about these uses of 'spray'?
- The mechanic sprayed the bearings with oil.
- The mechanic sprayed oil on the bearings.
-
Lexicographers are far from consistent about the number of senses of many words.
Just compare a few definitions of ambiguous words across dictionaries to see this.
-
We also have to worry about "on the fly" metaphorical and metonymic sense extensions,
which function like distinct senses (for instance, their selectional restrictions may
be totally different):
- This car drinks oil.
- The south side of Takoma Park voted Democratic.
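To make the point about selectional restrictions concrete, here is a minimal
sketch in Python. The feature sets and the restriction on 'drink' are made up
for illustration (they do not come from any real lexicon); the only claim is
that a literal reading of 'drink' wants an animate subject, and 'car' fails it.

```python
# Toy semantic features for a few nouns (hypothetical, for illustration only).
FEATURES = {
    "child": {"animate", "human"},
    "car": {"artifact", "vehicle"},
    "oil": {"liquid"},
    "milk": {"liquid"},
}

# Assumed restriction: literal 'drink' takes an animate subject and a liquid object.
RESTRICTIONS = {
    "drink": {"subject": "animate", "object": "liquid"},
}

def violations(verb, subject, obj):
    """Return the argument slots whose filler lacks the required feature."""
    required = RESTRICTIONS[verb]
    fillers = {"subject": subject, "object": obj}
    return [slot for slot, feature in required.items()
            if feature not in FEATURES[fillers[slot]]]

print(violations("drink", "child", "milk"))  # []
print(violations("drink", "car", "oil"))     # ['subject']
```

A violation like the second one can be read as a hint that the verb is being
used in an extended sense rather than as proof that the sentence is ill-formed.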
-
POS-tagging is one type of word sense disambiguation. But because it typically
relies on syntactic information (the POS tags of the nearby words), we need
other methods to discriminate among senses that share the same part of speech.
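A small Python sketch of this limitation follows. The mini-lexicon, its tag
names, and the helper senses_after_tagging are hypothetical, invented here for
illustration: knowing the POS tag settles 'book' (noun vs. verb) but leaves all
four noun senses of 'mole' in play.

```python
# Hypothetical mini-lexicon: each word maps to (POS tag, sense label) pairs.
LEXICON = {
    "book": [
        ("NOUN", "written work"),
        ("VERB", "reserve"),
    ],
    "mole": [
        ("NOUN", "burrowing insectivore"),
        ("NOUN", "spy within an organization"),
        ("NOUN", "skin blemish"),
        ("NOUN", "breakwater"),
    ],
}

def senses_after_tagging(word, pos):
    """Return the senses of `word` that remain once its POS tag is known."""
    return [sense for tag, sense in LEXICON[word] if tag == pos]

print(senses_after_tagging("book", "VERB"))       # ['reserve']
print(len(senses_after_tagging("mole", "NOUN")))  # 4
```

The second call is the case that needs something beyond POS tags, such as
information about the surrounding words.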