Package com.articulate.sigma.wordNet
Class WSD
java.lang.Object
com.articulate.sigma.wordNet.WSD
Field Summary
static boolean debug
static int gap
static int threshold
Constructor Summary
WSD()
Method Summary
collectSUMOFromFile(String filename): Extract SUMO terms from a file, assuming one sentence per line.
collectSUMOFromString(String lineStr): Extract SUMO terms from a string, assuming one sentence per line.
collectSUMOFromWords(String sentence): Collect all the SUMO terms that are found in the sentence.
collectWordSenses(String text): Collect all the synsets that represent the best guess at meanings for all the words in a text, given a larger linguistic context.
static String findWordSenseInContext(String word, List<String> words): Return the best guess at the synset for the given word in the context of the sentence.
static String findWordSenseInContextWithDomain(String word, List<String> words, String sumo): Return the best guess at the synset for the given word in the context of the sentence, filtered by the given SUMO term.
static String findWordSenseInContextWithPos(String word, List<String> words, int pos, boolean lemma): Return the best guess at the synset for the given word in the context of the sentence with the given POS.
static Set<com.articulate.sigma.utils.AVPair> findWordSensePOS(String word, List<String> words, int POS): Return a set of scored guesses at the synset for the given word in the context of the sentence.
static String getBestDefaultSense(String word)
static String getBestDefaultSense(String word, int pos): Get the POS-prefixed synset that represents the best guess at meaning for a word with a given part of speech.
static String getBestDefaultSenseWithDomain(String word, String sumo): Get the POS-prefixed synset that represents the best guess at meaning for a word.
static String getBestDefaultSUMO(String word): Get the SUMO term that represents the best guess at meaning for a word.
static String getBestDefaultSUMOsense(String word, int pos): Get the SUMO term that represents the best guess at meaning for a word.
static void interactive()
static void main(String[]): A main method, used only for testing.
static boolean polysemous(String word)
static boolean polysemous(String word, int pos)
static void printSUMOFromFileByLine(String filename): Extract SUMO terms from a file, assuming one sentence per line, and print SUMO term keys and integer counts of their appearances.
readFileIntoArray(String filename)
static void testSentenceWSD(): A method used only for testing.
static void testSentenceWSD2(): A method used only for testing.
static void testWordWSD(): A method used only for testing.
Field Details
threshold
public static int threshold
gap
public static int gap
debug
public static boolean debug
Constructor Details
WSD
public WSD()
Method Details
collectSUMOFromWords
Collect all the SUMO terms that are found in the sentence.
polysemous
static boolean polysemous(String word)
polysemous
static boolean polysemous(String word, int pos)
collectWordSenses
Collect all the synsets that represent the best guess at meanings for all the words in a text, given a larger linguistic context.
Returns:
9-digit synset IDs
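As a rough illustration of the loop described above, the sketch below disambiguates every word of a text against the whole text as context. It is not the SigmaKEE implementation: the class WordSenseCollector, the regular-expression tokenization, and the per-word disambiguator callback (standing in for a call such as findWordSenseInContext) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical sketch of a collectWordSenses-style loop; not the SigmaKEE code. */
final class WordSenseCollector {

    /**
     * @param text          input text
     * @param disambiguator stand-in for a per-word call such as
     *                      findWordSenseInContext(word, words); may return null or ""
     * @return the synset IDs found, in word order
     */
    static List<String> collectWordSenses(String text,
            BiFunction<String, List<String>, String> disambiguator) {
        // Assumed tokenization: split on any run of non-letter characters.
        List<String> words = new ArrayList<>();
        for (String w : text.split("[^A-Za-z]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        // Every word is disambiguated against the full word list as its context.
        List<String> senses = new ArrayList<>();
        for (String word : words) {
            String synset = disambiguator.apply(word, words);
            if (synset != null && !synset.isEmpty()) {
                senses.add(synset);
            }
        }
        return senses;
    }
}
```

Words for which no sense is found are simply skipped, so the result can be shorter than the word list.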
findWordSenseInContextWithDomain
Return the best guess at the synset for the given word in the context of the sentence.
Returns:
the 9-digit synset, but only if there is a reasonable amount of data; otherwise, the most frequent sense. In all cases, the result is filtered by the given SUMO term: if the top-scoring synset does not fit the given SUMO type, the next best synset is chosen according to context co-occurrence frequency or word frequency.
TODO: create an option to prefer the SUMO term but fall back to the best other option if no such synset is found.
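A minimal sketch of the filtering behaviour described above, assuming candidates are ranked by context co-occurrence score with word frequency as the tie-breaker, and that the SUMO check can be expressed as a predicate on a synset ID. DomainFilter, rank and firstFitting are hypothetical names, and the predicate stands in for whatever subclass/instance test the real method performs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of SUMO-filtered sense selection; not the SigmaKEE code. */
final class DomainFilter {

    /** Rank candidate synsets: context score first, then word frequency, then ID. */
    static List<String> rank(Collection<String> candidates,
                             Map<String, Integer> contextScore,
                             Map<String, Integer> wordFreq) {
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator
                .comparingInt((String s) -> contextScore.getOrDefault(s, 0)).reversed()
                .thenComparing(Comparator.comparingInt((String s) -> wordFreq.getOrDefault(s, 0)).reversed())
                .thenComparing(Comparator.naturalOrder()));
        return ranked;
    }

    /** Return the best-ranked synset that fits the SUMO type, or "" if none does. */
    static String firstFitting(List<String> ranked, Predicate<String> fitsSumoType) {
        for (String synset : ranked) {
            if (fitsSumoType.test(synset)) {
                return synset;
            }
        }
        return "";
    }
}
```

Because the predicate is applied to the ranked list, a synset that scores highest in context but fails the SUMO check is skipped in favour of the next best candidate.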
findWordSenseInContext
Return the best guess at the synset for the given word in the context of the sentence.
Returns:
the 9-digit synset, but only if there is a reasonable amount of data; otherwise, the most frequent sense.
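The decision rule described above (trust the context only when there is enough evidence, otherwise fall back to the most frequent sense) might be sketched as follows. The class exposes public threshold and gap fields, but their exact semantics are not documented here; this sketch assumes, hypothetically, a minimum total co-occurrence score (minEvidence) and a minimum margin between the two best candidates (minGap). ContextRule and choose are illustrative names.

```java
import java.util.Map;

/** Hypothetical context-or-fallback rule; not the SigmaKEE implementation. */
final class ContextRule {

    /**
     * @param contextScores     candidate synset -> co-occurrence score with the context
     * @param mostFrequentSense fallback answer, e.g. the most frequent sense
     * @param minEvidence       assumed minimum total score before context is trusted
     * @param minGap            assumed minimum margin between best and runner-up
     */
    static String choose(Map<String, Integer> contextScores, String mostFrequentSense,
                         int minEvidence, int minGap) {
        int total = 0;
        int best = Integer.MIN_VALUE;
        int second = Integer.MIN_VALUE;
        String bestSynset = null;
        for (Map.Entry<String, Integer> e : contextScores.entrySet()) {
            int s = e.getValue();
            total += s;
            // Ties on score are broken by synset ID so the result is deterministic.
            if (bestSynset == null || s > best
                    || (s == best && e.getKey().compareTo(bestSynset) < 0)) {
                second = best;
                best = s;
                bestSynset = e.getKey();
            } else if (s > second) {
                second = s;
            }
        }
        if (bestSynset == null || total < minEvidence) {
            return mostFrequentSense;   // not enough data: use the default sense
        }
        if (second != Integer.MIN_VALUE && best - second < minGap) {
            return mostFrequentSense;   // no clear winner in context
        }
        return bestSynset;
    }
}
```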
findWordSenseInContextWithPos
public static String findWordSenseInContextWithPos(String word, List<String> words, int pos, boolean lemma)
Return the best guess at the synset for the given word in the context of the sentence with the given POS.
Parameters:
word - word to disambiguate
words - words in context
pos - part of speech of word
Returns:
the 9-digit synset, but only if there is a reasonable amount of data.
findWordSensePOS
public static Set<com.articulate.sigma.utils.AVPair> findWordSensePOS(String word, List<String> words, int POS)
Return a set of scored guesses at the synset for the given word in the context of the sentence. The result is a TreeSet of AVPairs, each with a key that is a score reflecting the confidence that the synset is the right one, and a value that is a 9-digit WordNet synset.
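The ordering described above can be illustrated with a TreeSet and an explicit comparator. The sketch uses a hypothetical ScoredSynset class in place of com.articulate.sigma.utils.AVPair, whose API is not shown on this page. A TreeSet treats elements that compare as equal as duplicates, so the comparator breaks score ties on the synset ID to keep every candidate.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical stand-in for an attribute-value pair: score as key, synset as value. */
final class ScoredSynset {
    final int score;
    final String synset;

    ScoredSynset(int score, String synset) {
        this.score = score;
        this.synset = synset;
    }

    @Override
    public String toString() {
        return score + "=" + synset;
    }
}

final class ScoredSenses {
    // Highest score first; ties broken on synset ID so equal scores are not collapsed.
    static final Comparator<ScoredSynset> BEST_FIRST =
            Comparator.comparingInt((ScoredSynset p) -> p.score).reversed()
                      .thenComparing((ScoredSynset p) -> p.synset);

    /** Build a best-first set from a hypothetical map of synset -> score. */
    static TreeSet<ScoredSynset> rank(Map<String, Integer> scores) {
        TreeSet<ScoredSynset> ranked = new TreeSet<>(BEST_FIRST);
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            ranked.add(new ScoredSynset(e.getValue(), e.getKey()));
        }
        return ranked;
    }
}
```

With this ordering, first() returns the best guess and iteration visits the remaining guesses in decreasing score.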
getBestDefaultSUMOsense
Get the SUMO term that represents the best guess at meaning for a word. This method attempts to convert the word to its root form.
getBestDefaultSUMO
Get the SUMO term that represents the best guess at meaning for a word.
getBestDefaultSenseWithDomain
Get the POS-prefixed synset that represents the best guess at meaning for a word, requiring that the synset have a mapping to SUMO that is a subclass or instance of the sumo parameter. The sumo parameter is ignored if it is an empty string. If there is no wordFrequency entry for the given word, any sense is returned.
Returns:
a 9-digit synset number
getBestDefaultSense
static String getBestDefaultSense(String word)
getBestDefaultSense
static String getBestDefaultSense(String word, int pos)
Get the POS-prefixed synset that represents the best guess at meaning for a word with a given part of speech. It picks the most frequent sense for the word in the Brown Corpus.
Returns:
a 9-digit synset number
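A 9-digit POS-prefixed synset can be read as a one-digit part-of-speech code followed by an 8-digit WordNet offset. The sketch below assumes the digit convention 1 = noun, 2 = verb, 3 = adjective, 4 = adverb, and shows how a most-frequent-sense lookup for a given POS might work over a hypothetical map of synset frequencies (for example, counts derived from the Brown Corpus). PosSynset and its methods are illustrative, not part of the documented API.

```java
import java.util.Map;

/** Hypothetical helpers for POS-prefixed 9-digit synset IDs. */
final class PosSynset {

    // Assumed POS digits: 1 = noun, 2 = verb, 3 = adjective, 4 = adverb.
    static String prefix(int pos, String offset) {
        if (pos < 1 || pos > 4) {
            throw new IllegalArgumentException("unknown POS code: " + pos);
        }
        if (!offset.matches("\\d{8}")) {
            throw new IllegalArgumentException("expected an 8-digit offset: " + offset);
        }
        return pos + offset;   // int + String concatenates: 1 and "12345678" give "112345678"
    }

    static int posOf(String synset) {
        if (!synset.matches("[1-4]\\d{8}")) {
            throw new IllegalArgumentException("not a POS-prefixed synset: " + synset);
        }
        return synset.charAt(0) - '0';
    }

    /** Most frequent synset with the given POS, or "" if there is none. */
    static String mostFrequentWithPos(Map<String, Integer> freqBySynset, int pos) {
        String best = "";
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : freqBySynset.entrySet()) {
            if (posOf(e.getKey()) != pos) {
                continue;   // wrong part of speech
            }
            int f = e.getValue();
            // Ties broken on synset ID so the result does not depend on map order.
            if (f > bestFreq || (f == bestFreq && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestFreq = f;
            }
        }
        return best;
    }
}
```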
readFileIntoArray
Returns:
the lines of a file, each as an array. The first element of each interior array is the whole line, and subsequent elements are the individual words.
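The row layout described above (whole line first, then its words) can be sketched with java.nio.file.Files. Splitting words on whitespace is an assumption, since the tokenization used by readFileIntoArray is not documented here, and LineArrays.linesWithWords is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "whole line, then words" layout; not the SigmaKEE code. */
final class LineArrays {

    static List<List<String>> linesWithWords(Path file) throws IOException {
        List<List<String>> result = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            List<String> row = new ArrayList<>();
            row.add(line);                     // element 0: the whole line
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {          // elements 1..n: whitespace-separated words
                row.addAll(Arrays.asList(trimmed.split("\\s+")));
            }
            result.add(row);
        }
        return result;
    }
}
```

For a line such as "The dog barked", the resulting row is ["The dog barked", "The", "dog", "barked"]; an empty line yields a row containing only the empty string.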
readFile
Returns:
the lines of a file as an array of String.
collectSUMOFromFile
Extract SUMO terms from a file, assuming one sentence per line.
Returns:
a Map of SUMO term keys and integer counts of their appearances
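The counting step can be sketched with Map.merge. This is a hypothetical illustration: sumoOf stands in for whatever word-to-SUMO lookup the real method performs (which may involve word sense disambiguation), the input is assumed to be already split into sentences (one per line of the file), and words are split on whitespace.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical sketch of counting SUMO terms; not the SigmaKEE code. */
final class SumoCounter {

    /**
     * @param sentences one sentence per element, e.g. the lines of a file
     * @param sumoOf    hypothetical lookup from a word to a SUMO term, or null if none
     */
    static Map<String, Integer> countSumoTerms(List<String> sentences,
                                               Function<String, String> sumoOf) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted keys for stable output
        for (String sentence : sentences) {
            for (String word : sentence.split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                String term = sumoOf.apply(word);
                if (term != null) {
                    counts.merge(term, 1, Integer::sum);   // add 1, or start at 1
                }
            }
        }
        return counts;
    }
}
```

The TreeMap keeps the terms in alphabetical order, which makes printed counts easy to compare; that choice is part of the sketch, not something the documented return type implies.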
printSUMOFromFileByLine
Extract SUMO terms from a file, assuming one sentence per line, and print the SUMO term keys with integer counts of their appearances.
collectSUMOFromString
Extract SUMO terms from a string, assuming one sentence per line.
Returns:
a Map of SUMO term keys and integer counts of their appearances
testWordWSD
public static void testWordWSD()
A method used only for testing. It should not be called during normal operation.
testSentenceWSD
public static void testSentenceWSD()
A method used only for testing. It should not be called during normal operation.
testSentenceWSD2
public static void testSentenceWSD2()
A method used only for testing. It should not be called during normal operation.
interactive
public static void interactive()
main
A main method, used only for testing. It should not be called during normal operation.