Class WSD

java.lang.Object
com.articulate.sigma.wordNet.WSD

public class WSD extends Object
  • Field Details

    • threshold

      public static int threshold
    • gap

      public static int gap
    • debug

      public static boolean debug
  • Constructor Details

    • WSD

      public WSD()
  • Method Details

    • collectSUMOFromWords

      public static List<String> collectSUMOFromWords(String sentence)
      Collect all the SUMO terms that are found in the sentence.
    • polysemous

      public static boolean polysemous(String word)
    • polysemous

      public static boolean polysemous(String word, int pos)
    • collectWordSenses

      public static List<String> collectWordSenses(String text)
      Collect all the synsets that represent the best guess at meanings for all the words in a text given a larger linguistic context.
      Returns:
      9-digit synset IDs
    • findWordSenseInContextWithDomain

      public static String findWordSenseInContextWithDomain(String word, List<String> words, String sumo)
      Return the best guess at the synset for the given word in the context of the sentence, filtered in all cases by the given SUMO term. If the top-scoring synset does not fit the given SUMO type, pick the next-best synset according to context co-occurrence frequency or word frequency. TODO: create an option to prefer the SUMO term but fall back to the best other option if no such synset is found.
      Returns:
      the 9-digit synset, but only if there is a reasonable amount of data; otherwise the most frequent sense.
    • findWordSenseInContext

      public static String findWordSenseInContext(String word, List<String> words)
      Return the best guess at the synset for the given word in the context of the sentence.
      Returns:
      the 9-digit synset, but only if there is a reasonable amount of data; otherwise the most frequent sense.
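The fallback described above (use the context-based guess only when there is enough evidence, otherwise the most frequent sense) can be sketched as follows. This is an illustration under stated assumptions, not the WSD implementation: the class `SenseFallback`, the method `chooseSense`, and the `minScore` cutoff are hypothetical, and this page does not say how WSD decides what counts as "a reasonable amount of data".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of com.articulate.sigma.wordNet.WSD.
final class SenseFallback {

    /**
     * Returns the highest-scoring synset if its score reaches minScore,
     * otherwise the supplied most-frequent sense.
     * minScore is an assumed stand-in for "a reasonable amount of data".
     */
    static String chooseSense(Map<String, Integer> contextScores,
                              int minScore,
                              String mostFrequentSense) {
        String best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : contextScores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        // Fall back when there are no candidates or the evidence is too weak.
        return (best != null && bestScore >= minScore) ? best : mostFrequentSense;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("100000001", 5); // illustrative synset IDs and scores
        scores.put("100000002", 2);
        System.out.println(chooseSense(scores, 3, "100000003"));
        System.out.println(chooseSense(scores, 10, "100000003"));
    }
}
```

Keeping the cutoff as a parameter makes the trade-off explicit: a higher value trusts the frequency-based default more often.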
    • findWordSenseInContextWithPos

      public static String findWordSenseInContextWithPos(String word, List<String> words, int pos, boolean lemma)
      Return the best guess at the synset for the given word in the context of the sentence with the given POS.
      Parameters:
      word - word to disambiguate
      words - words in context
      pos - part of speech of word
      Returns:
      the 9-digit synset but only if there's a reasonable amount of data.
    • findWordSensePOS

      public static Set<com.articulate.sigma.utils.AVPair> findWordSensePOS(String word, List<String> words, int POS)
      Return scored guesses at the synset for the given word in the context of the sentence. The result is a TreeSet of AVPairs, each with a key holding a score that reflects how likely the given synset is the right one, and a value holding a 9-digit WordNet synset.
    • getBestDefaultSUMOsense

      public static String getBestDefaultSUMOsense(String word, int pos)
      Get the SUMO term that represents the best guess at meaning for a word. This method attempts to convert to root form.
    • getBestDefaultSUMO

      public static String getBestDefaultSUMO(String word)
      Get the SUMO term that represents the best guess at meaning for a word.
    • getBestDefaultSenseWithDomain

      public static String getBestDefaultSenseWithDomain(String word, String sumo)
      Get the POS-prefixed synset that represents the best guess at meaning for a word. If there is no wordFrequency entry for the given word, any sense is returned. The synset is required to have a mapping to SUMO that is a subclass or instance of sumo; sumo is ignored if it is an empty string.
      Returns:
      a 9-digit synset number
    • getBestDefaultSense

      public static String getBestDefaultSense(String word)
    • getBestDefaultSense

      public static String getBestDefaultSense(String word, int pos)
      Get the POS-prefixed synset that represents the best guess at meaning for a word with a given part of speech. It picks the most frequent sense for the word in the Brown Corpus.
      Returns:
      a 9-digit synset number
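The POS-prefixed, 9-digit synset IDs returned by these methods can be read as a one-digit part-of-speech code followed by an 8-digit WordNet offset. The sketch below splits such an ID into its two parts. It assumes the numbering 1 = noun, 2 = verb, 3 = adjective, 4 = adverb, 5 = adjective satellite, which should be checked against the WordNet utilities in use; the class `SynsetId` and its methods are hypothetical and not part of WSD.

```java
// Hypothetical helper, not part of com.articulate.sigma.wordNet.WSD.
final class SynsetId {

    /** Returns the leading part-of-speech digit of a 9-digit synset ID. */
    static int posOf(String synset) {
        requireNineDigits(synset);
        return synset.charAt(0) - '0';
    }

    /** Returns the trailing 8-digit offset of a 9-digit synset ID. */
    static String offsetOf(String synset) {
        requireNineDigits(synset);
        return synset.substring(1);
    }

    /** Maps a part-of-speech digit to a name (assumed numbering, see lead-in). */
    static String posName(int pos) {
        switch (pos) {
            case 1: return "noun";
            case 2: return "verb";
            case 3: return "adjective";
            case 4: return "adverb";
            case 5: return "adjective satellite";
            default: throw new IllegalArgumentException("Unknown POS code: " + pos);
        }
    }

    private static void requireNineDigits(String synset) {
        if (synset == null || !synset.matches("\\d{9}")) {
            throw new IllegalArgumentException("Expected a 9-digit synset ID: " + synset);
        }
    }

    public static void main(String[] args) {
        String id = "102084071"; // illustrative value only
        System.out.println(posName(posOf(id)) + " " + offsetOf(id));
    }
}
```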
    • readFileIntoArray

      public static List<List<String>> readFileIntoArray(String filename)
      Returns:
      the lines of the file, one list per line. The first element of each interior list is the whole line, and subsequent elements are the individual words.
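The returned structure (whole line first, then its words) can be sketched as follows. This is a minimal illustration, assuming words are separated by whitespace, which this page does not confirm; the class `LineRows` and its methods are hypothetical and not part of WSD.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of com.articulate.sigma.wordNet.WSD.
final class LineRows {

    /** Builds one row: the whole line first, then each whitespace-separated word. */
    static List<String> toRow(String line) {
        List<String> row = new ArrayList<>();
        row.add(line);
        String trimmed = line.trim();
        if (!trimmed.isEmpty()) { // a blank line contributes no words
            for (String word : trimmed.split("\\s+")) {
                row.add(word);
            }
        }
        return row;
    }

    /** Reads a file and converts every line into a row as described above. */
    static List<List<String>> readRows(Path file) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            rows.add(toRow(line));
        }
        return rows;
    }
}
```

With this layout, `row.get(0)` is always the original sentence and `row.subList(1, row.size())` gives its tokens, so a caller can keep the sentence and its context words together.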
    • readFile

      public static List<String> readFile(String filename)
      Returns:
      the lines of the file as a list of String.
    • collectSUMOFromFile

      public static Map<String,Integer> collectSUMOFromFile(String filename)
      Extract SUMO terms from a file assuming one sentence per line
      Returns:
      a Map of SUMO term keys and integer counts of their appearance
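The shape of the result (term keys mapped to occurrence counts) can be sketched with a simple tally over sentences. This is an illustration only: the class `TermTally` is hypothetical, and the `extractor` function is a stand-in for whatever produces the SUMO terms for one sentence (for example, a call such as `WSD.collectSUMOFromWords`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch, not part of com.articulate.sigma.wordNet.WSD.
final class TermTally {

    /** Counts how often each term appears across all sentences. */
    static Map<String, Integer> tally(List<String> sentences,
                                      Function<String, List<String>> extractor) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            for (String term : extractor.apply(sentence)) {
                counts.merge(term, 1, Integer::sum); // add 1, starting from 1 if absent
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in extractor: treats each whitespace-separated token as a term.
        Function<String, List<String>> stub = s -> Arrays.asList(s.split("\\s+"));
        System.out.println(tally(Arrays.asList("Animal Walking", "Animal"), stub));
    }
}
```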
    • printSUMOFromFileByLine

      public static void printSUMOFromFileByLine(String filename)
      Extract SUMO terms from a file, assuming one sentence per line, and print the SUMO term keys and integer counts of their appearance.
    • collectSUMOFromString

      public static Map<String,Integer> collectSUMOFromString(String lineStr)
      Extract SUMO terms from the given string.
      Returns:
      a Map of SUMO term keys and integer counts of their appearance
    • testWordWSD

      public static void testWordWSD()
      A method used only for testing. It should not be called during normal operation.
    • testSentenceWSD

      public static void testSentenceWSD()
      A method used only for testing. It should not be called during normal operation.
    • testSentenceWSD2

      public static void testSentenceWSD2()
      A method used only for testing. It should not be called during normal operation.
    • interactive

      public static void interactive()
    • main

      public static void main(String[] args)
      A main method, used only for testing. It should not be called during normal operation.