Class WordNetUtilities

java.lang.Object
com.articulate.sigma.wordNet.WordNetUtilities

public class WordNetUtilities extends Object
Author:
Adam Pease
  • Field Details

    • mappings

      public static Map<String,String> mappings
      POS-prefixed mappings from a new synset number to the old one.
    • TPTPidCounter

      public static int TPTPidCounter
    • errorCount

      public static int errorCount
    • patternNum

      public static int patternNum
    • WordNetRelations

      protected static List<String> WordNetRelations
    • withThoughtEmotion

      public static boolean withThoughtEmotion
  • Constructor Details

    • WordNetUtilities

      public WordNetUtilities()
  • Method Details

    • getBareSUMOTerm

      public static String getBareSUMOTerm(String term)
      Get a SUMO term minus its &% prefix and one-character mapping suffix.
    • isValidSynset8

      public static boolean isValidSynset8(String synset)
      Check whether a synset is in a valid 8-digit format.
    • verbFrameNum

      public static int verbFrameNum(String frame)
      get the number of the verb frame
    • isValidSynset9

      public static boolean isValidSynset9(String synset)
      Check whether a synset is in a valid 9-digit format.
    • isValidKey

      public static boolean isValidKey(String senseKey)
      Check whether a sense key format is valid
    • posAlphaKeyToWord

      public static String posAlphaKeyToWord(String alphaKey)
    • posWordToAlphaKey

      public static String posWordToAlphaKey(String word)
    • getPOSfromKey

      public static String getPOSfromKey(String senseKey)
      Extract the POS from a word_POS_num sense key. The result should be an alpha key, such as "VB".
    • getWordFromKey

      public static String getWordFromKey(String senseKey)
      Extract the word from a word_POS_num sense key.
    • getNumFromKey

      public static String getNumFromKey(String senseKey)
      Extract the sense number from a word_POS_num sense key.
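      The three extractors above all read fields of a word_POS_num sense key. A minimal sketch of one way to split such a key follows. The class name SenseKeySketch and the choice to split from the right (so that multi-word entries containing underscores stay intact) are assumptions for illustration, not the class's confirmed behavior.

```java
final class SenseKeySketch {
    // Returns {word, POS, num}. Splitting from the right keeps words such as
    // "hot_dog" whole; this strategy is an assumption, not the real parser.
    static String[] split(String senseKey) {
        int last = senseKey.lastIndexOf('_');
        int prev = (last > 0) ? senseKey.lastIndexOf('_', last - 1) : -1;
        if (prev < 0) {
            throw new IllegalArgumentException("not a word_POS_num key: " + senseKey);
        }
        return new String[] {
            senseKey.substring(0, prev),        // word
            senseKey.substring(prev + 1, last), // POS, e.g. "VB"
            senseKey.substring(last + 1)        // sense number
        };
    }
}
```

      Under these assumptions, SenseKeySketch.split("hot_dog_NN_1") yields the word "hot_dog", the POS "NN" and the sense number "1".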
    • getSenseFromKey

      public static String getSenseFromKey(String senseKey)
      Extract the synset corresponding to a word_POS_num sense key.
    • parseColonKey

      public static List<String> parseColonKey(String colonKey)
      Extract the info in a word%num:num:num sense key. colonp = Pattern.compile("([^%]+)%([^:]*):([^:]*):([^:]*):([^:]*)");
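      To make the pattern quoted above concrete, here is a hedged sketch that applies it and collects the capture groups. The class name ColonKeySketch, the use of matches() (which requires the whole key to fit the pattern) and the empty list on failure are assumptions. The sample key in the note below is illustrative rather than taken from WordNet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ColonKeySketch {
    // The pattern exactly as quoted in the description of parseColonKey.
    static final Pattern COLONP =
        Pattern.compile("([^%]+)%([^:]*):([^:]*):([^:]*):([^:]*)");

    // Returns the five captured fields, or an empty list when the key
    // does not fit the pattern (the empty-list fallback is an assumption).
    static List<String> parse(String colonKey) {
        List<String> fields = new ArrayList<>();
        Matcher m = COLONP.matcher(colonKey);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }
}
```

      With this sketch, ColonKeySketch.parse("run%2:38:00:01") returns the fields "run", "2", "38", "00" and "01".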
    • getWordFromColonKey

      public static String getWordFromColonKey(String key)
      Extract the word from a word%num:num:num sense key.
    • getPOSNumFromColonKey

      public static String getPOSNumFromColonKey(String key)
      Extract the sense number from a word%num:num:num sense key.
    • getSenseFromColonKey

      public static String getSenseFromColonKey(String key)
      Extract the synset corresponding to a word%num:num:num sense key.
    • getKeyFromSense

      public static String getKeyFromSense(String synset)
      Get the word_POS_num sense key corresponding to a 9 digit synset. Note that some adjective keys are listed as "adjuncts" with id '3' instead of '5' so we try that too in case of failure.
    • synsetFromOntoNotes

      public static String synsetFromOntoNotes(String onKey)
      Extract the nine-digit synset ID corresponding to a word-POS.num sense key. See nlp.corpora.OntoNotes.
    • removeTermPrefixes

      public static String removeTermPrefixes(String formula)
    • convertTermList

      public static List<String> convertTermList(String termList)
      Convert a list of terms in the format "&%term1 &%term2" to an ArrayList of bare term Strings.
    • getSUMOMappingSuffix

      public static char getSUMOMappingSuffix(String term)
      Get a SUMO term mapping suffix.
    • convertWordNetPointer

      public static String convertWordNetPointer(String ptr)
    • posLetterToNumber

      public static char posLetterToNumber(char POS)
    • posNumberToLetter

      public static char posNumberToLetter(char POS)
    • posPennToNumber

      public static char posPennToNumber(String penn)
    • posNumberToLetters

      public static String posNumberToLetters(String pos)
      Convert a part of speech number to the two letter format used by the WordNet sense index code. Defaults to noun "NN".
    • posLettersToNumber

      public static String posLettersToNumber(String pos)
      Convert a two-letter part of speech code, as used by the WordNet sense index code, to its part of speech number.
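      The two conversions above can be pictured as a small lookup table. The sketch below assumes WordNet's usual numbering (1 noun, 2 verb, 3 adjective, 4 adverb). The codes "JJ" and "RB", the class name PosCodeSketch and the "1" fallback in the reverse direction are assumptions; only "NN", "VB" and the noun default appear in this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PosCodeSketch {
    // POS number -> two-letter code. "JJ" and "RB" are assumed codes.
    private static final Map<String, String> NUM_TO_LETTERS = new LinkedHashMap<>();
    static {
        NUM_TO_LETTERS.put("1", "NN"); // noun
        NUM_TO_LETTERS.put("2", "VB"); // verb
        NUM_TO_LETTERS.put("3", "JJ"); // adjective
        NUM_TO_LETTERS.put("4", "RB"); // adverb
    }

    // Unknown numbers fall back to noun, as the description states.
    static String numberToLetters(String pos) {
        return NUM_TO_LETTERS.getOrDefault(pos, "NN");
    }

    // Reverse lookup; falling back to "1" (noun) is an assumption.
    static String lettersToNumber(String pos) {
        for (Map.Entry<String, String> e : NUM_TO_LETTERS.entrySet()) {
            if (e.getValue().equals(pos)) {
                return e.getKey();
            }
        }
        return "1";
    }
}
```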
    • sensePOS

      public static int sensePOS(String sense)
      Take a WordNet sense identifier, and return the integer part of speech code.
    • mappingCharToName

      public static String mappingCharToName(char mappingType)
    • subst

      public static String subst(String result, String match, String subst)
      A utility function that mimics the functionality of the perl substitution feature (s/match/replacement/). Note that only one replacement is made, not a global replacement.
      Parameters:
      result - is the string on which the substitution is performed.
      match - is the substring to be found and replaced.
      subst - is the string replacement for match.
      Returns:
      is a String containing the result of the substitution.
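      As an illustration of the single-replacement behavior described for subst, the sketch below replaces only the first occurrence of match. It treats match as a literal substring, which is an assumption: Perl's s/// takes a regular expression, and this page does not say which reading the method uses. The class name SubstSketch is hypothetical.

```java
final class SubstSketch {
    // Replaces the first occurrence of match in result with subst.
    // match is treated literally here (an assumption, see the lead-in).
    static String subst(String result, String match, String subst) {
        int at = result.indexOf(match);
        if (at < 0) {
            return result; // nothing to replace
        }
        return result.substring(0, at) + subst + result.substring(at + match.length());
    }
}
```

      For example, SubstSketch.subst("banana", "an", "AN") gives "bANana": the second "an" is left alone because the replacement is not global.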
    • substTest

      public static boolean substTest(String result, String match, String subst, Map<String,Set<String>> hash)
      A utility function that mimics the functionality of the perl substitution feature (s/match/replacement/) but rather than returning the result of the substitution, just tests whether the result is a key in a hashtable. Note that only one replacement is made, not a global replacement.
      Parameters:
      result - is the string on which the substitution is performed.
      match - is the substring to be found and replaced.
      subst - is the string replacement for match.
      hash - is a hashtable to be checked against the result.
      Returns:
      is a boolean indicating whether the result of the substitution was found in the hashtable.
    • verbPlural

      public static String verbPlural(String verb)
      Return the plural form of the verb. Handle multi-word phrases to modify only the first word.
    • nounPlural

      public static String nounPlural(String noun)
      Return the plural form of the noun. Handle multi-word phrases to modify only the last word.
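      The multi-word handling described for verbPlural and nounPlural can be sketched as follows. The inflect helper is a deliberate placeholder that only appends "s", and the assumption that words in a phrase are separated by a space or an underscore is not confirmed by this page. The class and helper names are hypothetical.

```java
final class PhrasePluralSketch {
    // Placeholder inflection; real rules ("-es", "-ies", ...) are out of scope.
    static String inflect(String word) {
        return word + "s";
    }

    // Index of the first space or underscore, or -1 if there is none.
    static int firstSeparator(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ' || c == '_') {
                return i;
            }
        }
        return -1;
    }

    // Nouns: only the last word of the phrase is modified.
    static String nounPlural(String noun) {
        int cut = Math.max(noun.lastIndexOf(' '), noun.lastIndexOf('_'));
        return noun.substring(0, cut + 1) + inflect(noun.substring(cut + 1));
    }

    // Verbs: only the first word of the phrase is modified.
    static String verbPlural(String verb) {
        int cut = firstSeparator(verb);
        if (cut < 0) {
            return inflect(verb);
        }
        return inflect(verb.substring(0, cut)) + verb.substring(cut);
    }
}
```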
    • formatWords

      public static String formatWords(Map<String,String> words, String kbName)
      HTML format a TreeMap of word senses and their associated synset
    • formatWordsList

      public static String formatWordsList(Map<String,List<String>> words, String kbName)
      HTML format a TreeMap of ArrayLists of word senses
    • mergeUpdates

      public static void mergeUpdates() throws IOException
      Read in a file with a nine-digit synset number followed by a space and a SUMO term. If the term is more specific than the current mapping for that synset, replace the old term. This is a utility that is not normally called from the interactive Sigma system.
      Throws:
      IOException
    • processMissingLinks

      public static void processMissingLinks(String fileName, String pattern, String posNum) throws IOException
      This is a utility routine that should not be called during normal Sigma operation. It does most of the actual work for deduceMissingLinks()
      Throws:
      IOException
    • deduceMissingLinks

      public static void deduceMissingLinks() throws IOException
      Use the WordNet hyper-/hypo-nym links to deduce a likely link for a SUMO term that has not yet been manually linked. This is a utility routine that should not be called during normal Sigma operation.
      Throws:
      IOException
    • updateWNversionProcess

      public static void updateWNversionProcess(String fileName, String pattern, String posNum) throws IOException
      This is a utility routine that should not be called during normal Sigma operation. It does most of the actual work for updateWNversion(). The output is a set of WordNet data files with a "-new" suffix.
      Throws:
      IOException
    • readWNversionMap

      public static void readWNversionMap(String fileName, String pattern, String posNum) throws IOException
      Read the version mapping files and store in the HashMap called "mappings". Note that the "old" synset should be the second element of each line
      Throws:
      IOException
    • updateWNversionReading

      public static void updateWNversionReading(String path, String versionPair) throws IOException
      Note that the "old" synset should be the second element of each line
      Throws:
      IOException
    • updateWNversion

      public static void updateWNversion(String path, String versionPair) throws IOException
      Port the mappings from one version of WordNet to another. It calls updateWNversionReading to do most of the work. It assumes that the mapping file has the new synset first and the old one second. File names are for the new WordNet version, which will need to have different names from the old version that WordNet.java needs to read in order to get the existing mappings. This is a utility which should not be called during normal Sigma operation. Mapping files are in a simple format produced by University of Catalonia and available at http://www.lsi.upc.edu/~nlp/web/index.php?option=com_content&task=view&id=21&Itemid=57. If that address changes, you may also start at http://www.lsi.upc.edu/~nlp/web/ and go to Resources and then an item on WordNet mappings.
      Throws:
      IOException
    • numSynsets

      public static int numSynsets(char pos)
      Returns:
      the number of synsets in WordNet for the given part of speech
    • printStatistics

      public static String printStatistics()
    • imageNetLinks

      public void imageNetLinks() throws IOException
      Import links from www.image-net.org that are linked to WordNet and link them to SUMO terms when the synset has a directly equivalent SUMO term
      Throws:
      IOException
    • extractMeronyms

      public static void extractMeronyms()
      A utility to extract meronym relations as relations between SUMO terms. Filter out relations between genus and species, which shouldn't be meronyms
    • searchCoherence

      public static void searchCoherence(String fileWithPath)
      Take a file of tabtab and calculate the average Levenshtein distance for each ID.
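      For reference, the Levenshtein distance mentioned above is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is the standard two-row dynamic-programming formulation. It shows the metric only and does not reproduce the file parsing or per-ID averaging done by searchCoherence. The class name is hypothetical.

```java
final class LevenshteinSketch {
    // Classic edit distance, keeping only the previous and current rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }
}
```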
    • commentSentiment

      public static void commentSentiment(String fileWithPath)
    • writeTPTPWordNet

      public static void writeTPTPWordNet(PrintWriter pw) throws IOException
      Write TPTP format for WordNet
      Throws:
      IOException
    • findLeavesInTree

      public static Set<String> findLeavesInTree(Set<String> rels)
      Find all the leaf nodes for a particular relation in WordNet. Note that the leaf must have a link from another node to be a leaf. No isolated nodes can be considered leaves.
      Returns:
      a list of POS-prefixed synsets
    • findPathsToRoot

      public static List<List<String>> findPathsToRoot(List<String> base, String synset)
      Find the complete path from a given synset. If multiple inheritance results in multiple paths, return them all.
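      The multiple-inheritance case described for findPathsToRoot can be sketched as a recursive walk that branches at every synset with more than one parent. The parents map, the class name and the cycle guard below are assumptions for illustration. The real method reads its links from the loaded WordNet data rather than from a parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PathsSketch {
    // parents: synset -> its direct parents (hypothetical input structure).
    // base holds the path walked so far; synset is the next node to add.
    static List<List<String>> pathsToRoot(Map<String, List<String>> parents,
                                          List<String> base, String synset) {
        List<String> path = new ArrayList<>(base);
        path.add(synset);
        List<String> ups = parents.getOrDefault(synset, Collections.emptyList());
        List<List<String>> result = new ArrayList<>();
        if (ups.isEmpty()) {
            result.add(path); // reached a root: one complete path
            return result;
        }
        for (String parent : ups) {
            if (path.contains(parent)) {
                continue; // guard against cycles (an added safety assumption)
            }
            result.addAll(pathsToRoot(parents, path, parent));
        }
        return result;
    }
}
```

      With parents D -> [B, C], B -> [A] and C -> [A], starting from an empty base at D produces the two paths [D, B, A] and [D, C, A].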
    • lowestCommonParent

      public static String lowestCommonParent(String s1, String s2)
    • findLeaves

      public static Set<String> findLeaves(String rel)
      Find all the leaf nodes for a particular relation in WordNet. Note that a node may be a leaf simply because it has no such link to another node.
      Returns:
      a list of POS-prefixed synsets
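      The difference between findLeaves and findLeavesInTree is whether an isolated node counts as a leaf. A small sketch of both readings over a toy graph follows. The links map (source synset to the targets reached through one relation), the requireIncoming flag and the class name are hypothetical. The direction of the links in the real data is not stated in this page.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LeafSketch {
    // A node is a leaf when it has no outgoing link for the relation.
    // With requireIncoming = true, it must also be the target of some link,
    // which excludes isolated nodes (the findLeavesInTree reading).
    static Set<String> leaves(Set<String> nodes, Map<String, Set<String>> links,
                              boolean requireIncoming) {
        Set<String> hasIncoming = new HashSet<>();
        for (Set<String> targets : links.values()) {
            hasIncoming.addAll(targets);
        }
        Set<String> result = new TreeSet<>();
        for (String n : nodes) {
            boolean noOutgoing =
                links.getOrDefault(n, Collections.<String>emptySet()).isEmpty();
            if (noOutgoing && (!requireIncoming || hasIncoming.contains(n))) {
                result.add(n);
            }
        }
        return result;
    }
}
```

      For nodes A, B, C, D with links A -> {B, C} and D isolated, the looser reading returns B, C and D, while the stricter one returns only B and C.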
    • showAllLeaves

      public static void showAllLeaves()
    • showAllRoots

      public static void showAllRoots()
    • wordsToSynsets

      public static Set<String> wordsToSynsets(String word)
      Returns:
      POS-prefixed synsets
    • synsetToOneWord

      public static String synsetToOneWord(String s)
    • nonWNsynset

      public static boolean nonWNsynset(String s)
      Is the given 9-digit synset one constructed from SUMO termFormat expressions?
    • collapseSenses

      public static Map<String,Set<String>> collapseSenses()
      Returns:
      a Set of Sets where each interior Set consists of WordNet word senses that all map to a single SUMO term. The goal is to provide a way to collapse WordNet synsets that embody overly fine grained distinctions.
    • getAllHyponyms

      public static Set<String> getAllHyponyms(String s)
      Returns:
      all the hyponyms of a given POS-prefixed synset
    • getAllHyponymsTransitive

      public static Set<String> getAllHyponymsTransitive(String s)
      Returns:
      all the hyponyms, followed transitively, of a given POS-prefixed synset
    • isHyponymousWord

      public static boolean isHyponymousWord(String word, Set<String> synsets)
      Returns:
      whether the word is a possible hyponym of a given POS-prefixed synset
    • generateHyponymSets

      public static void generateHyponymSets(String filename)
      Generate sets of all hyponymous words for each synset in a file
    • generateSUMOfromWNsubtree

      public static void generateSUMOfromWNsubtree(String synset, String sumo)
      Generate notional SUMO terms from WordNet
    • generateSUMOfromWN

      public static void generateSUMOfromWN(String synset, String sumo)
      Generate notional SUMO terms from WordNet
    • generateSUMOfromWN

      public static void generateSUMOfromWN()
      Generate notional SUMO terms from WordNet. Start with an equivalence. Make each synset a notional SUMO term with its parent either the synset parent or the equivalence.
    • getSynsetsFromSUMO

      public static List<String> getSynsetsFromSUMO(String sumo)
      get all synsets corresponding to a SUMO term
    • convertVerbFrameNumbersToFrames

      public static List<String> convertVerbFrameNumbersToFrames(List<String> numbers)
      Convert verb frame indexes as Strings into actual verb frame strings. For example "1" becomes "Something ----s"
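      A hedged sketch of the index-to-frame conversion follows. It holds only two entries: frame "1" comes from the example above, and frame "2" is taken from WordNet's standard verb frame list. Skipping unknown indexes is an assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VerbFrameSketch {
    // A two-entry sample; the full WordNet list has many more frames.
    private static final Map<String, String> FRAMES = new HashMap<>();
    static {
        FRAMES.put("1", "Something ----s");
        FRAMES.put("2", "Somebody ----s");
    }

    // Maps each index to its frame string, skipping indexes that are
    // not in the table (the skipping behavior is an assumption).
    static List<String> convert(List<String> numbers) {
        List<String> frames = new ArrayList<>();
        for (String n : numbers) {
            String frame = FRAMES.get(n.trim());
            if (frame != null) {
                frames.add(frame);
            }
        }
        return frames;
    }
}
```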
    • getVerbFramesForSynset

      public static List<String> getVerbFramesForSynset(String synset)
      get all verb frames corresponding to a synset.
      Parameters:
      synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
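      The note about 9-digit versus 8-digit synsets can be illustrated with a small conversion helper. It assumes that a 9-digit synset is a one-digit part of speech number followed by the 8-digit ID, which is how the "POS-prefixed" wording elsewhere on this page reads. That layout, the class name and the sample ID in the note below are assumptions.

```java
final class SynsetDigitsSketch {
    // Drops the leading POS digit of a 9-digit synset (assumed layout).
    static String toEightDigit(String synset9) {
        if (!synset9.matches("\\d{9}")) {
            throw new IllegalArgumentException("expected 9 digits: " + synset9);
        }
        return synset9.substring(1);
    }

    // Returns the leading POS digit of a 9-digit synset (assumed layout).
    static char posDigit(String synset9) {
        if (!synset9.matches("\\d{9}")) {
            throw new IllegalArgumentException("expected 9 digits: " + synset9);
        }
        return synset9.charAt(0);
    }
}
```

      Under that assumption, the illustrative ID "201234567" splits into POS digit '2' and the 8-digit key "01234567".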
    • getVerbFramesForWord

      public static List<String> getVerbFramesForWord(String synset, String word)
      get all verb frames corresponding to a word in a synset. Include verb frames common to all words in the synset.
      Parameters:
      synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
    • doVerbFrameSubstitution

      public static List<String> doVerbFrameSubstitution(Map<String,List<String>> map, List<String> words)
      get all verb frames corresponding to a word in a synset.
      Parameters:
      map - is a set of word keys and the values are the verb frames
      words - are all the words in a given synset
    • getAllVerbFrames

      public static Map<String,List<String>> getAllVerbFrames(String synset, List<String> words)
      get all verb frames corresponding to a synset.
      Parameters:
      synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
    • showVerbFrames

      public static String showVerbFrames(String synset)
      get all verb frames corresponding to a synset.
      Parameters:
      synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
    • getEquivalentVerbSynsetsFromSUMO

      public static List<String> getEquivalentVerbSynsetsFromSUMO(String sumo)
      get all verb synsets corresponding to a SUMO term that are equivalence links
    • getVerbSynsetsFromSUMO

      public static List<String> getVerbSynsetsFromSUMO(String sumo)
      get all verb synsets corresponding to a SUMO term
    • getEquivalentSynsetsFromSUMO

      public static List<String> getEquivalentSynsetsFromSUMO(String sumo)
      get all synsets corresponding to a SUMO term that are equivalence links
    • getSynsetsFromSUMOList

      public static Set<String> getSynsetsFromSUMOList(Collection<String> sumo)
      get all synsets corresponding to a list of SUMO terms
    • getWordsFromSynsetList

      public static Set<String> getWordsFromSynsetList(Collection<String> synsets)
      get all words corresponding to a list of synsets
    • rootFormOf

      public static String rootFormOf(String word)
    • sensoryWords

      public static Map<String,Set<String>> sensoryWords()
      Find all words associated with sensory, psychological and emotional concepts. Return sets of words keyed by the name of the human sense, plus "emotion" and "thought".
    • synestheticSynsets

      public static Set<String> synestheticSynsets(Map<String,Set<String>> words)
      Find all the words that exhibit links to multiple sensory modes in SUMO
    • synesthesiaCompare

      public static void synesthesiaCompare(Map<String,Set<String>> words, Set<String> synwords)
      Compare Lievers' list of synesthetic words with those derived from SUMO-WordNet
    • testCommonParent

      public static void testCommonParent()
      A method used only for testing. It should not be called during normal operation.
    • sensoryOrMentalWord

      public static boolean sensoryOrMentalWord(String word)
      test if a word is sensory or mental and return true if so
    • testWord

      public static void testWord()
    • testSynesthesia

      public static void testSynesthesia()
    • testGetPOS

      public static void testGetPOS()
    • testIsValidKey

      public static void testIsValidKey()
      A method used only for testing. It should not be called during normal operation.
    • showHelp

      public static void showHelp()
    • main

      public static void main(String[] args)
      A main method, used only for testing. It should not be called during normal operation.