Package com.articulate.sigma.wordNet
Class WordNetUtilities
java.lang.Object
com.articulate.sigma.wordNet.WordNetUtilities
Author: Adam Pease
Field Summary
Fields
Modifier and Type, Field, Description
static int errorCount
mappings: POS-prefixed mappings from a new synset number to the old one.
static int patternNum
static int TPTPidCounter
static boolean withThoughtEmotion
WordNetRelations
Constructor Summary
Constructors
WordNetUtilities()
Method Summary
Modifier and Type, Method, Description
static void commentSentiment(String fileWithPath)
collapseSenses
convertTermList(String termList): Convert a list of Terms in the format "&%term1 &%term2" to an ArrayList of bare term Strings.
convertVerbFrameNumbersToFrames(List<String> numbers): Convert verb frame indexes as Strings into actual verb frame strings.
static String convertWordNetPointer
static void deduceMissingLinks: Use the WordNet hyper-/hypo-nym links to deduce a likely link for a SUMO term that has not yet been manually linked.
static List<String> doVerbFrameSubstitution(Map<String, List<String>> map, List<String> words): Get all verb frames corresponding to a word in a synset.
static void extractMeronyms(): A utility to extract meronym relations as relations between SUMO terms.
findLeaves(String rel): Find all the leaf nodes for a particular relation in WordNet.
findLeavesInTree(Set<String> rels): Find all the leaf nodes for a particular relation in WordNet.
findPathsToRoot(List<String> base, String synset): Find the complete path from a given synset.
static String formatWords(Map<String, String> words, String kbName): HTML format a TreeMap of word senses and their associated synset.
static String formatWordsList: HTML format a TreeMap of ArrayLists of word senses.
static void generateHyponymSets(String filename): Generate sets of all hyponymous words for each synset in a file.
static void generateSUMOfromWN(): Generate notional SUMO terms from WordNet.
static void generateSUMOfromWN(String synset, String sumo): Generate notional SUMO terms from WordNet.
static void generateSUMOfromWNsubtree(String synset, String sumo): Generate notional SUMO terms from WordNet.
getAllHyponyms
getAllHyponymsTransitive
getAllVerbFrames(String synset, List<String> words): Get all verb frames corresponding to a synset.
static String getBareSUMOTerm(String term): Get a SUMO term minus its &% prefix and one character mapping suffix.
getEquivalentSynsetsFromSUMO: Get all synsets corresponding to a SUMO term that are equivalence links.
getEquivalentVerbSynsetsFromSUMO: Get all verb synsets corresponding to a SUMO term that are equivalence links.
static String getKeyFromSense(String synset): Get the word_POS_num sense key corresponding to a 9 digit synset.
static String getNumFromKey(String senseKey): Extract the sense number from a word_POS_num sense key.
static String getPOSfromKey(String senseKey): Extract the POS from a word_POS_num sense key.
static String getPOSNumFromColonKey: Extract the sense number from a word%num:num:num sense key.
static String getSenseFromColonKey: Extract the synset corresponding to a word%num:num:num sense key.
static String getSenseFromKey(String senseKey): Extract the synset corresponding to a word_POS_num sense key.
static char getSUMOMappingSuffix(String term): Get a SUMO term mapping suffix.
getSynsetsFromSUMO(String sumo): Get all synsets corresponding to a SUMO term.
getSynsetsFromSUMOList: Get all synsets corresponding to a list of SUMO terms.
getVerbFramesForSynset(String synset): Get all verb frames corresponding to a synset.
getVerbFramesForWord(String synset, String word): Get all verb frames corresponding to a word in a synset.
getVerbSynsetsFromSUMO(String sumo): Get all verb synsets corresponding to a SUMO term.
static String getWordFromColonKey: Extract the word from a word%num:num:num sense key.
static String getWordFromKey(String senseKey): Extract the word from a word_POS_num sense key.
getWordsFromSynsetList(Collection<String> synsets): Get all words corresponding to a list of synsets.
void imageNetLinks: Import links from www.image-net.org that are linked to WordNet and link them to SUMO terms when the synset has a directly equivalent SUMO term.
static boolean isHyponymousWord(String word, Set<String> synsets)
static boolean isValidKey(String senseKey): Check whether a sense key format is valid.
static boolean isValidSynset8(String synset): Check whether a synset format is valid.
static boolean isValidSynset9(String synset): Check whether a synset format is valid.
static String lowestCommonParent(String s1, String s2)
static void main: A main method, used only for testing.
static String mappingCharToName(char mappingType)
static void mergeUpdates: Read in a file with a nine-digit synset number followed by a space and a SUMO term.
static boolean nonWNsynset: Is the given 9 digit synset one constructed from SUMO termFormat expressions?
static String nounPlural(String noun): Return the plural form of the noun.
static int numSynsets(char pos)
parseColonKey(String colonKey): Extract the info in a word%num:num:num sense key.
static String posAlphaKeyToWord(String alphaKey)
static String posLettersToNumber(String pos): Convert a part of speech number to the two letter format used by the WordNet sense index code.
static char posLetterToNumber(char POS)
static char posNumberToLetter(char POS)
static String posNumberToLetters(String pos): Convert a part of speech number to the two letter format used by the WordNet sense index code.
static char posPennToNumber(String penn)
static String posWordToAlphaKey(String word)
static String printStatistics
static void processMissingLinks(String fileName, String pattern, String posNum): This is a utility routine that should not be called during normal Sigma operation.
static void readWNversionMap(String fileName, String pattern, String posNum): Read the version mapping files and store in the HashMap called "mappings".
static String removeTermPrefixes(String formula)
static String rootFormOf(String word)
static void searchCoherence(String fileWithPath): Take a tab-delimited file and calculate the average Levenshtein distance for each ID.
static int sensePOS: Take a WordNet sense identifier, and return the integer part of speech code.
static boolean sensoryOrMentalWord(String word): Test if a word is sensory or mental and return true if so.
sensoryWords: Find all words associated with sensory, psychological and emotional concepts.
static void showAllLeaves()
static void showAllRoots()
static void showHelp()
static String showVerbFrames(String synset): Get all verb frames corresponding to a synset.
static String subst: A utility function that mimics the functionality of the Perl substitution feature (s/match/replacement/).
static boolean substTest(String result, String match, String subst, Map<String, Set<String>> hash): A utility function that mimics the functionality of the Perl substitution feature (s/match/replacement/) but rather than returning the result of the substitution, just tests whether the result is a key in a hashtable.
static void synesthesiaCompare: Compare Lievers' list of synesthetic words with those derived from SUMO-WordNet.
synestheticSynsets(Map<String, Set<String>> words): Find all the words that exhibit links to multiple sensory modes in SUMO.
static String synsetFromOntoNotes(String onKey): Extract the nine digit synset ID corresponding to a word-POS.num sense key.
static String synsetToOneWord
static void testCommonParent(): A method used only for testing.
static void testGetPOS()
static void testIsValidKey(): A method used only for testing.
static void testSynesthesia()
static void testWord()
static void updateWNversion(String path, String versionPair): Port the mappings from one version of WordNet to another.
static void updateWNversionProcess(String fileName, String pattern, String posNum): This is a utility routine that should not be called during normal Sigma operation.
static void updateWNversionReading(String path, String versionPair): Note that the "old" synset should be the second element of each line.
static int verbFrameNum(String frame): Get the number of the verb frame.
static String verbPlural(String verb): Return the plural form of the verb.
wordsToSynsets(String word)
static void writeTPTPWordNet: Write TPTP format for WordNet.
-
Field Details
-
mappings
POS-prefixed mappings from a new synset number to the old one. -
TPTPidCounter
public static int TPTPidCounter -
errorCount
public static int errorCount -
patternNum
public static int patternNum -
WordNetRelations
-
withThoughtEmotion
public static boolean withThoughtEmotion
-
-
Constructor Details
-
WordNetUtilities
public WordNetUtilities()
-
-
Method Details
-
getBareSUMOTerm
Get a SUMO term minus its &% prefix and one character mapping suffix.
isValidSynset8
Check whether a synset format is valid -
verbFrameNum
get the number of the verb frame -
isValidSynset9
Check whether a synset format is valid -
isValidKey
Check whether a sense key format is valid -
posAlphaKeyToWord
-
posWordToAlphaKey
-
getPOSfromKey
Extract the POS from a word_POS_num sense key. Should be an alpha key, such as "VB". -
getWordFromKey
Extract the word from a word_POS_num sense key. -
getNumFromKey
Extract the sense number from a word_POS_num sense key. -
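The three extractors above all read one field of a word_POS_num key. A minimal sketch of that split follows. It is not the class's implementation: the class name SenseKeySketch is hypothetical, and it assumes the word itself may contain underscores, so the fields are located from the right-hand end of the key.

```java
final class SenseKeySketch {

    /** Index of the underscore before the sense number; rejects keys with fewer than two underscores. */
    private static int numSeparator(String key) {
        int numSep = key.lastIndexOf('_');
        if (numSep < 1 || key.lastIndexOf('_', numSep - 1) < 0)
            throw new IllegalArgumentException("not a word_POS_num key: " + key);
        return numSep;
    }

    /** Everything before the POS field, e.g. "hot_dog" for "hot_dog_NN_1". */
    static String word(String key) {
        int numSep = numSeparator(key);
        return key.substring(0, key.lastIndexOf('_', numSep - 1));
    }

    /** The alpha POS field, e.g. "NN". */
    static String pos(String key) {
        int numSep = numSeparator(key);
        return key.substring(key.lastIndexOf('_', numSep - 1) + 1, numSep);
    }

    /** The sense number field, returned as a String. */
    static String num(String key) {
        return key.substring(numSeparator(key) + 1);
    }

    public static void main(String[] args) {
        String key = "hot_dog_NN_1";
        System.out.println(word(key) + " / " + pos(key) + " / " + num(key));
    }
}
```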
getSenseFromKey
Extract the synset corresponding to a word_POS_num sense key. -
parseColonKey
Extract the info in a word%num:num:num sense key. colonp = Pattern.compile("([^%]+)%([^:]*):([^:]*):([^:]*):([^:]*)"); -
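The pattern quoted above can be exercised on its own. The sketch below is only an illustration: the class name ColonKeySketch, the choice of Matcher.find() (so that text after the fourth captured field is tolerated), and the List return type are assumptions, not the documented behavior of parseColonKey.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ColonKeySketch {

    // Pattern quoted in the description: the word, a percent sign, then four colon-separated fields.
    private static final Pattern COLON_KEY =
            Pattern.compile("([^%]+)%([^:]*):([^:]*):([^:]*):([^:]*)");

    /** Returns the five captured groups, or an empty list when the key does not match. */
    static List<String> parse(String colonKey) {
        List<String> parts = new ArrayList<>();
        Matcher m = COLON_KEY.matcher(colonKey);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++)
                parts.add(m.group(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("dog%1:05:00::"));
    }
}
```

Note that the pattern requires three colons after the percent sign, so a key with fewer fields yields an empty list in this sketch.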
getWordFromColonKey
Extract the word from a word%num:num:num sense key. -
getPOSNumFromColonKey
Extract the sense number from a word%num:num:num sense key. -
getSenseFromColonKey
Extract the synset corresponding to a word%num:num:num sense key. -
getKeyFromSense
Get the word_POS_num sense key corresponding to a 9 digit synset. Note that some adjective keys are listed as "adjuncts" with id '3' instead of '5' so we try that too in case of failure. -
synsetFromOntoNotes
Extract the nine digit synset ID corresponding to a word-POS.num sense key. see nlp.corpora.OntoNotes -
removeTermPrefixes
-
convertTermList
Convert a list of Terms in the format "&%term1 &%term2" to an ArrayList of bare term Strings.
getSUMOMappingSuffix
Get a SUMO term mapping suffix. -
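The term format handled by getBareSUMOTerm, convertTermList and getSUMOMappingSuffix can be sketched as follows. This is a hedged illustration, not the class's code: SumoTermSketch and its method names are hypothetical, the example suffix characters "=" and "+" are assumptions, and the suffix is removed only when the last character is not a letter or digit so that suffix-free terms such as "&%term1" survive intact.

```java
import java.util.ArrayList;
import java.util.List;

final class SumoTermSketch {

    /** Strip a leading "&%" and one trailing non-alphanumeric mapping character, if present. */
    static String bareTerm(String term) {
        String s = term.startsWith("&%") ? term.substring(2) : term;
        if (!s.isEmpty() && !Character.isLetterOrDigit(s.charAt(s.length() - 1)))
            s = s.substring(0, s.length() - 1);
        return s;
    }

    /** Return the last character of the term, taken here to be the mapping suffix. */
    static char mappingSuffix(String term) {
        return term.charAt(term.length() - 1);
    }

    /** Split a whitespace-separated term list and reduce every entry to its bare term. */
    static List<String> termList(String termList) {
        List<String> result = new ArrayList<>();
        for (String t : termList.trim().split("\\s+"))
            if (!t.isEmpty())
                result.add(bareTerm(t));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(termList("&%Canine= &%Animal+"));
    }
}
```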
convertWordNetPointer
-
posLetterToNumber
public static char posLetterToNumber(char POS) -
posNumberToLetter
public static char posNumberToLetter(char POS) -
posPennToNumber
-
posNumberToLetters
Convert a part of speech number to the two letter format used by the WordNet sense index code. Defaults to noun "NN". -
posLettersToNumber
Convert a part of speech number to the two letter format used by the WordNet sense index code. Defaults to noun "NN". -
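A lookup with a noun default, as described for posNumberToLetters, might look like the sketch below. Only "NN" and "VB" appear in this page; the codes "JJ" and "RB", the number assignments, and the class name PosCodeSketch are assumptions made for illustration.

```java
import java.util.Map;

final class PosCodeSketch {

    // Assumed table: 1 noun, 2 verb, 3 adjective, 4 adverb.
    private static final Map<String, String> NUMBER_TO_LETTERS =
            Map.of("1", "NN", "2", "VB", "3", "JJ", "4", "RB");

    /** Unknown numbers fall back to the noun code "NN", matching the documented default. */
    static String numberToLetters(String pos) {
        return NUMBER_TO_LETTERS.getOrDefault(pos, "NN");
    }

    public static void main(String[] args) {
        System.out.println(numberToLetters("2"));
    }
}
```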
sensePOS
Take a WordNet sense identifier, and return the integer part of speech code. -
mappingCharToName
-
subst
A utility function that mimics the functionality of the Perl substitution feature (s/match/replacement/). Note that only one replacement is made, not a global replacement.
Parameters:
result - is the string on which the substitution is performed.
match - is the substring to be found and replaced.
subst - is the string replacement for match.
Returns:
a String containing the result of the substitution.
-
substTest
public static boolean substTest(String result, String match, String subst, Map<String, Set<String>> hash)
A utility function that mimics the functionality of the Perl substitution feature (s/match/replacement/) but rather than returning the result of the substitution, just tests whether the result is a key in a hashtable. Note that only one replacement is made, not a global replacement.
Parameters:
result - is the string on which the substitution is performed.
match - is the substring to be found and replaced.
subst - is the string replacement for match.
hash - is a hashtable to be checked against the result.
Returns:
a boolean indicating whether the result of the substitution was found in the hashtable.
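The single, non-global substitution described for subst and substTest corresponds to a first-match replacement. The sketch below is an assumption-laden illustration: it treats match as a regular expression and the replacement as literal text, neither of which this page confirms, and the class name SubstSketch is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubstSketch {

    /** Replace only the first occurrence of match, like Perl's s/match/replacement/ without the g flag. */
    static String subst(String result, String match, String subst) {
        return Pattern.compile(match)
                .matcher(result)
                .replaceFirst(Matcher.quoteReplacement(subst));
    }

    /** Perform the same substitution, then report whether the outcome is a key of hash. */
    static boolean substTest(String result, String match, String subst,
                             Map<String, Set<String>> hash) {
        return hash.containsKey(subst(result, match, subst));
    }

    public static void main(String[] args) {
        System.out.println(subst("banana", "an", "AN"));
    }
}
```

A typical use of the test form would be checking whether stripping a suffix from a word yields a key already present in a word table.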
-
verbPlural
Return the plural form of the verb. Handle multi-word phrases to modify only the first word. -
nounPlural
Return the plural form of the noun. Handle multi-word phrases to modify only the last word. -
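The multi-word handling described for verbPlural and nounPlural (inflect the first word of a verb phrase, the last word of a noun phrase) can be sketched independently of the morphology. In this sketch the inflection is a deliberately naive "add s" rule, the word separator is assumed to be a space, and PhrasePluralSketch is a hypothetical name; the class's real inflection rules are not shown on this page.

```java
final class PhrasePluralSketch {

    /** Placeholder inflection: appends "s". Real English morphology needs more rules. */
    private static String inflect(String word) {
        return word + "s";
    }

    /** Inflect only the first word of a verb phrase. */
    static String verbPhrase(String verb) {
        int i = verb.indexOf(' ');
        return i < 0 ? inflect(verb) : inflect(verb.substring(0, i)) + verb.substring(i);
    }

    /** Inflect only the last word of a noun phrase. */
    static String nounPhrase(String noun) {
        int i = noun.lastIndexOf(' ');
        return i < 0 ? inflect(noun) : noun.substring(0, i + 1) + inflect(noun.substring(i + 1));
    }

    public static void main(String[] args) {
        System.out.println(verbPhrase("give up") + ", " + nounPhrase("hot dog"));
    }
}
```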
formatWords
HTML format a TreeMap of word senses and their associated synset -
formatWordsList
HTML format a TreeMap of ArrayLists of word senses.
mergeUpdates
Read in a file with a nine-digit synset number followed by a space and a SUMO term. If the term is more specific than the current mapping for that synset, replace the old term. This is a utility that is not normally called from the interactive Sigma system.- Throws:
IOException
-
processMissingLinks
public static void processMissingLinks(String fileName, String pattern, String posNum) throws IOException
This is a utility routine that should not be called during normal Sigma operation. It does most of the actual work for deduceMissingLinks().
Throws:
IOException
-
deduceMissingLinks
Use the WordNet hyper-/hypo-nym links to deduce a likely link for a SUMO term that has not yet been manually linked. This is a utility routine that should not be called during normal Sigma operation.- Throws:
IOException
-
updateWNversionProcess
public static void updateWNversionProcess(String fileName, String pattern, String posNum) throws IOException
This is a utility routine that should not be called during normal Sigma operation. It does most of the actual work for updateWNversion(). The output is a set of WordNet data files with a "-new" suffix.
Throws:
IOException
-
readWNversionMap
public static void readWNversionMap(String fileName, String pattern, String posNum) throws IOException
Read the version mapping files and store in the HashMap called "mappings". Note that the "old" synset should be the second element of each line.
Throws:
IOException
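The line layout described for the version mapping files (new synset first, old synset second, stored under POS-prefixed keys) can be sketched as below. This is an illustration under assumptions: the fields are taken to be whitespace-separated, lines are passed in as a list rather than read from a file, the pattern parameter is ignored, and VersionMapSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VersionMapSketch {

    /** Map POS-prefixed new synset numbers to POS-prefixed old ones; short lines are skipped. */
    static Map<String, String> readMappings(List<String> lines, String posNum) {
        Map<String, String> mappings = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2)
                continue;
            // First field is the new synset, second field is the old one.
            mappings.put(posNum + fields[0], posNum + fields[1]);
        }
        return mappings;
    }

    public static void main(String[] args) {
        System.out.println(readMappings(List.of("00001740 00001234"), "1"));
    }
}
```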
-
updateWNversionReading
Note that the "old" synset should be the second element of each line- Throws:
IOException
-
updateWNversion
Port the mappings from one version of WordNet to another. It calls updateWNversionReading to do most of the work. It assumes that the mapping file has the new synset first and the old one second. File names are for the new WordNet version, which will need to have different names from the old version that WordNet.java needs to read in order to get the existing mappings. This is a utility which should not be called during normal Sigma operation. Mapping files are in a simple format produced by University of Catalonia and available at http://www.lsi.upc.edu/~nlp/web/index.php?option=com_content&task=view&id=21&Itemid=57 If that address changes you may also start at http://www.lsi.upc.edu/~nlp/web/ and go to Resources and then an item on WordNet mappings.
Throws:
IOException
-
numSynsets
public static int numSynsets(char pos) - Returns:
- the number of synsets in WordNet for the given part of speech
-
printStatistics
-
imageNetLinks
Import links from www.image-net.org that are linked to WordNet and link them to SUMO terms when the synset has a directly equivalent SUMO term.
Throws:
IOException
-
extractMeronyms
public static void extractMeronyms()
A utility to extract meronym relations as relations between SUMO terms. Filter out relations between genus and species, which shouldn't be meronyms.
searchCoherence
Take a tab-delimited file and calculate the average Levenshtein distance for each ID.
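For reference, the Levenshtein distance that searchCoherence averages is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is the standard two-row dynamic programming formulation, not code taken from the class; LevenshteinSketch is a hypothetical name, and how the class pairs strings within an ID is not covered.

```java
final class LevenshteinSketch {

    /** Classic edit distance using two rolling rows of the dynamic programming table. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++)
            prev[j] = j;                      // distance from the empty prefix of a
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                      // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```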
commentSentiment
-
writeTPTPWordNet
Write TPTP format for WordNet- Throws:
IOException
-
findLeavesInTree
Find all the leaf nodes for a particular relation in WordNet. Note that the leaf must have a link from another node to be a leaf. No isolated nodes can be considered leaves.- Returns:
- a list of POS-prefixed synsets
-
findPathsToRoot
Find the complete path from a given synset. If multiple inheritance results in multiple paths, return them all. -
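Returning every path under multiple inheritance amounts to a depth-first walk that forks at each node with more than one parent. The sketch below illustrates that idea on a plain parent map; the class name PathsToRootSketch, the map representation and the cycle guard are assumptions, and the role of the base parameter in the real method is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PathsToRootSketch {

    /** Collect every path from synset up to a node that has no parents. */
    static List<List<String>> pathsToRoot(Map<String, List<String>> parents, String synset) {
        List<List<String>> result = new ArrayList<>();
        walk(parents, synset, new ArrayList<>(), result);
        return result;
    }

    private static void walk(Map<String, List<String>> parents, String node,
                             List<String> path, List<List<String>> out) {
        if (path.contains(node))
            return;                                   // guard against cyclic links
        path.add(node);
        List<String> ups = parents.getOrDefault(node, List.of());
        if (ups.isEmpty())
            out.add(new ArrayList<>(path));           // reached a root: record a copy
        else
            for (String up : ups)
                walk(parents, up, path, out);
        path.remove(path.size() - 1);                 // backtrack
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = Map.of(
                "dog", List.of("canine", "pet"),
                "canine", List.of("animal"),
                "pet", List.of("animal"));
        System.out.println(pathsToRoot(parents, "dog"));
    }
}
```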
lowestCommonParent
-
findLeaves
Find all the leaf nodes for a particular relation in WordNet. Note that a node may be a leaf simply because it has no such link to another node.- Returns:
- a list of POS-prefixed synsets
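The two leaf definitions differ only in whether an isolated node counts. The sketch below is one reading of the notes on findLeavesInTree and findLeaves, not the class's code: LeafSketch, the link map (node to the nodes it links to under the relation) and the explicit node set are all assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LeafSketch {

    /** Leaves in the tree sense: nodes that some other node links to and that link to nothing. */
    static Set<String> leavesInTree(Map<String, Set<String>> links) {
        Set<String> leaves = new TreeSet<>();
        for (Set<String> targets : links.values())
            for (String target : targets)
                if (links.getOrDefault(target, Set.of()).isEmpty())
                    leaves.add(target);
        return leaves;
    }

    /** Leaves in the looser sense: any node with no outgoing link, isolated nodes included. */
    static Set<String> leaves(Set<String> allNodes, Map<String, Set<String>> links) {
        Set<String> leaves = new TreeSet<>();
        for (String node : allNodes)
            if (links.getOrDefault(node, Set.of()).isEmpty())
                leaves.add(node);
        return leaves;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = Map.of("A", Set.of("B", "C"), "B", Set.of("D"));
        System.out.println(leavesInTree(links));
        System.out.println(leaves(Set.of("A", "B", "C", "D", "E"), links));
    }
}
```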
-
showAllLeaves
public static void showAllLeaves() -
showAllRoots
public static void showAllRoots() -
wordsToSynsets
- Returns:
- POS-prefixed synsets
-
synsetToOneWord
-
nonWNsynset
Is the given 9 digit synset one constructed from SUMO termFormat expressions?
collapseSenses
- Returns:
- a Set of Sets where each interior Set consists of WordNet word senses that all map to a single SUMO term. The goal is to provide a way to collapse WordNet synsets that embody overly fine grained distinctions.
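The Set of Sets described above is a grouping of word senses by the SUMO term they map to. A minimal sketch of that grouping follows; the class name CollapseSketch and the sense-to-term input map are assumptions, since this page does not show where collapseSenses gets its data.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class CollapseSketch {

    /** Group word senses so that each inner set holds the senses mapped to one SUMO term. */
    static Set<Set<String>> collapse(Map<String, String> senseToSumo) {
        Map<String, Set<String>> bySumo = new TreeMap<>();
        for (Map.Entry<String, String> e : senseToSumo.entrySet())
            bySumo.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        return new LinkedHashSet<>(bySumo.values());
    }

    public static void main(String[] args) {
        System.out.println(collapse(Map.of(
                "dog_NN_1", "Canine",
                "domestic_dog_NN_1", "Canine",
                "cat_NN_1", "Feline")));
    }
}
```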
-
getAllHyponyms
- Returns:
- all the hyponyms of a given POS-prefixed synset
-
getAllHyponymsTransitive
- Returns:
- all the hyponyms of a given POS-prefixed synset
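Collecting hyponyms transitively is a reachability search over the hyponym links. The sketch below shows a breadth-first version over a plain map; HyponymClosureSketch and the map representation are assumptions, and the short labels stand in for POS-prefixed synset numbers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HyponymClosureSketch {

    /** Every synset reachable from the given one by following hyponym links any number of times. */
    static Set<String> allHyponyms(Map<String, List<String>> hyponyms, String synset) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(hyponyms.getOrDefault(synset, List.of()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (seen.add(next))               // expand each synset only once
                queue.addAll(hyponyms.getOrDefault(next, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hyponyms = Map.of(
                "animal", List.of("canine", "feline"),
                "canine", List.of("dog"));
        System.out.println(allHyponyms(hyponyms, "animal"));
    }
}
```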
-
isHyponymousWord
- Returns:
- whether the word is a possible hyponym of a given POS-prefixed synset
-
generateHyponymSets
Generate sets of all hyponymous words for each synset in a file -
generateSUMOfromWNsubtree
Generate notional SUMO terms from WordNet -
generateSUMOfromWN
Generate notional SUMO terms from WordNet -
generateSUMOfromWN
public static void generateSUMOfromWN()
Generate notional SUMO terms from WordNet. Start with an equivalence. Make each synset a notional SUMO term with its parent either the synset parent or the equivalence.
getSynsetsFromSUMO
get all synsets corresponding to a SUMO term -
convertVerbFrameNumbersToFrames
Convert verb frame indexes as Strings into actual verb frame strings. For example "1" becomes "Something ----s".
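The conversion is a table lookup from frame number to frame template. The sketch below includes only three entries: frame 1 is the example given above, while frames 2 and 3 are taken from WordNet's published list of generic verb frames. The class name VerbFrameSketch and the choice to skip unknown numbers are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class VerbFrameSketch {

    // Partial table for illustration; WordNet defines more frames than these.
    private static final Map<String, String> FRAMES = Map.of(
            "1", "Something ----s",
            "2", "Somebody ----s",
            "3", "It is ----ing");

    /** Translate frame numbers into frame strings, skipping numbers missing from the table. */
    static List<String> toFrames(List<String> numbers) {
        List<String> frames = new ArrayList<>();
        for (String number : numbers) {
            String frame = FRAMES.get(number);
            if (frame != null)
                frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        System.out.println(toFrames(List.of("1", "2")));
    }
}
```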
getVerbFramesForSynset
Get all verb frames corresponding to a synset.
Parameters:
synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
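Since the parameter is a 9-digit synset while the verb frame key is 8 digits, some conversion has to happen between the two. The sketch below assumes, in line with the "POS-prefixed" wording used elsewhere on this page, that the 9-digit form is a one-digit part of speech code followed by the 8-digit synset number; SynsetDigitsSketch and its methods are hypothetical and do not reproduce isValidSynset8 or isValidSynset9.

```java
final class SynsetDigitsSketch {

    /** True when the string is exactly nine decimal digits. */
    static boolean isNineDigit(String synset) {
        return synset != null && synset.matches("\\d{9}");
    }

    /** Drop the assumed leading part of speech digit to obtain the 8-digit form. */
    static String toEightDigit(String synset9) {
        if (!isNineDigit(synset9))
            throw new IllegalArgumentException("not a 9-digit synset: " + synset9);
        return synset9.substring(1);
    }

    /** The assumed part of speech digit at the front of a 9-digit synset. */
    static char posDigit(String synset9) {
        if (!isNineDigit(synset9))
            throw new IllegalArgumentException("not a 9-digit synset: " + synset9);
        return synset9.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(toEightDigit("201835496") + " with POS digit " + posDigit("201835496"));
    }
}
```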
-
getVerbFramesForWord
Get all verb frames corresponding to a word in a synset. Include verb frames common to all words in the synset.
Parameters:
synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
-
doVerbFrameSubstitution
public static List<String> doVerbFrameSubstitution(Map<String, List<String>> map, List<String> words)
Get all verb frames corresponding to a word in a synset.
Parameters:
map - is a set of word keys and the values are the verb frames
words - are all the words in a given synset
-
getAllVerbFrames
Get all verb frames corresponding to a synset.
Parameters:
synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
-
showVerbFrames
Get all verb frames corresponding to a synset.
Parameters:
synset - is a 9-digit synset. Note: the verb frame key takes an 8-digit synset.
-
getEquivalentVerbSynsetsFromSUMO
get all verb synsets corresponding to a SUMO term that are equivalence links -
getVerbSynsetsFromSUMO
get all verb synsets corresponding to a SUMO term -
getEquivalentSynsetsFromSUMO
get all synsets corresponding to a SUMO term that are equivalence links -
getSynsetsFromSUMOList
get all synsets corresponding to a list of SUMO terms -
getWordsFromSynsetList
get all words corresponding to a list of synsets -
rootFormOf
-
sensoryWords
Find all words associated with sensory, psychological and emotional concepts. Return sets of words keyed by String names of the human senses, plus "emotion" and "thought".
synestheticSynsets
Find all the words that exhibit links to multiple sensory modes in SUMO -
synesthesiaCompare
Compare Lievers' list of synesthetic words with those derived from SUMO-WordNet.
testCommonParent
public static void testCommonParent()
A method used only for testing. It should not be called during normal operation.
sensoryOrMentalWord
test if a word is sensory or mental and return true if so -
testWord
public static void testWord() -
testSynesthesia
public static void testSynesthesia() -
testGetPOS
public static void testGetPOS() -
testIsValidKey
public static void testIsValidKey()
A method used only for testing. It should not be called during normal operation.
showHelp
public static void showHelp() -
main
A main method, used only for testing. It should not be called during normal operation.
-