SUMO Student Project Ideas


    Interested students should see the introductory presentation on SUMO, with accompanying audio.

    1. Natural Language Generation. Translate the SUMO natural language generation templates into a language not currently covered here (1 week of work). These templates allow Sigma to generate paraphrases of axioms automatically in different languages. For example, in the definition shown, the formal axiom is automatically converted to the English paraphrase that follows it.
          (=>
            (and
              (instance ?AMBULATE Ambulating)
              (agent ?AMBULATE ?AGENT))
            (attribute ?AGENT Standing)) 
                  
      • if a process is an instance of ambulating and an agent is an agent of process
      • then standing is an attribute of agent
      This process uses several language templates, including
          (termFormat EnglishLanguage Ambulating "ambulating")
          (format EnglishLanguage agent "%2 is %n an &%agent of %1")
      1. Expand the coverage of this file in both English and the target language to handle all 20,000 terms (including relations) in all of SUMO (several months, maybe less with some scripts to help automate part of it)
      2. Extend the functionality of Sigma's com.articulate.sigma.LanguageFormatter to handle any linguistic features of the target language that are not present in English and that affect the presentation of the paraphrase, and possibly extend the simplistic formatting symbol language (%p, %n, etc., as described at the top of the format file) (several months of work in Java)
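As a rough illustration of how a format template such as the one for agent might be expanded, the Python sketch below fills in the argument slots. It is not Sigma's LanguageFormatter: the function name `render` is hypothetical, and the handling of %n (dropped for a positive statement, replaced by "not" for a negated one) and of &% (a marker that is stripped to leave the term name) are simplifying assumptions.

```python
import re


def render(template, args, negated=False):
    """Fill a format template with argument paraphrases (illustrative only)."""
    # Assumption: %n marks where a negation word goes; it is dropped otherwise.
    text = template.replace("%n", "not" if negated else "")
    # Assumption: &%Term marks a term reference; keep only the term name.
    text = re.sub(r"&%(\w+)", r"\1", text)
    # %1, %2, ... are replaced by the paraphrase of the corresponding argument.
    text = re.sub(r"%(\d+)", lambda m: args[int(m.group(1)) - 1], text)
    # Collapse the extra space left behind when %n expands to nothing.
    return re.sub(r"\s+", " ", text).strip()


print(render("%2 is %n an &%agent of %1", ["process", "an agent"]))
```

With the arguments paraphrased as "process" and "an agent", this reproduces the second half of the "if" clause in the English paraphrase above. A target language with agreement or word-order rules would need more than string substitution, which is the point of sub-task 2.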
    2. Improved NL Generation - The current NL generation method in Sigma is fairly simplistic. Improving it, in part by applying existing methods from the literature, could range from a graduate semester project of finite scope, such as making action sentences more natural (showing "John cooked" instead of "the agent of cooking is John"), to the more general task of improving the system in any way possible across multiple linguistic features, which would be appropriate for a PhD thesis.
    3. Knowledge Creation for Measurements. Create knowledge about typical measures and ranges of measures for physical things in SUMO.
      1. Manually create assertions for the measures of physical objects for the top few levels of the SUMO CorpuscularObject hierarchy (1 month)
      2. Create a machine learning system that uses an existing IR system to extract numerical measure information from the web, verifies it against the manually created defaults and ranges in the upper levels of SUMO from the task above, and then asserts it as KIF/SUMO formulas if it passes verification (master's or PhD project).
      3. General knowledge creation - Additional knowledge entry: pick a general topic and add the information to all the relevant SUMO terms. Create statements about the parts, material and composition of physical objects (a chair typically has legs, and some chairs are made of wood), or about usage and capability (a chair can be used as the instrument in a sitting event, a lawyer has the skill of practicing law, etc.)
    4. Proof step type reporting - Add a feature to Sigma to analyze and report the type of inference step used in a proof. For example, report if an inference engine's proof step was derived using Modus Ponens or De Morgan's rule. (half-year graduate project)
    5. Wikipedia extraction. Create SUMO definitions for the 1300 Wikipedia infobox relations (6 months at a rate of 50 per week, which is the standard I use for estimating the knowledge engineering effort of a mature knowledge engineer creating each new fully formalized term; it might take an additional 3 months of training to get to that point). This should be coordinated with the YAGO folks.
      1. Extract all the information in DBPedia for these relations and assert it as KIF formulas in SUMO, just like the existing YAGO (it's possible this could take just a month by reusing the tools employed for YAGO).
    6. Intro to knowledge engineering - 1 week homework (towards an undergraduate or first year graduate logic course). Pick a topic you know and are interested in, and write down 10 sentences that are true about the topic. Using the Sigma SUMO-WordNet browser, enter the English words from your sentences and find the most specific SUMO term available for defining each word (be careful to look at the SUMO definition and not just make assumptions based on the term name!). If the most specific term available is more general than the term you want, create a subclass and write a documentation statement for it. Rewrite your English sentences in SUO-KIF using your newly defined terms and existing SUMO terms. Example: "A piano has keys."

          
          "piano" -> Piano
          WordNet: 103928116 a keyboard instrument that is played by depressing
          keys that cause hammers to strike tuned strings and produce sounds.
          SUMO Mappings: Piano (equivalent mapping) 
          
          "key" -> Device
          WordNet: 103613592 a lever (as in a keyboard) that actuates a mechanism
          when depressed. 
          SUMO Mappings: Device (subsuming mapping) 
          define KeyboardKey
          (subclass KeyboardKey Device)
          (documentation KeyboardKey "A lever on a musical keyboard that actuates
          a mechanism when depressed in order to produce sounds. ")
          
          (=>
            (instance ?P Piano)
            (exists (?K)
              (and
                (instance ?K KeyboardKey)
                (part ?K ?P))))
          

      This project can be expanded to any scope desired. A senior project could work on formalizing a small area of knowledge with 100 terms and 500 axioms, for example, automobile engines, parts, function, connectivity and materials. Product catalogs like McMaster-Carr would make a good concrete source. A master's or PhD-level project would examine a more difficult area of knowledge, such as augmenting SUMO's theory of space and spatial relations to create a consistent theory of 2D and 3D space with abstract shapes and their relations to real-world objects, along with a corpus of queries that show how the augmented theory can be used with an existing first-order theorem prover.

    7. Logical expressiveness reporting and extraction. Create a Java module for Sigma that analyzes a knowledge base to determine whether it is propositional, DL, Horn, FOL, HOL, etc. Make it possible also to analyze and report which portions of a KB are of each form. Create a function in Sigma that allows the user to extract statements of a given level of expressiveness from a KB that may also have more expressive statements. For example, be able to extract just Horn clauses from a first-order KB. Master's thesis or PhD.
    8. Done - Graphical KB analysis - Perform a graph analysis of a KB to show the connectedness and dependencies of files within a knowledge base or portions of a knowledge base. Add this as Java code to Sigma. (One or two month graduate or senior undergraduate project)
    9. CELT-based Knowledge Entry - Semi-manual Wikipedia text entry with the Controlled English to Logic Translation (CELT) system. This is a bit speculative since CELT is still in a relatively early form, but it should be possible for people to learn its limitations, manually simplify Wikipedia content, and enter the simplified sentences into CELT, which will result in new FOL expressions in Sigma that extend SUMO. (Undergraduate semester senior project or first year Master's semester project. This could evolve into a PhD project for a linguist fluent in SWI-Prolog who wants to add a certain new class of grammatical interpretation capabilities to CELT.)
    10. Language Translation - Using CELT and Sigma's language generation capability, pick a pair of languages and create a bi-directional language translation capability. This would involve work at the level of a PhD dissertation to convert CELT to handle a language other than English, as well as a smaller task of improving language generation in the target language.
    11. Done - Ontology alignment. Integrate into Sigma a module that creates a list of candidate correspondences between pairs of selected knowledge bases. In previous work we wrote a Prolog program that created a set of correspondences through different methods and then presented a weighted list of probable candidates. Having a nice interface to allow a user to select a given correspondence and automatically assert a &%synonymousExternalConcept or &%subsumingExternalConcept statement would also be helpful. The different alignment methods we thought of are: (1) identical term names, (2) substrings of term names are equal, (3) terms align to words in the same WordNet synset, (4) extra "points" for having terms that align with the same structural arrangement (e.g. B is a subclass of A, Y is a subclass of X, and A is a candidate alignment with X). (6 month senior project, or could be a Master's thesis with additional features and a controlled evaluation experiment in knowledge creation with and without the mapping tool)
    12. CASC - Contribute tests to the LTB SUMO division of CASC.
    13. SUMO-WordNet Mappings - Revise the SUMO-WordNet mappings to use the most specific SUMO terms possible. As SUMO terms get added, they aren't always used to revise the existing mappings. Find SUMO terms that don't have any mappings, such as Sandal (note that there are no words listed in the browser at the upper right). Update the WordNet mappings file to point to the new term. (Each mapping takes only a few minutes, but it takes a month of training to become familiar enough with SUMO to be that efficient)
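The search for unmapped terms in project 13 could be partly automated. The Python sketch below is only a starting point under stated assumptions: it assumes each line of the mappings file mentions its SUMO term as `&%Term` followed by a mapping suffix, and the function name `unmapped_terms` and the sample lines are hypothetical simplifications, not the actual file contents.

```python
import re

# Assumption: a mapped SUMO term appears in a mapping line as "&%Term",
# optionally followed by a suffix character indicating the mapping type.
MAPPED_TERM = re.compile(r"&%(\w+)")


def unmapped_terms(sumo_terms, mapping_lines):
    """Return the SUMO terms that no mapping line refers to, sorted by name."""
    mapped = set()
    for line in mapping_lines:
        mapped.update(MAPPED_TERM.findall(line))
    return sorted(set(sumo_terms) - mapped)


# Made-up, simplified mapping lines for illustration only.
sample_lines = [
    "103928116 piano | a keyboard instrument ... &%Piano=",
    "103613592 key | a lever (as in a keyboard) ... &%Device+",
]
print(unmapped_terms(["Piano", "Device", "Sandal"], sample_lines))
```

Each term reported this way still needs a person to find the right synset and decide what kind of mapping applies, so the script only narrows down where to look.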


    Hosted by CIM3.NET