Ontology Development Pitfalls


    • Failure to distinguish between an instance-of relationship and a subclass-of relationship. Bob is an instance of Mammal; Human is a subclass of Mammal. (See the first sketch after this list.)
    • Failure to distinguish part-of from subclass-of. A wheel is a part of a car, not a more specific type of car. An opening argument by a trial lawyer is part of a criminal legal proceeding, not a specific type of criminal legal proceeding. This error is surprisingly common in OWL, since it has a convenient built-in subclass relationship (rdfs:subClassOf) but no corresponding built-in for part-of. That's a danger of a simplistic upper ontology. (See the second sketch after this list.)
    • Modeling events as relations. One sees this occasionally in the linguistics literature. (eats Bill HamSandwich) looks simple and convenient, but if one then wants to say when the eating happened, there's a problem. The typical workaround is (occurs Tuesday (eats Bill HamSandwich)), but most languages that have some use in inference, like OWL, don't allow statements as arguments to relations, because such higher-order statements are extremely difficult to reason with. Another example of this problem, which has its roots in the classic paper [Davidson, Donald (1967): "The Logical Form of Action Sentences"] and is developed further in the very readable [Parsons, 1991], is (stabs Brutus Caesar) or "Brutus stabs Caesar". Extend that to "Brutus stabbed Caesar deeply with a knife on Tuesday" and the problem is plain: every added modifier demands a new relation or a new argument position. The standard fix is to reify the event as an individual. (See the third sketch after this list.)
    • "Ontological promiscuity". Creation of terms which aren't sufficiently distinguished from other terms. Each term should have to "fight" for its existence in an ontology, and not just be added on a whim. Many of the same folks who create ontologies with tons of terms that aren't well defined know well that they couldn't take the same approach with a database, or Java code. If one just adds tons of procedures into a program with the thought that they might be used at some point, an unmangeable mess is the result. A related issue is that the ontology modeling language must be able to express differences in concepts for this problem to be apparent to the modeler. In a simple taxonomy language for example, since the modeling isn't able to state definitions, other than in natural language, it may not become apparent when two terms have the same meaning, or their intended meaning isn't really clear.
    • Confusing language and concepts. This is related to, and often a cause of, "ontological promiscuity". People use many different names for the same thing in communication. The lazy solution is just to make every word or phrase a term in the ontology. But that drastically reduces utility for interoperability, because software components then use different formal terms to refer to the same notion. (See the fifth sketch after this list.)
    • Modeling roles as classes. Teacher is a role; Human is a class. If I define "Bob is a teacher" as (instance Bob Teacher), then I lose any way to refer to Bob once he retires, since class membership in most ontology languages is not time-indexed. This is a very common and serious error, usually caused by developers not thinking about how facts change over time. There is a tension here, however, when an ontology is used in a natural language processing application: MILO contains a number of concepts which map cleanly to WordNet synsets, but which are a bit questionable as terms in a strictly ontological sense. (See the sixth sketch after this list.)
    • Failure to reuse. The first impulse of many programmers is to create from scratch. It's fun to create new content that one understands intimately, and a chore to learn someone else's model, because inevitably you're not going to agree with every facet of it. It takes time to understand someone else's model or code, and that's time that could be spent actually writing code. But in software, all developers accept the principle of reuse, in part simply because it's now impractical to write your own operating system just because you want to write a group calendar application. The results of writing from scratch are the same for ontologies as they are for procedural software: wasted effort, lack of standardization, and usually poorer quality than would have been the case if content were reused.
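
    First sketch (instance-of vs. subclass-of). A minimal illustration in SUO-KIF-style notation; the individual and class names are ours, chosen only for illustration:

        (instance Bob Human)        ; Bob is one particular human
        (subclass Human Mammal)     ; every instance of Human is also an instance of Mammal
        ;; wrong: (subclass Bob Human) -- Bob is an individual, not a class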
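
    Second sketch (part-of vs. subclass-of), in the same notation; the part relation follows SUMO usage, the other names are illustrative:

        (instance Wheel1 Wheel)
        (instance Car1 Car)
        (part Wheel1 Car1)          ; the wheel is a component of the car
        ;; wrong: (subclass Wheel Car) -- a wheel is not a kind of car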
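
    Third sketch (reifying an event). The Davidsonian fix is to make the event an individual and attach each modifier as a separate binary relation; the relation names below follow SUMO conventions, and Knife1 and Tuesday1 are hypothetical individuals:

        (exists (?S)
          (and
            (instance ?S Stabbing)           ; the event itself is a first-class individual
            (agent ?S Brutus)
            (patient ?S Caesar)
            (instrument ?S Knife1)
            (during (WhenFn ?S) Tuesday1)))  ; each new modifier is just one more conjunct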
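
    Fourth sketch (why definitional power matters). In a bare taxonomy the two terms below look like legitimate siblings; an axiomatic definition makes a term's content explicit, so duplication or vagueness can be noticed or even proved. Term names are illustrative:

        ;; a bare taxonomy hides the duplication:
        (subclass Auto Vehicle)
        (subclass Automobile Vehicle)
        ;; a definition makes the intended meaning explicit and checkable:
        (=>
          (instance ?X Automobile)
          (and
            (instance ?X Vehicle)
            (exists (?E)
              (and (instance ?E Engine) (part ?E ?X)))))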
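
    Fifth sketch (keeping words out of the ontology). SUMO-style practice attaches lexical labels to a single concept with termFormat rather than minting one ontology term per synonym:

        (termFormat EnglishLanguage Automobile "car")
        (termFormat EnglishLanguage Automobile "auto")
        (termFormat EnglishLanguage Automobile "automobile")
        ;; wrong: separate formal terms Car, Auto, and Automobile for one notion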
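
    Sixth sketch (roles as time-indexed facts). One common repair, assuming Teacher is modeled as a role attribute rather than a class; holdsDuring and YearFn follow SUMO usage:

        (instance Bob Human)          ; rigid: true of Bob for his whole existence
        (holdsDuring (YearFn 2010)
          (attribute Bob Teacher))    ; the role holds only during a period
        ;; wrong: (instance Bob Teacher) -- what do we say after Bob retires?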

    There are many other pitfalls of course, but we might venture to say that most of them aren't a problem in practice only because too many people are still making the simple ones. Someday, we'll be warning against the problems of modeling that encapsulates modal ("can", "may", "should") or normative ("obligation", "permission") force into specific action types, but most people aren't working with models (or even languages) that would allow them to make such mistakes. Ontologies are currently dominated by the simpler mistakes. (A sketch of the better approach to modal force appears below.)
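    For the curious, the shape of that future fix: SUMO's modalAttribute attaches modal or normative force to an ordinary proposition, instead of baking it into a special action type. The individual names here are illustrative:

        ;; "Bob may enter" -- permission attached to a proposition,
        ;; not a special MayEnterBuilding action class:
        (modalAttribute
          (exists (?E)
            (and
              (instance ?E Entering)
              (agent ?E Bob)))
          Permission)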

    It's also the case that some of these guidelines have exceptions, especially for work in progress.
