Saturday, July 5, 2008

The Schema Is Data

Apparently a predefined ontology is not a precondition for working with semantic data. The process of tagging content relationships with triples (or n-tuples in the Metaweb system) defines the ontology on the fly.

Schema is data. Data defines schema. Graph stores are fundamentally different from relational data stores. Of course, it seems so simple now ;)

Blame my vast experience in the SQL methodology: define a data structure optimized to certain use cases, and your application is locked in. With semantic n-tuple tagging, the data structure is constantly changing, and searches navigate the relationship edges regardless of the underlying structure. As tags are added, the ontology grows. Two different ontologies can be combined by defining relationships between related topics and combining identical topics. In this way the relationship graph grows and new data paths are created.
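To make the merging idea concrete, here is a minimal sketch in plain Python. It is not how Metaweb or any real triple store works internally; the two tiny ontologies, the topic names, and the `same_as` predicate are all made up for illustration. Each ontology is just a set of (subject, predicate, object) tuples, and combining them means taking the union and adding one bridging relationship.

```python
from collections import deque

# Two hypothetical mini-ontologies, each a set of (subject, predicate, object) triples.
music = {
    ("TheBeatles", "has_member", "JohnLennon"),
    ("JohnLennon", "born_in", "Liverpool"),
}
places = {
    ("Liverpool_UK", "located_in", "England"),
}

# Combine them: union the triples, then relate the two topics that mean the same thing.
# "same_as" is an illustrative predicate name, not a standard one.
bridge = {("Liverpool", "same_as", "Liverpool_UK")}
graph = music | places | bridge


def reachable(triples, start):
    """Breadth-first walk along edges, regardless of which ontology they came from."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, _predicate, obj in triples:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen - {start}


def predicates(triples):
    """The 'schema' is whatever predicates currently appear in the data."""
    return {predicate for _subject, predicate, _obj in triples}
```

Without the bridge triple, a walk starting at `TheBeatles` stops at `Liverpool`. With it, the same walk reaches `England`, which is the "new data path" created by the merge. Note that `predicates(graph)` grows as triples are added; nothing had to be declared up front.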

The RDF triple can be described in terms of traditional data stores if you consider cell-by-cell storage: the subject is the row identifier, the predicate is the column identifier, and the object is the data field. In RDF, the subject and predicate are identified by URIs, and the object is either another URI or a literal value. Triples are mathematically extremely simple and powerful, but for those of us who made a career in traditional data systems, it takes some getting used to. The upshot is that SPARQL, which matches graph patterns rather than joining tables, can express relationship-heavy questions with a fraction of the complexity required with traditional SQL.
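The row/column/cell analogy can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the row identifiers, column names, and values are invented, and plain strings stand in for the URIs that real RDF would use.

```python
# A hypothetical "table" where rows do not share the same columns.
rows = {
    "person:1": {"name": "Ada", "city": "London"},
    "person:2": {"name": "Grace", "employer": "US Navy"},
}


def rows_to_triples(table):
    """Store the table cell by cell: (row id, column name, cell value)."""
    triples = []
    for row_id, columns in table.items():
        for column, value in columns.items():
            triples.append((row_id, column, value))
    return triples


def triples_to_rows(triples):
    """Rebuild row dictionaries from triples.

    Assumes at most one value per (subject, predicate) pair; a later
    triple with the same pair would overwrite the earlier one.
    """
    table = {}
    for subject, predicate, obj in triples:
        table.setdefault(subject, {})[predicate] = obj
    return table


triples = rows_to_triples(rows)
```

Adding a new "column" here is just appending a triple such as `("person:1", "employer", "Analytical Engines Ltd")`. No table definition has to change, which is the part that takes getting used to after years of SQL.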

How promising is it? I hear NASA is creating a semantic web knowledge base to power a 30-year plan to put a man on Mars. A biosciences knowledge base went online a few weeks ago. Everyone is asking about it, and the semantic web standards are still being defined.