Beautiful Ontologies

By Paul Ford, 13 July 2004

‘Ontologies’ structure the information in databases (think Semantic Webs or library card indexes). But who gets to design these semantic motors and what are they built to do? Taking Friendster and Wikipedia as two collective databasing projects which respectively fail to realise the social and structuring potential of ontologies, Paul Ford suggests a better application based on a fusion of the two.

These are not all friends, on Friendster. Some are people I don’t like, and their thumbnail visages remind me of awkward interactions and pained conversations. But they requested my Friendstership, and I gave it, for lack of a graceful way to say no. I worry that I have made similar requests, and been cursed as I was added, politely, to someone else’s database of faces.

A social network should be liberating, an expanding circle of personal and professional connections, each individual reaching out across degrees of separation. But after the sense of novelty ended, I began to shrink from these sites, to see the emails which they generated as intrusions, facing what Lev Manovich calls ‘the projection of the ontology of a computer onto culture itself.’[1]

The ontology of the social network is a rigid affair, defined by programmers, with no subtlety in its connections between people, words, and ideas, no room for rudeness, or even mild criticism. A friend is a friend is a friend. The social network databases are filled with half-truths, false connections, all of it impermanent.

But I was taught at age 12 that the database is the holder of truth and the database operation is the pursuit of truth. These ideas were implicit in research, which followed this method: pick a topic, whether trains, DNA, or beavers. Visit the library and negotiate the wooden drawers of the card catalogue. Write down the decimal codes for each book, and retrieve them.

Use the index of the book to find the material relevant to your topic. When something is worth noting, take a clean three-by-five-inch note card from a stack and write the full text of the quotation on the card. In the top left corner of the card, put the title of the book, and on the right, a page number. On a separate set of cards, copy all of the book’s bibliographic information.

Sort your bibliography cards alphabetically by author’s last name, and number them in that sequence. With all of this in hand, write. Reference each fact to its source with the author’s name and a page number. The last paragraph may include your personal opinion. Copy out the bibliography at the end. Punctuate cautiously. The result will be pure knowledge, something useful and authentic, and above suspicion.

I was taught that references – links in ink – were evidence of truth. A reference indicated that someone besides me had considered the topic. The links joined my work to the cited work, which was listed in the card catalogue, registered at the Library of Congress, and copyrighted. Manovich points out that the database projects onto culture, that the objects within a database are arranged into narratives. But when I wrote up my adolescent bibliography, and my concluding paragraph, I was projecting in the other direction, pushing back with my narrative at the ontology and making my own place within it. It felt powerful.

Publishing on paper provides a place in the ontology. It gratifies authors to be numbered, placed on shelves, listed and indexed. If the card catalogue is an index of human knowledge, placement in its indexed ranks ensures persistence. What makes the social networks already antiquated, and doomed, is that they do not permit ontologies to be changed. There is no room for ambition, aside from gathering the largest number of friends, which is a hollow goal at best. If individuals could define their own networks and connections, what would they build?

Naturally, an encyclopedia. Wikipedia.org works this way, aggregating people in pursuit of a common task, allowing all to add, edit, or delete anything, keeping the software model as loose as possible. It is the antithesis of Friendster’s controlled database. The task of encyclopedia-creation is primary and the community secondary, which in turn produces a vibrant, focused community.

An organising influence has emerged on Wikipedia. The community created a rhetorical approach that emphasises fairness and balance, and describes both sides of contentious issues. Contributing is an unrewarded task, but, in my experience, it is also highly pleasurable: articles are read and edited instantaneously; formulations and ideas are repaired and narrowed. If your words are accepted, you have added to the sum of knowledge. If edited, you have prompted improvement. Either way the map is changed, because of your act.

But Wikipedia has limits. The syntactic looseness of composition, essential to ease the work of contributors, means that the results cannot be fathomed by machines. One can search for Liberia, but not for its borders. One cannot ask of the site, ‘show me all the countries which export diamonds,’ or ‘all of the events that occurred in 1854.’ Such pages might exist, but they have been created by hand, and are incomplete. It should be possible for the computer to do the searching and sorting, to write out a timeline of 1854 without human intervention.

The means to this end is the Semantic Web. Here, every resource, fact, country, person, event, is given a unique ID, using layers of XML, RDF [http://www.w3.org/RDF/], and OWL [http://www.w3.org/2001/sw/WebOnt/]. This information is structured according to an ontology, a map of reality with nouns as buildings and prepositions as roads between them. ‘Socrates IsA Man.’ ‘Dogs HavePart Paws.’ Now the timeline can be created in a simple query, without human intervention.
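The triple idea can be sketched in a few lines of Python. This is a toy, not RDF or OWL: the `Triple` type, the `match` and `timeline` helpers, and the `Borders` and `OnDate` predicates are all hypothetical, and the handful of facts is hand-picked for illustration. A real store would use URIs as identifiers and a query language such as SPARQL, but the principle carries over: once facts carry explicit relations, Liberia's borders or a timeline of 1854 fall out of a query rather than hand editing.

```python
from typing import Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers and predicates; an RDF store would use URIs instead.
FACTS = [
    Triple("Socrates", "IsA", "Man"),
    Triple("Dog", "HavePart", "Paw"),
    Triple("Liberia", "Borders", "Sierra Leone"),
    Triple("Liberia", "Borders", "Guinea"),
    Triple("Liberia", "Borders", "Côte d'Ivoire"),
    Triple("Treaty of Kanagawa", "IsA", "Event"),
    Triple("Treaty of Kanagawa", "OnDate", "1854-03-31"),
    Triple("Kansas-Nebraska Act", "IsA", "Event"),
    Triple("Kansas-Nebraska Act", "OnDate", "1854-05-30"),
    Triple("Battle of Balaclava", "IsA", "Event"),
    Triple("Battle of Balaclava", "OnDate", "1854-10-25"),
    Triple("Great Exhibition", "IsA", "Event"),
    Triple("Great Exhibition", "OnDate", "1851-05-01"),
]


def match(
    facts: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in facts:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


def timeline(facts: list[Triple], year: str) -> list[tuple[str, str]]:
    """Return (date, event) pairs for every Event dated in `year`, earliest first."""
    rows = []
    for event in match(facts, predicate="IsA", obj="Event"):
        for dated in match(facts, subject=event.subject, predicate="OnDate"):
            # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
            if dated.obj.startswith(year + "-"):
                rows.append((dated.obj, event.subject))
    return sorted(rows)


if __name__ == "__main__":
    print([t.obj for t in match(FACTS, subject="Liberia", predicate="Borders")])
    for date, event in timeline(FACTS, "1854"):
        print(date, event)
```

Run as a script, it prints Liberia's three neighbours, then the three 1854 events in date order; the 1851 Great Exhibition is filtered out by the query, not by hand.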

These ontologies, in the computer science sense, are ultimately prescribed by taste. Ontologies are, at their best, elegance and efficiency encoded. They are useful only as much as they are used, and so they must strive to be appealing, to offer a model of the world that provides the fact-creators with a corresponding sorting and structuring power. This is their measure of beauty; the ugliest are fogs of fact, like the CYC project [http://www.cyc.com/], which expects intelligence to emerge from its expert system. But there are other, simpler models, like the Suggested Upper Merged Ontology (SUMO) [http://ontology.teknowledge.com], which offers a means of encoding facts about tea, Tokyo, text, and terrorism, and the legendary WordNet ontology [http://www.cogsci.princeton.edu/~wn/], which uses a small set of linguistic concepts to sort well over 100,000 words. It ‘knows’ that dogs have paws, and cars have motors, and that a motor might be a part of either a car or a boat.

Wikipedia has pathways – links – but they lack semantics, and so the experience of browsing, and especially of searching, is ultimately less pleasurable than it would be were the content encoded semantically, and much more work must be done by hand than might otherwise be needed. Friendster and its siblings have a limited semantics, but no taste: their model of friendship feels awkward and undignified.

Neither site knows exactly what to do about trust: Wikipedia’s open door is also open to vandals, who must be policed, whereas Friendster keeps all of its members in a relational jail.

Trust is often described as the key to the Semantic Web, the problem that must be solved. The way to measure trust is to see who trusts something, and who trusts the trusters. Once that is converted into a number, you can decide whether to believe a fact as it is provided, or view it with suspicion. It brings the link back into the domain of trust, makes it an unambiguous identifier. With trust, it is implied, with permanent identifiers, ontologies, and databases of facts, we can re-create that faith in the library, the search for truth in the links and references, and build knowledge collectively, with machines as our helpmates in the stead of librarians.
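The essay names no particular algorithm for turning ‘who trusts the trusters’ into a number. One well-known candidate, swapped in here purely as an illustration, is a PageRank-style iteration in which each source passes a share of its own score to the sources it vouches for. In this Python sketch the trust graph, the source names, the 0.85 damping factor and the above-average belief threshold are all hypothetical choices, not a description of any deployed Semantic Web system.

```python
from typing import Optional

# Hypothetical trust graph: each source lists the sources it vouches for.
TRUSTS: dict[str, list[str]] = {
    "library": ["encyclopedia", "journal"],
    "encyclopedia": ["journal"],
    "journal": ["library"],
    "reader": ["encyclopedia"],
    "anonymous_blog": [],
}


def trust_scores(
    trusts: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """PageRank-style trust: a source is trusted if trusted sources vouch for it."""
    nodes = set(trusts) | {t for targets in trusts.values() for t in targets}
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every source keeps a small baseline share, so scores never hit zero.
        new = {node: (1.0 - damping) / n for node in nodes}
        for truster in nodes:
            targets = trusts.get(truster, [])
            if targets:
                # Split this source's current score among those it vouches for.
                share = damping * score[truster] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A source that vouches for no one spreads its score evenly.
                for node in nodes:
                    new[node] += damping * score[truster] / n
        score = new
    return score


def believe(source: str, scores: dict[str, float], threshold: Optional[float] = None) -> bool:
    """Accept a fact only if its source scores above the threshold (default: the average)."""
    if threshold is None:
        threshold = 1.0 / len(scores)  # scores sum to 1, so this is the mean
    return scores.get(source, 0.0) > threshold


if __name__ == "__main__":
    scores = trust_scores(TRUSTS)
    for source, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{source:15} {value:.3f} believe={believe(source, scores)}")
```

Because nobody vouches for `anonymous_blog`, its score stays near the floor set by the damping term, so `believe` treats its facts with suspicion, while the mutually vouching library, encyclopedia and journal end up holding most of the weight.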

FOOTNOTE

[1] Lev Manovich, ‘Database as a Symbolic Form’, 1998. [http://www.manovich.net/docs/database.rtf]

Paul Ford is the author and programmer of Ftrain.com. He lives in Brooklyn, NY.