[openbiblio-dev] FW: Re: Object mapping for RDF

Tue Feb 15 19:05:56 UTC 2011

Some follow-up I sent after a discussion with Peter
Buneman and some of his students this afternoon. 
Perhaps of use for understanding the architecture of
the openbiblio software.

----- Forwarded message from William Waites <ww at styx.org> -----

Hi there, as we were talking about,

   http://packages.python.org/ordf/odm.html

The module ordf.vocab.owl is derived from InfixOWL
by Chimezie Ogbuji of the Cleveland Clinic,

   http://code.google.com/p/fuxi/wiki/InfixOwl

mainly to make working with instances easier.

This only tells part of the story though -- there
is no interaction with a real datastore. That is
taken care of by ordf.handler 

   http://packages.python.org/ordf/message_handling.html

which implements get(), set() and append() methods
that operate on graphs and some kind of storage
back-end. There is also a query() method that will
work if one of the configured storages supports
querying.
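
As a rough illustration of that shape (this is not
the actual ordf.handler code; MemoryStore, LogStore
and Handler are names made up for the sketch, and
graphs are plain sets of triples), a handler that
fans writes out to every back-end and sends queries
to the first one that can answer them might look like:

```python
class MemoryStore:
    """Toy back-end: named graphs held as sets of triples."""

    def __init__(self):
        self.graphs = {}

    def get(self, name):
        # Hand back a copy so callers cannot mutate the store.
        return set(self.graphs.get(name, ()))

    def set(self, name, triples):
        self.graphs[name] = set(triples)

    def append(self, name, triples):
        self.graphs.setdefault(name, set()).update(triples)

    def query(self, pattern):
        # pattern is an (s, p, o) tuple; None matches anything.
        return sorted(
            t for g in self.graphs.values() for t in g
            if all(q is None or q == v for q, v in zip(pattern, t))
        )


class LogStore:
    """Toy write-only back-end with no get() or query()."""

    def __init__(self):
        self.log = []

    def set(self, name, triples):
        self.log.append(("set", name))

    def append(self, name, triples):
        self.log.append(("append", name))


class Handler:
    """Hypothetical handler over several storage back-ends."""

    def __init__(self, *stores):
        self.stores = stores

    def get(self, name):
        for store in self.stores:
            if hasattr(store, "get"):
                return store.get(name)
        raise LookupError("no readable store configured")

    def set(self, name, triples):
        for store in self.stores:
            store.set(name, triples)

    def append(self, name, triples):
        for store in self.stores:
            store.append(name, triples)

    def query(self, pattern):
        for store in self.stores:
            if hasattr(store, "query"):
                return store.query(pattern)
        raise NotImplementedError("no store supports queries")
```

With Handler(LogStore(), MemoryStore()), writes reach
both back-ends while get() and query() are answered
by the in-memory one.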

The way these two pieces are glued together is
an area of active development being prototyped in

   https://bitbucket.org/okfn/openbiblio/src/tip/openbiblio/model/base.py

One implementation detail to note is that when a
graph/instance is fetched from the store it is not
directly backed by the store but is held in a
temporary in-memory graph. Typically the save()
method will effectively do a graph diff and store
both the diff and the new version. An important
side effect is that since queries happen in the
store, they will not reflect unsaved changes.
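
To make that side effect concrete, here is a minimal
sketch assuming graphs are plain sets of triples. The
Store class and its fetch(), save() and query()
methods are invented for the example and are not the
openbiblio or ordf API:

```python
class Store:
    """Toy store: keeps each graph plus a log of diffs."""

    def __init__(self):
        self.graphs = {}    # name -> frozenset of triples
        self.changes = []   # (name, added, removed) per save

    def fetch(self, name):
        # Detached in-memory copy; edits do not touch the store.
        return set(self.graphs.get(name, ()))

    def save(self, name, working):
        old = set(self.graphs.get(name, ()))
        added = set(working) - old
        removed = old - set(working)
        if added or removed:
            # Record the diff and the new version together.
            self.changes.append((name, added, removed))
            self.graphs[name] = frozenset(working)
        return added, removed

    def query(self, predicate):
        # Queries only ever see what has been saved.
        return {
            t for g in self.graphs.values() for t in g
            if t[1] == predicate
        }


store = Store()
store.save("book", {("urn:b", "dc:title", "Old title")})

g = store.fetch("book")
g.discard(("urn:b", "dc:title", "Old title"))
g.add(("urn:b", "dc:title", "New title"))

before = store.query("dc:title")   # still the old title
added, removed = store.save("book", g)
after = store.query("dc:title")    # now the new title
```

Until save() is called, query() keeps returning the
old title even though the working copy has changed.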

The openbiblio software is a web application using
the Pylons framework. This part concerns the models,
but there is also a controllers directory which
contains the logic for user interactions. The basic
interactions are GET, PUT and POST requests, which
are intended to retrieve, replace (or create) and
append to graphs respectively. The controller is
where you would have some sort of access policy,
which might depend on the existence or not of
certain triples in the local store. This is basically
the same structure as any other MVC kind of web
application with a relational back-end.
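
The request semantics can be sketched without any
Pylons code at all. Everything below is hypothetical:
handle() and owner_policy() are made-up names, the
store is a plain dict of named graphs, and the "acl"
graph with its "owner" predicate is just one way an
access policy could be keyed on triples:

```python
def handle(method, name, body, store, may_write):
    """Map a request onto a graph operation.

    Returns (status, graph), where graph is a set of triples.
    """
    if method == "GET":
        if name not in store:
            return 404, set()
        return 200, set(store[name])
    if method in ("PUT", "POST"):
        if not may_write(store, name):
            return 403, set()
        if method == "PUT":
            # Replace the graph, creating it if necessary.
            store[name] = set(body)
        else:
            # Append to the graph.
            store.setdefault(name, set()).update(body)
        return 200, set(store[name])
    return 405, set()


def owner_policy(user):
    """Allow writes only if an ownership triple exists."""
    def may_write(store, name):
        return (name, "owner", user) in store.get("acl", set())
    return may_write
```

The policy is just a function of the store contents,
so it can be swapped without touching the handler.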

Inferencing. There actually is support for inferencing,
either by letting the store do it if it supports it
or by passing graphs through a forward chaining
inference engine on save (this is documented in
the ORDF docs). Right now that isn't used at all,
but it could be. What is used, and is proving not
to work well at all due to CPU usage problems, is
owl:sameAs processing: in the special case that a
graph with a particular name doesn't exist, we
try to infer the statements that would be in it if
it did and return those. It turns out that with even
a moderate number of sameAs statements the queries
take unacceptably long to return.
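
One possible reading of that fallback, sketched over
plain sets of triples. This is an assumption about
the general idea, not the openbiblio implementation;
same_as_closure() and get_graph() are invented names:

```python
SAME_AS = "owl:sameAs"


def same_as_closure(uri, triples):
    """All URIs reachable from uri via sameAs, either way."""
    seen, todo = {uri}, [uri]
    while todo:
        cur = todo.pop()
        # Naive: rescan every triple for each URI visited.
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            if s == cur and o not in seen:
                other = o
            elif o == cur and s not in seen:
                other = s
            else:
                continue
            seen.add(other)
            todo.append(other)
    return seen


def get_graph(name, graphs):
    """Return the named graph, or infer one from its aliases."""
    if name in graphs:
        return set(graphs[name])
    everything = [t for g in graphs.values() for t in g]
    inferred = set()
    for alias in same_as_closure(name, everything) - {name}:
        for s, p, o in graphs.get(alias, ()):
            if p != SAME_AS:
                # Restate the alias's triples about `name`.
                inferred.add((name if s == alias else s, p, o))
    return inferred
```

Even in this toy form each alias costs another pass
over the data, which hints at how the work can grow
as sameAs statements accumulate.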

Aspect oriented. What I mean by this is kind of
a cross between duck-typing and OO. If you have
an object, you ask it for an "aspect" that
implements a particular interface without caring
about its class. This is an idea used a lot in
Twisted but not fleshed out well in our
codebase. The way I'm imagining this working
is that, say, you have Person and Artist, which
are in a super/subclass relationship as far as
RDFS is concerned. In the Python, I would just
have them both inherit directly from DomainObject.
Then I would do Person.get_by_uri(uri) or
Artist.get_by_uri(uri) according to whether I
am interested in their Person or their Artist
aspect. I wouldn't try to mirror the RDFS class
hierarchy in Python for a few reasons. First, doing
it by hand means keeping things in sync in two
places, which is tedious and error-prone. Second,
trying to automatically generate Python classes
by introspection on the data is tricky. Third, if
we were to do introspection, where would the
methods come from? It is primarily for specialised
methods that Python is useful.
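
A minimal sketch of the idea follows. DomainObject,
Person, Artist and get_by_uri() are the names used
above, but the bodies are guesses for illustration:
the class-level dict standing in for the store, the
value() helper, the example methods and the "ex:"
vocabulary are all assumptions.

```python
class DomainObject:
    # Toy stand-in for the store: uri -> set of triples.
    store = {}

    def __init__(self, uri, triples):
        self.uri = uri
        self.triples = triples

    @classmethod
    def get_by_uri(cls, uri):
        # Same data whichever aspect (class) asks for it.
        return cls(uri, set(cls.store.get(uri, ())))

    def value(self, predicate):
        for s, p, o in self.triples:
            if p == predicate:
                return o
        return None


class Person(DomainObject):
    """Person aspect: knows about names."""

    def display_name(self):
        return self.value("foaf:name")


class Artist(DomainObject):
    """Artist aspect: knows about works.

    Deliberately not a subclass of Person, even though the
    RDFS hierarchy may say every Artist is a Person.
    """

    def works(self):
        return sorted(
            o for s, p, o in self.triples if p == "ex:made"
        )


DomainObject.store["urn:x"] = {
    ("urn:x", "foaf:name", "Ada"),
    ("urn:x", "ex:made", "urn:w1"),
    ("urn:x", "ex:made", "urn:w0"),
}
person = Person.get_by_uri("urn:x")
artist = Artist.get_by_uri("urn:x")
```

Both objects wrap the same resource; which methods
are available depends only on the aspect asked for.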

Cheers,
-w

-- 
William Waites                <mailto:ww at styx.org>
http://river.styx.org/ww/        <sip:ww at styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

----- End forwarded message -----