[okfn-help] Informal progress report

Mon Jun 7 14:02:50 BST 2010

On 10-06-05 17:39, Graham Higgins wrote:
> >  The primary way we've been using FuXi with clinical data is
> > to forward chain rdflib graphs in memory, persist the results and send
> > queries using derived terms against the dataset that are simply
> > evaluated directly.  This is the materialized view approach to an RDF
> > warehouse.

FWIW this is the same approach that we are taking. I'm not convinced
about back-chaining on largeish datasets (or the suitability of any of
the rdflib back-ends for this).
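A minimal sketch of the materialized-view approach described above. It assumes plain (subject, predicate, object) string tuples instead of rdflib terms, and two hand-coded RDFS-style rules standing in for a real FuXi rule set. Everything is forward-chained to a fixpoint once, so a query using a derived term becomes a plain lookup:

```python
# Triples are plain string tuples here; a real setup would use rdflib
# terms and persist the result to a store.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def forward_chain(triples):
    """Apply two RDFS-style rules until no new triples appear (a fixpoint).

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in closure if p == SUBCLASS}
        new = set()
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for s, p, o in closure:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if a == o:
                        new.add((s, RDF_TYPE, b))
        new -= closure
        if not new:
            return closure
        closure |= new

data = {
    ("ex:alice", RDF_TYPE, "ex:Patient"),
    ("ex:Patient", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
}
materialized = forward_chain(data)
# Asking for ?x rdf:type ex:Agent no longer needs any reasoning at query time.
agents = sorted(s for s, p, o in materialized if p == RDF_TYPE and o == "ex:Agent")
print(agents)
```

The cost is paid up front and in storage; the benefit is that the triplestore only ever evaluates ordinary queries.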

The main reason for using fourstore is that it handles large datasets
better than any of the other free software triplestores.

> > So, recently I've been trying to make the re-writing strategy less
> > naive by updating it so it doesn't solve triple patterns one at a time
> > but instead rolls up re-written queries into as large a form as is
> > possible before sending them off.  

This seems like a promising approach. Given an rdflib back-end
that uses a remote SPARQL endpoint, if the queries can be rewritten
in such a way that they are efficient and effectively implement
the inferencing rules, it might work well and remove the need for
production rules.
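A rough sketch of query rewriting that rolls patterns up into one query rather than solving them one at a time. The rule table, the `ex:` names and the single-rule `ex:Parent` example are hypothetical; PREFIX declarations and renaming of clashing rule variables are deliberately left out. Each `rdf:type` pattern on a derived class becomes a UNION of the stored fact and the rule body, and all patterns go out in a single SELECT:

```python
# Hypothetical rule table: derived class -> rule body, written as SPARQL
# triple patterns over ?x (the instance being classified).
RULES = {
    "ex:Parent": ["?x ex:hasChild ?child"],
}

def expand_pattern(s, p, o):
    """Rewrite one triple pattern, inlining the rule body for derived classes."""
    original = f"{s} {p} {o} ."
    if p == "rdf:type" and o in RULES:
        # Naive substitution: assumes rule variables don't clash with the query's.
        body = " ".join(pat.replace("?x", s) + " ." for pat in RULES[o])
        return f"{{ {original} }} UNION {{ {body} }}"
    return original

def rewrite(patterns, select="*"):
    """Roll every pattern into one query instead of one query per pattern."""
    groups = "\n  ".join(expand_pattern(*t) for t in patterns)
    return f"SELECT {select} WHERE {{\n  {groups}\n}}"

query = rewrite([("?p", "rdf:type", "ex:Parent"), ("?p", "ex:name", "?name")])
print(query)
```

Sending one larger query lets the endpoint's own optimiser join the patterns, instead of shipping intermediate bindings back and forth per pattern.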

> Following the new rdflib 3.0.0 instructions (and using rdflib 3.0.0),
> I plugged in the SPARQL functionality and that's where I am at the
> moment, in the middle of chasing down the remaining extant fourstore
> bindings and replacing them throughout ORDF with bindings to rdflib
> (in order to avoid the results being returned in the form of a py4s
> object).

Note that py4s is compatible with rdflib. If you simply use the
store.query() method, you'll get back exactly the same thing as
you would with rdflib's SPARQL. The only reason for the cursor
stuff in the code is that it lets you get at information
that can't easily be exposed in a way that is compatible with
rdflib -- namely warnings from fourstore, particularly the
"soft limit reached, returning xxx results, increasing soft limit
may yield more" warning, which is a useful thing to know.
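One way to picture that trade-off is a result object that iterates like ordinary rows but also carries the store's warnings. The class below is a hypothetical illustration, not the py4s API. Code written against plain rdflib-style results keeps working, and callers that care can check for the soft-limit warning:

```python
class ResultsWithWarnings(list):
    """Hypothetical result container: behaves like a list of result rows,
    but also keeps any warnings the store reported alongside them."""

    def __init__(self, rows, warnings=()):
        super().__init__(rows)
        self.warnings = list(warnings)

    def hit_soft_limit(self):
        # Matches the 4store warning quoted above ("soft limit reached, ...").
        return any("soft limit reached" in w for w in self.warnings)

rows = ResultsWithWarnings(
    [("ex:a",), ("ex:b",)],
    warnings=["soft limit reached, returning 2 results, "
              "increasing soft limit may yield more"],
)
for row in rows:  # iteration is unchanged for code that ignores warnings
    print(row)
if rows.hit_soft_limit():
    print("results may be incomplete")
```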

> However, it strikes me that this perhaps rather regrettably precludes
> any simple mechanism for offering the ORDF-using developer a
> config-switchable choice of approach (rdflib vs 4store) which may, or
> may not, be a Bad Thing(tm).

The intent is to have it configurable. I guess there are two
alternatives: one is changing the existing SPARQL view so
that it just uses the common rdflib API; the other is to have two
flavours of the SPARQL view. I favour the latter, but not very
strongly.
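The "two flavours" option might look roughly like this: a small registry of view classes chosen by a config key. All names here (`sparql.backend`, the view classes, `StubStore`) are hypothetical and not part of ORDF. The 4store flavour is only a placeholder for where cursor-specific handling, such as surfacing soft-limit warnings, would go:

```python
VIEWS = {}

def register(name):
    """Record a view class under a config-selectable name."""
    def decorator(cls):
        VIEWS[name] = cls
        return cls
    return decorator

@register("rdflib")
class RdflibSparqlView:
    def __init__(self, store):
        self.store = store

    def __call__(self, query):
        # Uses only the query() call shared by rdflib-compatible stores.
        return list(self.store.query(query))

@register("4store")
class FourstoreSparqlView(RdflibSparqlView):
    def __call__(self, query):
        # Placeholder: a real version would use the 4store cursor here.
        return list(self.store.query(query))

def make_sparql_view(config, store):
    backend = config.get("sparql.backend", "rdflib")  # hypothetical config key
    try:
        return VIEWS[backend](store)
    except KeyError:
        raise ValueError(f"unknown SPARQL backend: {backend!r}") from None

class StubStore:
    """Stand-in store for the example; returns a fixed row."""
    def query(self, q):
        return [("ex:a",)]

view = make_sparql_view({"sparql.backend": "4store"}, StubStore())
print(type(view).__name__, view("SELECT * WHERE { ?s ?p ?o }"))
```

Keeping the choice in one factory means the rest of the application only ever calls the view, whichever flavour the config selects.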

> I presume that the ultimate intention is/was to make the rdflib vs
> 4store choice switchable at config level or is that an unwarranted
> assumption?

Yes, it is/was.

> Perhaps one way to obviate the issue entirely would be to follow up on
> Niklas' recommendation of adding a ticket (and subsequently perhaps, a
> patch+tests) for the optional store-provided query engine [6] - and in
> that way migrate the rdflib-4store choice mechanism out of ORDF and
> into rdflib itself. Was it just time constraints that prevented you
> from following up on that one Will, or is there some other factor in
> play of which I am unaware?

It turned out to be harder than it appeared. It's easy enough to
implement "if the store supports SPARQL, use that; otherwise use
the rdflib one" in rdflib's Graph.query. The problem is scope.
If you have a graph with an identifier, you'll expect the query
to only return triples in that graph. If you use the 4store
SPARQL, you end up having to rewrite

SELECT * WHERE { ?a ?b ?c }

to

SELECT * WHERE { GRAPH <x> { ?a ?b ?c } }

and this is hard to do in a general way.
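A deliberately naive sketch of that rewrite, assuming the query has a single outer WHERE group whose closing brace is the last `}` in the string. It handles the simple case above, but it ignores FROM/FROM NAMED, subqueries and braces inside literals, and a query that already contains its own GRAPH block shows why it is not general:

```python
import re

def scope_to_graph(query, graph_uri):
    """Naively wrap the outer WHERE group in GRAPH <graph_uri> { ... }."""
    match = re.search(r"WHERE\s*\{", query, re.IGNORECASE)
    if match is None:
        raise ValueError("no WHERE clause found")
    start = match.end()        # just after the opening brace of WHERE
    end = query.rindex("}")    # assumes the last brace closes WHERE
    body = query[start:end].strip()
    return f"{query[:start]} GRAPH <{graph_uri}> {{ {body} }} }}{query[end + 1:]}"

print(scope_to_graph("SELECT * WHERE { ?a ?b ?c }", "x"))

# The inner GRAPH ?g still ranges over every named graph in the dataset,
# so wrapping it does not actually confine the results to <x>.
print(scope_to_graph("SELECT * WHERE { GRAPH ?g { ?a ?b ?c } }", "x"))
```

A correct general version would need a real SPARQL parser and rules for each construct, which is where the difficulty lies.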

> On a slightly different tack --- I hope you'll forgive me if, as a
> personal side-task, I re-organise and add more substance to the ORDF
> documentation. I'm really impressed with the approach taken by ORDF
> and I'm quite keen to create a Shabti template that delivers a
> ready-to-rock, ORDF-enabled Pylons app, along with some richly
> detailed documentation that explains the workings and how to get the
> best out of ORDF.

That's brilliant!

Cheers,
-w

-- 
William Waites           <william.waites at okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK