[okfn-help] ORDF ticket #53: work with vanilla rdflib, remove 4store dependencies.

Graham Higgins gjh at bel-epa.com
Tue Jun 15 12:05:06 UTC 2010


We have an existing ticket to "make semantic + ordf work with vanilla  
rdflib and remove the 4store dependencies":

<http://knowledgeforge.net/pdw/trac/ticket/53>

I have marked this as 'wontfix', at least for the present, as the
change i) promises to prove costly in terms of providing support to
users and ii) may well introduce instability in ORDF.

The main issue is the SPARQL support in rdflib. The ability to run
SPARQL queries against an RDF store is critical to ORDF's functioning.

In rdflib, SPARQL support was originally provided (c. 2005) by a pure
Python implementation, authored by Ivan Herman. This implementation
was superseded about a year later by a (more complete?) implementation
developed by Chimezie Ogbuji that used C/bison to compile the queries,
speeding re-use.

Over time, the C compilation requirements of this latter  
implementation have persistently presented problems to users trying to  
install rdflib and (not least) to the rdflib development team in terms  
of providing support.

In the very latest release of rdflib (3.0.0, 05/2010) SPARQL support  
has been removed entirely from the core rdflib library and migrated to  
a new, independent, "rdfextras" package [1]. In this new package, the  
original 2005 vintage pure Python implementation has been reinstated  
and the C/bison-using implementation has not yet been formally
included (I have been trying to lay some groundwork for this to take
place [2]).

There are important ramifications stemming from this situation:

Some existing rdflib 2.4.2 SPARQL issues have been marked as "wontfix"  
in favour of rdflib 3.0. These include one that has significant impact  
on the execution time of SPARQL queries, logged as "querying a tiny  
Sleepycat db takes 30 seconds". I have been able to confirm that this  
issue also affects the pure Python implementation in rdfextras in that  
the provided sample "problematic" SPARQL query continues to take  
between 25s and 80s to execute (on a variety of back-end stores). I  
have raised a ticket to this effect for the rdfextras package [3].
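For anyone wanting to reproduce timings of this kind, a minimal sketch of a wall-clock harness follows. It uses only the standard library; 'time_query' and 'stand_in_query' are hypothetical names introduced here for illustration, and the stand-in would need to be replaced by whatever call actually executes the SPARQL query against the store under test.

```python
import time


def time_query(run_query, repeats=3):
    """Call run_query `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def stand_in_query():
    # Placeholder workload (an assumption for this sketch). In practice this
    # would wrap the call that runs the problematic SPARQL query on a store.
    sum(range(10000))


timings = time_query(stand_in_query)
print("min %.4fs, max %.4fs" % (min(timings), max(timings)))
```

Repeating the call several times per back-end store gives a range rather than a single figure, which is how the 25s to 80s spread above is best reported.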

If we took the questionable step of standardising on rdflib < 3.0 (as  
the last stable release supporting a complete SPARQL implementation),  
we would still be facing the long-standing difficulties that users can  
meet in compiling the C-based part of the SPARQL implementation. In  
effect, we'd be swapping one set of compilation problems for another.

Taking the more supportable step of standardising on rdflib >= 3.0.0  
doesn't get us out of the woods, however. There is reasonably clear
evidence that, at the moment, the rdfextras package is a
work-in-progress. For example, the source defines a class 'NotImplemented'
that takes two args: 'code' and 'msg'. However, elsewhere in the same
source code file, the class is instantiated with just one arg, the
'msg'. This is reasonably trivial to fix but, in conjunction with the
fact that the tests for the SPARQL implementation nearly all fail, it  
does indicate that a move to rdflib 3.0.0 would require a significant  
amount of work on our part to ensure that the ORDF library continued  
to function at its current level.
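
To make the 'NotImplemented' mismatch concrete, here is a minimal sketch of the pattern. It is illustrative only and is not the actual rdfextras source: the base class, the attribute names and the 'raise_with_one_arg' helper are assumptions made for the example.

```python
# Illustrative reconstruction only; not copied from rdfextras.
class NotImplemented(Exception):  # note: shadows the builtin constant in this module
    def __init__(self, code, msg):
        Exception.__init__(self, msg)
        self.code = code
        self.msg = msg


def raise_with_one_arg():
    # Only 'msg' is supplied, so the constructor call itself fails with
    # TypeError and the intended exception is never created.
    raise NotImplemented("feature not supported")


try:
    raise_with_one_arg()
except TypeError as err:
    outcome = "TypeError: %s" % err

print(outcome)
```

The effect is that callers see a TypeError about a missing argument rather than the intended error, which hides the real reason for the failure. Supplying both arguments at the call site, or giving 'code' a default value, would resolve it.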

I shall be keeping an eye on rdflib 3.0/rdfextras as work is  
continuing and I anticipate that the situation will change for the  
better in the short-term future. But for now, it seems sensible to  
mark the ticket as 'wontfix'.

Comments, questions, observations are always welcome.

[1] http://code.google.com/p/rdfextras/
[2] http://bitbucket.org/gjhiggins/rdfextras
[3] http://code.google.com/p/rdfextras/issues/detail?id=2

-- 
Cheers,

Graham Higgins

http://www.linkedin.com/in/ghiggins