[okfn-help] ORDF ticket #53: work with vanilla rdflib, remove 4store dependencies.
Graham Higgins
gjh at bel-epa.com
Tue Jun 15 13:05:06 BST 2010
We have an existing ticket to "make semantic + ordf work with vanilla
rdflib and remove the 4store dependencies":
<http://knowledgeforge.net/pdw/trac/ticket/53>
I have marked this as 'wontfix', at least for the present, as the
change i) promises to prove costly in terms of providing support to
users and ii) may well introduce instability in ORDF.
The main issue is rdflib's SPARQL support. The ability to pose
SPARQL queries against an RDF store is critical to ORDF's functioning.
In rdflib, SPARQL support was originally provided (c. 2005) by a pure
Python implementation, authored by Ivan Herman. This implementation
was superseded about a year later by a (more complete?) implementation
developed by Chimezie Ogbuji that used C/bison to compile the queries,
speeding re-use.
Over time, the C compilation requirements of this latter
implementation have persistently presented problems to users trying to
install rdflib and (not least) to the rdflib development team in terms
of providing support.
In the very latest release of rdflib (3.0.0, 05/2010) SPARQL support
has been removed entirely from the core rdflib library and migrated to
a new, independent, "rdfextras" package [1]. In this new package, the
original 2005 vintage pure Python implementation has been reinstated
and the C/bison-using implementation has not yet been formally
included (I have been trying to lay some groundwork for this to take
place [2]).
There are important ramifications stemming from this situation:
Some existing rdflib 2.4.2 SPARQL issues have been marked as "wontfix"
in favour of rdflib 3.0. These include one that has significant impact
on the execution time of SPARQL queries, logged as "querying a tiny
Sleepycat db takes 30 seconds". I have been able to confirm that this
issue also affects the pure Python implementation in rdfextras in that
the provided sample "problematic" SPARQL query continues to take
between 25s and 80s to execute (on a variety of back-end stores). I
have raised a ticket to this effect for the rdfextras package [3].
If we took the questionable step of standardising on rdflib < 3.0 (as
the last stable release supporting a complete SPARQL implementation),
we would still be facing the long-standing difficulties that users can
meet in compiling the C-based part of the SPARQL implementation. In
effect, we'd be swapping one set of compilation problems for another.
Taking the more supportable step of standardising on rdflib >= 3.0.0
doesn't get us out of the woods, however. There is reasonably clear
evidence that, at the moment, the rdfextras package is a work in
progress. For example, the source defines a class 'NotImplemented'
that takes two args: 'code' and 'msg'. However, elsewhere in the same
source code file, the class is instantiated with just one arg, the
'msg'. This is reasonably trivial to fix but, in conjunction with the
fact that the tests for the SPARQL implementation nearly all fail, it
does indicate that a move to rdflib 3.0.0 would require a significant
amount of work on our part to ensure that the ORDF library continued
to function at its current level.
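To make the 'NotImplemented' mismatch concrete, here is a minimal
sketch of the kind of failure it produces. This is not the rdfextras
source: the Exception base class, the message text and the helper
function name are assumptions made purely for illustration. Only the
class name and the two-arg ('code', 'msg') signature come from the
description above.

```python
# Illustrative sketch only; NOT copied from rdfextras.
# Assumption: the class is an Exception subclass. Defining it here
# shadows Python's builtin NotImplemented within this module.
class NotImplemented(Exception):
    def __init__(self, code, msg):
        # Both arguments are required by this signature.
        Exception.__init__(self, msg)
        self.code = code
        self.msg = msg


def raise_with_message_only():
    # Hypothetical call site: only the message is supplied, so the
    # constructor itself fails before the intended exception is raised.
    raise NotImplemented("feature not supported")


error = None
try:
    raise_with_message_only()
except TypeError as exc:
    # The caller sees a TypeError about the missing 'msg' argument
    # rather than the error the code meant to report.
    error = exc
    print("TypeError:", exc)
```

The practical effect is that any code path reaching such a call site
reports a confusing TypeError instead of a "not implemented" error.
Giving 'code' a default value, or passing both args at each call
site, would be one way to resolve it.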
I shall be keeping an eye on rdflib 3.0/rdfextras as work is
continuing and I anticipate that the situation will change for the
better in the near future. But for now, it seems sensible to
mark the ticket as 'wontfix'.
Comments, questions, observations are always welcome.
[1] http://code.google.com/p/rdfextras/
[2] http://bitbucket.org/gjhiggins/rdfextras
[3] http://code.google.com/p/rdfextras/issues/detail?id=2
--
Cheers,
Graham Higgins
http://www.linkedin.com/in/ghiggins