[okfn-help] catalogue rdf redux...

William Waites william.waites at okfn.org
Fri Jul 16 10:51:52 BST 2010


On 10-07-16 09:12, Rob Styles wrote:
> Hi William,
>
> I've had a few moments over the past week to download and get this running.
>
> Thank you for accurate instructions that seem to have worked without a
> hitch. I changed the hmg.ini to work with the new parsers and
> passwords and so on. I had a 2 minute upset early on as I didn't have
> python-devel installed, but that was easily worked out.
>   

I'm very happy you had so easy a time installing it!

> I'll hopefully have time over the coming week to take a look at the
> output and see what I can do to help. Any pointers to the structure of
> the code would be great, otherwise I'll just dive in with grep.
>   

(note, I've just made a minor tweak to the ckanrdf package,
adding some trailing slashes to some default URIs -- which
shouldn't affect the data.gov.uk config -- so you may want
to do an "hg pull" in there)

The main part of the script is in ckanrdf/command.py and
it is a fairly dumb loop that just lists packages using the API
and then iterates through them (or just processes whatever
was requested on the command line).
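
To give an idea of the shape of that loop, here is a minimal
sketch. It is not the actual ckanrdf code: run(), FakeCkan,
list_packages() and get_package() are names made up for
illustration, and the real script will differ in detail.

```python
def run(client, requested=None, process=print):
    """Process the requested packages, or every listed package if none given.

    `client` is assumed to offer list_packages() and get_package(name);
    both names are hypothetical, not the real CKAN client API.
    """
    names = list(requested) if requested else client.list_packages()
    processed = []
    for name in names:
        package = client.get_package(name)
        process(package)  # e.g. convert the package metadata to RDF and store it
        processed.append(name)
    return processed


class FakeCkan:
    """In-memory stand-in for a CKAN API client, for illustration only."""

    def __init__(self, packages):
        self._packages = packages

    def list_packages(self):
        return sorted(self._packages)

    def get_package(self, name):
        return self._packages[name]
```

Calling run(FakeCkan({...})) walks every package, while passing
requested=["some-name"] handles only the named ones.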

The implementations of e.g. changesets and provenance are
in the ORDF library; documentation is at http://ordf.org/doc/
Storage and indexes are also there (cf. ordf.readers and
ordf.writers in the config file). The config you are working
with uses pretty much the simplest case, just the "native"
rdflib triplestore. It can use 4store, do full-text indexing,
production rule inferencing, etc. as well.

The ordf(1) command might be useful to you as well: it can
read and write RDF from the storage back-end directly,
but of course it doesn't itself know anything about ckan
or datasets.

I think the ideal thing to do would actually be to write a
storage back-end (HandlerPlugin subclass) that will save
things to a Talis store directly -- so no need to do dumps
and imports. The API that needs to be supported is very
simple, basically two methods, get() and put() and there
are, I believe, two python modules for talking to your
stuff. This should be pretty straightforward.
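
Roughly, such a back-end might look like the sketch below.
Everything in it is an assumption made for illustration: the
HandlerPlugin class here is a placeholder rather than the real
ORDF base class, the get()/put() signatures are guesses, and
the transport object stands in for whichever client module
ends up talking to the remote store.

```python
class HandlerPlugin:
    """Placeholder base class; the real ORDF class may look different."""

    def get(self, identifier):
        raise NotImplementedError

    def put(self, identifier, data):
        raise NotImplementedError


class RemoteStoreBackend(HandlerPlugin):
    """Hypothetical back-end that forwards graphs to a remote store."""

    def __init__(self, transport):
        # transport: any object with fetch(identifier) and send(identifier, data)
        self.transport = transport

    def get(self, identifier):
        # Return the stored serialisation of the graph, or None if absent.
        return self.transport.fetch(identifier)

    def put(self, identifier, data):
        # Hand the serialised graph straight to the remote store.
        self.transport.send(identifier, data)


class DictTransport:
    """In-memory transport so the sketch runs without a network."""

    def __init__(self):
        self._graphs = {}

    def fetch(self, identifier):
        return self._graphs.get(identifier)

    def send(self, identifier, data):
        self._graphs[identifier] = data
```

Swapping DictTransport for a class that makes HTTP requests to
the store would leave RemoteStoreBackend itself unchanged.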

Not sure when you last updated ordf, but I would suggest
doing "pip install -e hg+http://ordf.org/src/#egg=ordf",
since I've done some significant work on it recently --
particularly useful might be the trig serializer. Just say
"switch" when pip asks: the repository URL has changed,
though the old one still works because it is just an HTTP
redirect.

Cheers,
-w

-- 
William Waites           <william.waites at okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
		http://ordf.org/


