[ckan-discuss] Harvesting Dublin Core documents
John Bywater
john.bywater at appropriatesoftware.net
Wed Nov 24 16:53:56 GMT 2010
John Bywater wrote:
> William Waites wrote:
>> Sound reasonable?
>
> Sounds very reasonable. ;-)
Narrowing this down, CKAN has the following method:
    def read_values(self):
        if "gmd:MD_Metadata" in self.content:
            gemini_document = GeminiDocument(self.content)
        else:
            raise HarvesterError(
                "Can't identify type of document content: %s" % self.content)
        return gemini_document.read_values()
I would like to adjust that method, to do something like:
    def read_values(self):
        document_class = self.get_document_class()
        document = document_class(self.content)
        return document.read_values()

    def get_document_class(self):
        if self.is_gemini_content():
            document_class = GeminiDocument
        elif self.is_rdf_content():
            document_class = RdfDocument
        else:
            raise HarvesterError(
                "Can't identify document class from content: %s" % self.content)
        return document_class
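The two predicates used above are not spelled out here. As a rough sketch
only: is_gemini_content() could keep the existing substring check, and
is_rdf_content() could look for an "rdf:RDF" root element (an assumption,
it only covers RDF/XML written with the conventional prefix). The
HarvestedDocument name and the stub classes below are hypothetical
stand-ins so the example runs on its own; they are not the CKAN classes.

```python
class HarvesterError(Exception):
    """Stand-in for the harvester's exception class."""


class _StubDocument(object):
    """Placeholder document: stores content, returns an empty dict."""

    def __init__(self, content):
        self.content = content

    def read_values(self):
        return {}


class GeminiDocument(_StubDocument):
    pass


class RdfDocument(_StubDocument):
    pass


class HarvestedDocument(object):
    """Hypothetical holder for the harvested content string."""

    def __init__(self, content):
        self.content = content

    def is_gemini_content(self):
        # Same test as the existing read_values() method.
        return "gmd:MD_Metadata" in self.content

    def is_rdf_content(self):
        # Assumption: RDF/XML serialisation using the "rdf" prefix.
        return "rdf:RDF" in self.content

    def get_document_class(self):
        if self.is_gemini_content():
            return GeminiDocument
        elif self.is_rdf_content():
            return RdfDocument
        raise HarvesterError(
            "Can't identify document class from content: %s" % self.content)

    def read_values(self):
        document_class = self.get_document_class()
        return document_class(self.content).read_values()
```

Substring checks are crude (other RDF serialisations would slip through),
but they keep the dispatch in one place, so a stricter test can be swapped
in later without touching read_values().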
That is, it would be very useful to have a class that is constructed
with an RDF string and that returns a CKAN Package dict from its
read_values() method. All the harvesting machinery would then work with RDF.
That class could process the RDF "programmatically either with SPARQL or
directly according to the library or bindings that you are using."
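To make the shape of such a class concrete, here is a minimal sketch. It
is not a real RDF implementation: it reads a single rdf:Description
element from RDF/XML with the standard library's ElementTree, where a
proper version would use an RDF library or SPARQL as described above. The
mapping from Dublin Core terms to Package dict keys ("title", "notes",
"url", "tags") is my assumption for illustration, and the sample record
is made up.

```python
import xml.etree.ElementTree as ET

# Clark-notation namespace prefixes for ElementTree lookups.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"


class RdfDocument(object):
    """Sketch: built from an RDF/XML string, returns a package-like dict."""

    def __init__(self, content):
        self.content = content

    def read_values(self):
        root = ET.fromstring(self.content)
        # Assumption: the record is the first rdf:Description child.
        description = root.find(RDF + "Description")
        if description is None:
            return {}
        values = {}
        title = description.findtext(DC + "title")
        if title is not None:
            values["title"] = title
        notes = description.findtext(DC + "description")
        if notes is not None:
            values["notes"] = notes
        about = description.get(RDF + "about")
        if about is not None:
            values["url"] = about
        # Each dc:subject becomes one tag.
        subjects = [el.text for el in description.findall(DC + "subject")]
        if subjects:
            values["tags"] = subjects
        return values


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/dataset/1">
    <dc:title>Example dataset</dc:title>
    <dc:description>A made-up record.</dc:description>
    <dc:subject>transport</dc:subject>
    <dc:subject>roads</dc:subject>
  </rdf:Description>
</rdf:RDF>"""

package = RdfDocument(SAMPLE).read_values()
```

Because the class only takes a string and returns a dict, it has no
dependency on the harvester itself.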
It could be used on either side of the API. CKAN's harvester talks to
the catalogue model via the presentation layer, CKAN's presentation
layer has CKAN Package dicts, and the CKAN API just exposes that
presentation model on the system boundary.
J.