[ckan-discuss] CKAN packages containing packages query

David Read david.read at okfn.org
Fri Jul 15 14:06:04 BST 2011


On 15 July 2011 07:32, Jo Walsh <jo.walsh at ed.ac.uk> wrote:
> Wrote some ScraperWiki scrapers for our Datashare repository.
> No RDF yet, hoping to do that with OpenOrg Grinder quite soon.
> However this is enough to result in questions...
>
> http://ckanvm.inf.ed.ac.uk/package/edinburgh-datashare
>
> Are there recommendations for what to do with CKAN packages containing other
> packages? For example. There is a list here of the packages in Edinburgh's
> research data repository (yes, all 12 of them).
>
> http://scraperwiki.com/scrapers/dspace_package_metadata/
> This won't work with ERA as it has a different table-based layout.
> Thinking about turning the scraper into a command-line module and ensuring
> it works with more DSpaces.
>
> Would like to make a CKAN package per dataset in Datashare - adding extra
> metadata about downloadable files, links to papers, websites etc. We *could
> think about* a bridge between CKAN and DSpace (so one can post a package to
> DSpace storage by entering it in the CKAN - issues here with multi-hop
> login/authentication (EASE, email registration for Datashare) - may be
> unrealistic to hope to resolve this in the next two weeks.)

It is good to see the Edinburgh Data Catalogue is being scraped. A
CKAN harvester could easily be written to read the ScraperWiki data,
add the datasets as packages to the Edinburgh CKAN and keep them up
to date. (I think this is preferable to copying the scraper code into
our harvester, despite the increased number of moving parts to
maintain, since ScraperWiki scrapers are easy to maintain and we could
standardise our harvester interface with ScraperWiki.)
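
As a rough sketch of the mapping step such a harvester would need, the
following turns scraped rows into package dicts. The row field names
("title", "url", "description", "files") and the "dataset" tag are
assumptions for illustration, not the actual schema of the ScraperWiki
scraper, and fetching the rows and writing to CKAN are left out.

```python
import re


def munge_name(title):
    """Make a lowercase, URL-safe package name from a title."""
    name = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return name[:100]


def row_to_package(row):
    """Map one scraped row (a dict, assumed field names) to a package dict."""
    return {
        "name": munge_name(row["title"]),
        "title": row["title"],
        "url": row.get("url", ""),
        "notes": row.get("description", ""),
        "tags": ["dataset"],  # hypothetical tag marking a 'dataset package'
        "resources": [{"url": u} for u in row.get("files", [])],
    }


# Hypothetical row, standing in for data read from the scraper.
rows = [
    {
        "title": "Example Dataset 1",
        "url": "http://example.org/1",
        "files": ["http://example.org/1/data.csv"],
    }
]
packages = [row_to_package(r) for r in rows]
```

Re-running the mapping on each harvest and updating packages by name
would keep the CKAN copies in step with the scraper.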

This would be a useful first step for Edinburgh, but as for the
further ideas you outline, such as the two-way bridge between CKAN
and DSpace, I'm less sure. The downside of harvesting (as I understand
it) is that it is one-way, leaving a read-only copy. @Adria is this
correct?

We previously had a prototype DVCS-style system that covered these use
cases - allowing package edits on any CKAN instance and
round-tripping. The system worked behind the scenes, but we paused at
the stage of doing the UI, as it was difficult to explain differences
between copies on different CKAN instances, how to resolve conflicts,
etc. - the usual git/mercurial workflow issues. Thoughts on this area
are most welcome.

> So Datashare is a package (?) which in turn contains other packages.
> - thoughts on customising the UI to show 'higher-level' but not
> 'lower-level' packages? some ckanjs thing needed?

You could use the tags Rufus suggests and facet on them. For example,
you could have a page on your site that does the search for just
'catalogue packages' or one that does 'dataset packages', if the
distinction merited UI for it. You could customise this in normal CKAN
or ckanjs if that's what you're using.
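
A minimal sketch of the tag-based search, assuming the packages are
tagged "catalogue" or "dataset" (hypothetical tag names) and that the
CKAN instance exposes the search API at /api/search/package with a
"tags" parameter; check the path against the API docs for your
version. It only builds the URL and makes no request.

```python
from urllib.parse import urlencode


def tag_search_url(base_url, tag):
    """Build a package-search URL restricted to one tag (assumed endpoint)."""
    query = urlencode({"tags": tag})
    return base_url.rstrip("/") + "/api/search/package?" + query


# Hypothetical tags distinguishing the two kinds of package.
catalogue_url = tag_search_url("http://ckanvm.inf.ed.ac.uk/", "catalogue")
dataset_url = tag_search_url("http://ckanvm.inf.ed.ac.uk/", "dataset")
```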

> - thoughts on structuring containment relations in package metadata?

It seems a good use of the Package Relationships that Rufus mentions
- "parent_of" / "child_of". The ckanclient doesn't cover them yet, but
it would not be hard to add support. See the API docs:
http://packages.python.org/ckan/api/version2.html#model-resources

When you add a relationship between two packages, a link between them
will be provided when you view either package in the web UI. This will
work well for your 12 packages, automatically created.
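
A sketch of how the containment links could be generated in bulk. The
path layout (package, relationship type, package) is an assumption
based on the API docs linked above and should be verified there; the
child package names are hypothetical. Nothing is sent over the
network; the function only builds the paths a client would write to.

```python
RELATIONSHIP_TYPES = {"parent_of", "child_of"}


def relationship_path(subject, rel_type, obj, api_version=2):
    """Build the REST path for one relationship (assumed layout)."""
    if rel_type not in RELATIONSHIP_TYPES:
        raise ValueError("unsupported relationship type: %s" % rel_type)
    return "/api/%d/rest/package/%s/%s/%s" % (api_version, subject, rel_type, obj)


# Hypothetical names for packages contained in the Datashare package.
children = ["dataset-a", "dataset-b"]
paths = [
    relationship_path("edinburgh-datashare", "parent_of", child)
    for child in children
]
```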

Bear in mind that we don't yet have a web interface for editing the
Package Relationships (just the API). And the UI doesn't yet handle
large quantities of relationships satisfactorily - it's a bit
chicken and egg. So if you start creating lots, I'm sure we can
improve the UI for them.

> - thoughts about publications from a much larger DSpace repository - they
> don't need to be individual packages but we should index them.

For longer lists of publications, if you just want to import the title
and a link, then a resource per publication suffices. But I'd have
thought subject tags, author info etc. are what make a publication
record most valuable, and so a package would be more suitable. And
this is not difficult if the import is automatic.
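
To illustrate the difference, here is a sketch of the same publication
expressed both ways. The input field names ("id", "title", "link",
"authors", "subjects") and the sample record are made up for
illustration, not taken from DSpace.

```python
def publication_as_resource(pub):
    """Title and link only: one resource inside a larger package."""
    return {"url": pub["link"], "description": pub["title"]}


def publication_as_package(pub):
    """Full record: its own package, with author and subject tags."""
    return {
        "name": pub["id"],
        "title": pub["title"],
        "url": pub["link"],
        "author": "; ".join(pub.get("authors", [])),
        "tags": list(pub.get("subjects", [])),
    }


# Hypothetical publication record.
pub = {
    "id": "example-paper-1",
    "title": "An Example Paper",
    "link": "http://example.org/paper/1",
    "authors": ["A. Author", "B. Author"],
    "subjects": ["geography", "metadata"],
}
resource = publication_as_resource(pub)
package = publication_as_package(pub)
```

The resource form loses the author and subject information, which is
why it only suits a bare index.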

David

> --
> Jo Walsh
>
> Unlock places - http://unlock.edina.ac.uk/
> phone: +44 (0)131 650 2973
> skype: metazool
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


