[ckan-dev] Create Package programmatically

Adrià Mercader adria.mercader at okfn.org
Wed Mar 13 10:59:07 UTC 2013


Hi Ryan and Peder,

I'll try to answer all comments, see below.
Note that the release-v2.0 branches of both harvest and spatial are
still WIP, so any feedback is much appreciated.



On 11 March 2013 17:59, Ryan Clark <ryan.clark at azgs.az.gov> wrote:
> My first harvest attempt gathered IDs, but failed validation as soon as
> fetching started: lxml had trouble parsing the schemas. What versions of
> lxml - libxml2 - libxslt do I need to have installed? libxml2 2.9.0 I
> assume? Setting that up on OSX is difficult.
IIRC some of the features of the new harvesters require a newer
version of lxml that has to be installed manually. I didn't make
this change so I don't know all the details, but here are some docs:

https://github.com/okfn/ckanext-spatial#installing-libxml2
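
To see which versions your Python environment is actually picking up
before reinstalling anything, something like the following might help.
It is only a quick sketch: the helper name lxml_versions is made up for
this example, and it relies solely on the version constants that
lxml.etree exposes. It does not tell you which versions the harvesters
need; the docs above are the reference for that.

```python
def lxml_versions():
    """Return the lxml / libxml2 / libxslt versions as dotted strings.

    Returns None if lxml cannot be imported at all.
    """
    try:
        from lxml import etree
    except ImportError:
        return None

    def dotted(version_tuple):
        # lxml exposes versions as tuples of ints, e.g. (2, 9, 0)
        return ".".join(str(part) for part in version_tuple)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2 (runtime)": dotted(etree.LIBXML_VERSION),
        "libxml2 (compiled against)": dotted(etree.LIBXML_COMPILED_VERSION),
        "libxslt (runtime)": dotted(etree.LIBXSLT_VERSION),
    }


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is None:
        print("lxml is not installed in this environment")
    else:
        for name, version in sorted(versions.items()):
            print("%s: %s" % (name, version))
```

If the runtime and compiled libxml2 versions differ, lxml is probably
loading a different system library than the one it was built against,
which is worth checking first on OSX.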


> I wonder how the pycsw integration effort is going? I worry a bit about the
> "lightweight" csw service that is built at
> https://github.com/okfn/ckanext-spatial/blob/release-v2.0/ckanext/spatial/controllers/csw.py.
> I built pycsw integration into my own custom extension last week before I
> knew that ckanext-spatial had developed as far as it has. Is there any way
> that I can help bring pycsw into the spatial extension more quickly?
The "lightweight" service you mentioned is certainly very limited. In
terms of pycsw integration, we worked out an implementation plan with
the pycsw guys a while ago. Unfortunately it slipped out of the first
batch of features that we needed to deliver for geo.data.gov, so we
haven't been able to spend time on it yet. It is definitely on our
roadmap and we should start working on it when we get our current
stuff out of the way.
If you could share how you did it, that would be really helpful to
see if it is similar to (or better than) our approach.



On 11 March 2013 19:00, Peder Jakobsen <pjakobsen at gmail.com> wrote:
> According to the ckanext-spatial docs:
>
> "These harvesters were are designed to harvest metadata records in the
> GEMINI2 format, which is an XML spatial metadata format very similar to
> ISO19139. This was developed for the UK Location Programme and GEMINI2, but
> it would be simple to adapt them for other INSPIRE or ISO19139-based
> metadata"
>
> Is it still necessary to tweak the code to harvest plain ISO19139, or has
> this already been done (with GEMINI2 just adding a few extra fields)?
>
No, sorry. These docs need updating. You should be able to harvest any
ISO19139 document without any changes on the release-v2.0 branch.
We are planning a big documentation effort before the actual 2.0
release, addressing all these out-of-date points.



On 11 March 2013 19:18, Peder Jakobsen <pjakobsen at gmail.com> wrote:
>
> Any examples of what a configuration object might look like for harvesting
> Spatial ISO 19139, or any other settings required to make this work?
>
You won't need any special configuration object on the harvest source.
Set up ckanext-harvest and spatial following the install docs or
Ryan's notes and add these plugins to your ini file:

ckan.plugins = harvest spatial_metadata spatial_query csw_harvester

If you get stuck at a particular point, let us know and we'll try to help.



On 12 March 2013 18:03, Ryan Clark <ryan.clark at azgs.az.gov> wrote:
> I tried the release-v2.0 on ckanext-harvest and ckanext-spatial yesterday.
> There is a little difference in the aims, I think, between what I want to do
> and what OKFN goals are. To be clear, I want to:
>
> Create packages using the web-interface.
> Harvest records from other CSW servers - ingest ISO XML to build packages
> All packages (harvested and created) can be edited
> All packages are exposed via CSW as ISO XML -- built through something like
> a Package.to_iso_xml() kind of method.

> I think where my situation diverges is that all packages can be edited, and
> even those that were not harvested are exposed via CSW. It looks to me like
> harvested packages cannot be edited, and it still looks like the CSW only
> exposes harvested records and not manually generated packages.
There is a divergence on one point but not on the other.
Harvested packages are normal packages, exactly the same as the ones
created via the UI. If they cannot be edited it must be an
authorization issue (or a bug).
By default, harvested packages are created via an internal sysadmin
user and assigned to the organization their harvest source belongs to.
If you are logged in as a sysadmin on the web interface, can you edit
the harvested datasets?
Note that in CKAN 2.0, authorization is based around organizations and
datasets inherit the authorization settings from the organization they
belong to, so all users of an organization (with editor role) should
be able to edit all datasets from this organization.

You are right though in that the CSW only exposes those datasets that
were harvested, and for the first stage of the pycsw integration this
will still be the case. The reason is that it is trickier to ensure
that datasets created via the frontend have all the necessary fields
to generate a valid ISO 19139 document. We'd need a custom form,
validators, etc. and to create the actual XML doc. It is definitely
possible, only it requires quite a lot of work and thought.


> The harvesting seems really solid, and if I'm not mistaken it looks like
> there's configurability in what get mined out of XML and stored as
> package.extras. This is awesome.
You can extend the CSW harvester with a custom get_package_dict method
to modify the dictionary that will be used to create the dataset, i.e.
to add custom extras, tags, etc.
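
As a rough illustration of the idea, here is a self-contained sketch.
Everything in it apart from the method name get_package_dict is an
assumption: the CSWHarvester class below is just a stand-in so the
example runs on its own, and the (iso_values, harvest_object) signature,
the "lineage" key and the "harvested" tag are made up for the example.
Please check the harvester source on release-v2.0 for the real base
class and arguments.

```python
class CSWHarvester(object):
    """Stand-in for the real CSW harvester class (assumption, not the real one)."""

    def get_package_dict(self, iso_values, harvest_object):
        # Assumed signature. Builds a minimal CKAN-style dataset dict
        # from the values parsed out of the ISO document.
        return {
            "title": iso_values.get("title", ""),
            "tags": [],
            "extras": [],
        }


class CustomCSWHarvester(CSWHarvester):
    """Hypothetical subclass that tweaks the dict before the dataset is created."""

    def get_package_dict(self, iso_values, harvest_object):
        # Let the base harvester build the default dictionary first
        package_dict = super(CustomCSWHarvester, self).get_package_dict(
            iso_values, harvest_object)

        # Add a custom tag (hypothetical tag name)
        package_dict["tags"].append({"name": "harvested"})

        # Store an extra value from the ISO document (hypothetical key)
        package_dict["extras"].append(
            {"key": "lineage", "value": iso_values.get("lineage", "")})

        return package_dict


if __name__ == "__main__":
    harvester = CustomCSWHarvester()
    print(harvester.get_package_dict(
        {"title": "Example", "lineage": "survey"}, None))
```

In a real extension the subclass would be registered as its own plugin
and listed in ckan.plugins in place of csw_harvester.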

> What I'd still love to see though is:
>
> pycsw used for the CSW server
We'll get there eventually
> ALL packages exposed through the CSW
See above for why this may take a while

> I've written code to accomplish these two things, and would love to see it
> worked into the OKFN extensions, if the functionality seems relevant.
We'd love to have a look at it and see if it can help.


Hope all this helps,

Adrià
