[ckan-dev] R: Dpm and DataDeck rewriting. What about ckanclient?

Rufus Pollock rufus.pollock at okfn.org
Fri Feb 15 16:57:03 UTC 2013


On 15 February 2013 13:36, Daniel Graziotin <dgraziotin at task3.cc> wrote:
> Hello Sean and Rufus,
>> Date: Fri, 15 Feb 2013 11:02:46 +0100
>> From: Sean Hammond <sean.hammond at okfn.org>

[...]

>> Date: Fri, 15 Feb 2013 11:37:22 +0000
>> From: Rufus Pollock <rufus.pollock at okfn.org>
>>
>> The development hasn't really stopped but taken another form. The simple data
>> package route has in some sense split out from CKAN and exists as specs such
>> as:
>>
>> http://www.dataprotocols.org/en/latest/data-packages.html
>>
>> And as a new mini-project: http://datasets.okfnlabs.org/
>>
>> This approach is completely complementary to CKAN - the data packages spec is
>> directly based on the CKAN JSON structure and there can be reuse from both
>> directions (e.g. use CKAN as the catalog or datastore, while CKAN may be able
>> to utilize some of the pure data packages tooling and approach ...)
>>
>> For an intro to the idea, see this slide deck: http://bit.ly/datasets-slides
>>
>
> Whoa, Rufus (nice to hear back from you).
> You are always involved in crazy stuff. Cheers for all the material.
> I am more familiar with the datapackage.json format than the dataset format employed by CKAN (e.g. http://datahub.io/api/rest/dataset/osm) to be honest.
>
> Why is there a difference between a CKAN package/dataset and a dpm data package?
> Why aren't both projects using the same data format/structure?

I assume you are referring to the datapackage.json file (as opposed to
the other structure specified in the spec).

The datapackage.json spec and the CKAN JSON are almost identical. The
only differences are:

* Some minor normalizations of metadata - CKAN has accreted some
things over time, and is (currently) missing some things one might
need (e.g. multiple source fields). CKAN also tends to serialize flat
(straight from its DB), which isn't always the best shape for a JSON
format

* "files" rather than "resources" as the field for data. This is a
really tough one. The point is that datapackage.json has a scripts
field too. I put resources into CKAN and also had the idea, along with
others (such as Richard Cyganiak), of expanding it beyond just "data".
Looking back, this was a mistake. I think the "related" approach we
have now is right and "resources" should just be data files. But for
backwards compatibility we probably need to keep the resources naming
in CKAN for now. One could argue we should adopt this naming in
datapackage.json for convenience, but I think "resources" is a poor
term because it is too generic ...

Note that the datapackage spec and CKAN have co-evolved since 2007 ...
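
To make the differences listed above concrete, here is a minimal
sketch of mapping a CKAN-style dataset dict onto a
datapackage.json-style dict. It is an illustration only: the function
name ckan_to_datapackage is hypothetical, and the key names used on
both sides (name, title, notes, license_id, url, format, description,
license) are assumptions about the two formats rather than a statement
of either spec. The only part taken directly from the discussion is
that the data entries sit under "resources" in CKAN and under "files"
in datapackage.json.

```python
def ckan_to_datapackage(ckan_dataset):
    """Map a CKAN-style dataset dict onto a datapackage.json-style dict.

    All key names below are assumptions made for illustration; check
    them against the CKAN API output and the data packages spec.
    """
    package = {
        "name": ckan_dataset.get("name"),
        "title": ckan_dataset.get("title"),
        # Assumed: CKAN keeps the free-text description under "notes".
        "description": ckan_dataset.get("notes"),
        # Assumed: CKAN stores a license identifier under "license_id".
        "license": ckan_dataset.get("license_id"),
    }
    # The naming difference discussed above: "resources" in CKAN,
    # "files" in datapackage.json. Only data-file fields are carried over.
    package["files"] = [
        {"url": resource.get("url"), "format": resource.get("format")}
        for resource in (ckan_dataset.get("resources") or [])
    ]
    return package
```

Anything CKAN-specific that is not mapped explicitly (for example a
resource id) is dropped in this sketch; a real converter would need to
decide what to do with such fields.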

[...]
>> I think you should use the actual ckanclient if you need a python API to ckan - it's
>> fully functional right now and would not need much work if improvements are
>> needed.
>>
>
> I see a ckanclient rewriting / enhancement as the creation of "libckanclient" for Python.
> On top of that well-made library there might come sdpm (or whatever name it would have) and then sdpm gui.
> I still think there is value in obtaining datasets through a command line or GUI.

I *very* much agree!

> On the other hand, I am interested in this difference between CKAN datasets/packages and dpm data packages.
>
> What if this libckanclient returned Python objects following datapackage.json and metadata specs?
> Could this be of any help for both projects?  This is just an idea.

I think standardizing on datapackage.json on disk but communicating
with the CKAN API would be very nice ...
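
As a rough sketch of the "datapackage.json on disk" half of that idea,
the helpers below write a package dict to a datapackage.json file in a
directory and read it back. The function names save_datapackage and
load_datapackage are hypothetical, and the step that would fetch or
push the dict through the CKAN API is deliberately left out, since its
shape depends on the client library used.

```python
import json
import os


def save_datapackage(package, directory):
    """Write a package dict to <directory>/datapackage.json; return the path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "datapackage.json")
    with open(path, "w", encoding="utf-8") as handle:
        # Stable key order keeps the file friendly to version control.
        json.dump(package, handle, indent=2, sort_keys=True)
    return path


def load_datapackage(directory):
    """Read <directory>/datapackage.json back into a dict."""
    path = os.path.join(directory, "datapackage.json")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A client built this way would only talk to CKAN when syncing, and
would otherwise work against the local file.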

Rufus



