[ckan-dev] publisher model
Richard Cyganiak
richard at cyganiak.de
Tue Apr 26 20:41:43 UTC 2011
Hi Seb,
The workbench/workshop metaphor doesn't work for me at all. The first-class items on CKAN should be, broadly speaking, some sort of collections of data. A workbench/workshop, on the other hand, is a tool for manipulating these first-class items. It is somewhere I take my dataset, do some stuff with it, and bring it back in some modified form (at which point it may still be the same artifact or a new one -- that's a separate question). If I were a new visitor to ckan.net and heard that it's about workshops/workbenches for data, I'd totally get the wrong idea. It confuses the tools with their inputs/outputs, IMHO.
Best,
Richard
On 26 Apr 2011, at 15:19, Seb Bacon wrote:
> Hi,
>
> On 26 April 2011 10:02, Friedrich Lindenberg
> <friedrich.lindenberg at okfn.org> wrote:
>> Hi Seb,
>>
>> On Fri, Apr 22, 2011 at 11:39 AM, Seb Bacon <seb.bacon at okfn.org> wrote:
>>> Something I meant to follow up with you was your dream about sorting
>>> the publisher model.
>>>
>>> I'm not entirely clear what you mean by this. Could you explain?
>>
>> By now, I'm actually a proponent of "going Github": have a single
>> domain entity "Publisher" from which both users and institutional
>> publishers are derived and an n-1 relation between datasets and
>> publishers
>
> So publisher has many datasets, and a dataset has only one publisher.
> I presume by "going Github" you mean to preserve the provenance of a
> dataset through maintaining a graph of publishers or datasets?
>
> <snip>
>> I think at the moment, CKAN embodies a false notion of shared
>> ownership which is neither true for institutional environments, nor
>> for data wranglers (listen to what we actually say: "I have this
>> dataset that I worked on"). We want CKAN to become a public data
>> workbench, but at the same time workbenches are things that are very
>> specific to their owners (everything else is an assembly line).
>
> Yes. I suspect we may have to change the terminology here:
>
> (1) "Publisher" has several different and specific meanings depending
> on the metadata standard or other context. Perhaps we could invent
> our own term to disambiguate, e.g. "Foundry" or "Workshop" (following
> your workbench analogy). (We also need to preserve the original
> author somehow; which term do we currently use for this? Author?)
>
> (2) To me, "Dataset" has some implication of a package of resources
> that were originally released together, somehow -- it implies intent
> by the original author. I understand that you are talking about a
> collection of resources for the purposes of data wrangling;
> personally, I like "Workbench" for this.
>
> So we could have something like:
>
> - Resource: a CSV file or TXT file or similar
>   - e.g. lat/lon of fire incidents in England
> - Workbench: a collection of Resources which a user has gathered
>   together to answer some data question
>   - example question: "what are the top five administrative areas in
>     the UK for fire incidents?"
>   - example resources:
>     - lat/lon of fire incidents in England / Wales / Scotland /
>       Northern Ireland
>     - UK local administrative boundary shapefiles
>     - UK local administrative area names
> - Workshop: corresponds to a user account or an institutional account
>   - e.g. "UK Cabinet Office" or "Joe Smith"
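
The three terms proposed above could be sketched roughly as follows. This is only an illustration of the proposal, not CKAN's actual domain model: the class names come from the list above, while every field name (`url`, `institutional`, `owner`, `resources`) and the example.org URLs are assumptions made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Resource:
    """A single file, e.g. a CSV or TXT file."""
    name: str
    url: str  # hypothetical field, not taken from CKAN


@dataclass
class Workshop:
    """A user account or an institutional account."""
    name: str
    institutional: bool = False  # hypothetical flag


@dataclass
class Workbench:
    """Resources gathered by one Workshop to answer a data question."""
    question: str
    owner: Workshop  # exactly one owner: the n-1 relation from the thread
    resources: List[Resource] = field(default_factory=list)


joe = Workshop("Joe Smith")
cabinet_office = Workshop("UK Cabinet Office", institutional=True)

# Placeholder URLs; the thread does not name any real files.
fires = Resource("lat/lon of fire incidents in England",
                 "http://example.org/fires.csv")
names = Resource("UK local administrative area names",
                 "http://example.org/names.csv")

bench = Workbench(
    question="what are the top five administrative areas in the UK "
             "for fire incidents?",
    owner=joe,
    resources=[fires, names],
)

# Resources exist independently of any Workbench, so two Workbenches
# with different owners can refer to the same Resource.
other = Workbench("where are fire incidents recorded?", cabinet_office, [fires])
```

Keeping `Resource` separate from `Workbench` is what allows "my dataset and your dataset" to share a resource, as suggested later in the thread.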
>
> Question: for an institutional user, would a primary source release
> (e.g. http://data.gov.uk/dataset/financial-transactions-data-whittington-nhs-trust)
> still be a Workbench, albeit a specially flagged one?
>
>> (this starts to make sense with resources that are
>> independent of datasets, so my dataset and your dataset may share a
>> resource; plus it makes authz a lot simpler).
>
> I believe there's general agreement that this is the right direction.
>
> Exactly how does it make authz simpler? Something like: a Resource
> and a Workbench would only have one owner (Workshop), and people would
> fork Workbenches or make brand new ones if they wanted to edit them?
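
One way to read the single-owner idea is sketched below, under the assumption that "owner" simply means the one Workshop allowed to edit. The helper names `can_edit` and `fork` and the `forked_from` field are hypothetical and do not correspond to any existing CKAN API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workbench:
    name: str
    owner: str  # name of the single owning Workshop
    resources: List[str] = field(default_factory=list)
    forked_from: Optional["Workbench"] = None  # hypothetical provenance link


def can_edit(workshop: str, bench: Workbench) -> bool:
    """The whole authz rule in this sketch: only the owner may edit."""
    return workshop == bench.owner


def fork(bench: Workbench, new_owner: str) -> Workbench:
    """Copy a Workbench for a new owner, remembering where it came from."""
    return Workbench(bench.name, new_owner, list(bench.resources),
                     forked_from=bench)


original = Workbench("fire incidents", "UK Cabinet Office", ["fires.csv"])
copy = fork(original, "Joe Smith")

# Joe edits his own fork; the original stays untouched.
if can_edit("Joe Smith", copy):
    copy.resources.append("boundaries.shp")
```

In this reading the permission check collapses to one comparison, and the `forked_from` link is what would preserve provenance across forks. Group-style statements such as "only Sue and Fred can change this data" are not covered by the sketch.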
>
> We also, of course, need to preserve some notions of authz groups,
> etc. For example, institutional environments I've worked with want to
> be able to assert some of the following statements:
>
> - The official originator of this data is Foo Department
> - Only Sue and Fred of Foo Department can change this data
>
> If we moved to a workshop / workbench type model, which are
> collections of Resources as first-class citizens
>
>> Hope this makes some sense,
>
> I think so -- does my (re)interpretation above match your sense?
>
>> [OT]
>> re data wrangling:
>>
>> https://bitbucket.org/pudo/iati/src
>> https://bitbucket.org/okfn/ukgov-25k-spending/src
>
> These are very good use cases for the kinds of data wrangling we want
> users to be able to do easily, I think.
>
> Seb
>