[ckan-dev] publisher model

Tue Apr 26 20:41:43 UTC 2011

Hi Seb,

The workbench/workshop metaphor doesn't work for me at all. The first-class items on CKAN should be, broadly speaking, some sort of collection of data. A workbench/workshop, on the other hand, is a tool for manipulating these first-class items. It is somewhere I take my dataset, do some stuff with it, and bring it back in some modified form (at which point it may still be the same artifact or a new one -- that's a separate question). If I were a new visitor to ckan.net and heard that it's about workshops/workbenches for data, I'd totally get the wrong idea. The metaphor confuses the tools with their inputs/outputs, IMHO.

Best,
Richard

On 26 Apr 2011, at 15:19, Seb Bacon wrote:

> Hi,
> 
> On 26 April 2011 10:02, Friedrich Lindenberg
> <friedrich.lindenberg at okfn.org> wrote:
>> Hi Seb,
>> 
>> On Fri, Apr 22, 2011 at 11:39 AM, Seb Bacon <seb.bacon at okfn.org> wrote:
>>> Something I meant to follow up with you was your dream about sorting
>>> the publisher model.
>>> 
>>> I'm not entirely clear what you mean by this.  Could you explain?
>> 
>> By now, I'm actually a proponent of "going Github": have a single
>> domain entity "Publisher" from which both users and institutional
>> publishers are derived and an n-1 relation between datasets and
>> publishers.
> 
> So publisher has many datasets, and a dataset has only one publisher.
> I presume by "going Github" you mean to preserve the provenance of a
> dataset through maintaining a graph of publishers or datasets?
> 
> <snip>
>> I think at the moment, CKAN embodies a false notion of shared
>> ownership which is neither true for institutional environments, nor
>> for data wranglers (listen to what we actually say: "I have this
>> dataset that I worked on"). We want CKAN to become a public data
>> workbench, but at the same time workbenches are things that are very
>> specific to their owners (everything else is an assembly line).
> 
> Yes.  I suspect we may have to change the terminology here:
> 
> (1) "Publisher" has several different and specific meanings depending
> on the metadata standard or other context.  Perhaps we could invent
> our own term to disambiguate, e.g. "Foundry" or "Workshop" (following
> your workbench analogy).  (We also need to preserve the original
> author somehow; which term do we currently use for this?  Author?)
> 
> (2) To me, "Dataset" has some implication of a package of resources
> that were originally released together, somehow -- it implies intent
> by the original author.  I understand that you are talking about a
> collection of resources for the purposes of data wrangling;
> personally, I like "Workbench" for this.
> 
> So we could have something like:
> 
> - Resource: a CSV file or TXT file or similar
>   - e.g. lat/lon of fire incidents in England
> - Workbench: a collection of Resources which a user has gathered
>   together to answer some data question
>   - example question: "what are the top five administrative areas in
>     the UK for fire incidents?"
>   - example resources:
>      - lat/lon of fire incidents in England / Wales / Scotland /
>        Northern Ireland
>      - UK local administrative boundary shapefiles
>      - UK local administrative area names
> - Workshop: corresponds to a user account or an institutional account
>   - e.g. "UK Cabinet Office" or "Joe Smith"
> 
> Question: for an institutional user, would a primary source release
> (e.g. http://data.gov.uk/dataset/financial-transactions-data-whittington-nhs-trust)
> still be a Workbench, albeit a specially flagged one?
> 
>> (this starts to make sense with resources that are
>> independent of datasets, so my dataset and your dataset may share a
>> resource; plus it makes authz a lot simpler).
> 
> I believe there's general agreement that this is the right direction.
> 
> Exactly how does it make authz simpler?  Something like: a Resource
> and a Workbench would only have one owner (Workshop), and people would
> fork Workbenches or make brand new ones if they wanted to edit them?
> 
> We also, of course, need to preserve some notions of authz groups,
> etc.  For example, institutional environments I've worked with want to
> be able to assert some of the following statements:
> 
> - The official originator of this data is Foo Department
> - Only Sue and Fred of Foo Department can change this data
> 
> If we moved to a workshop / workbench type model, which are
> collections of Resources as first-class citizens
> 
>> Hope this makes some sense,
> 
> I think so -- does my (re)interpretation above match your sense?
> 
>> [OT]
>> re data wrangling:
>> 
>> https://bitbucket.org/pudo/iati/src
>> https://bitbucket.org/okfn/ukgov-25k-spending/src
> 
> These are very good use cases for the kinds of data wrangling we want
> users to be able to do easily, I think.
> 
> Seb
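
For reference, the terminology proposed in the quoted message (Resource, Workbench, Workshop, with a single owner per Workbench and forking instead of shared editing) could be sketched roughly as below. This is only an illustration of the idea under discussion: every class, field, and method name is hypothetical and nothing here reflects CKAN's actual schema or authz code. The "members" list is an assumption standing in for statements like "Only Sue and Fred of Foo Department can change this data".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """A single file, e.g. a CSV of fire incidents (hypothetical model)."""
    name: str
    url: str


@dataclass
class Workshop:
    """A user account or an institutional account (hypothetical model)."""
    name: str
    # Assumption: people allowed to edit anything this Workshop owns.
    members: List[str] = field(default_factory=list)


@dataclass
class Workbench:
    """A collection of Resources owned by exactly one Workshop."""
    title: str
    owner: Workshop
    resources: List[Resource] = field(default_factory=list)
    # Provenance link to the Workbench this one was forked from, if any.
    forked_from: Optional["Workbench"] = None

    def can_edit(self, person: str) -> bool:
        # Single ownership: only members of the owning Workshop may edit.
        return person in self.owner.members

    def add_resource(self, person: str, resource: Resource) -> None:
        if not self.can_edit(person):
            raise PermissionError(f"{person} cannot edit {self.title!r}")
        self.resources.append(resource)

    def fork(self, new_owner: Workshop) -> "Workbench":
        # Forking copies the resource list (Resources themselves are shared)
        # and records where the copy came from.
        return Workbench(
            title=self.title,
            owner=new_owner,
            resources=list(self.resources),
            forked_from=self,
        )
```

Under these assumptions, someone outside "Foo Department" who wants to change a Workbench would call fork() with their own Workshop and edit the copy, while forked_from keeps a trail back to the official originator.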