[ckan-dev] publisher model

Richard Cyganiak richard at cyganiak.de
Wed Apr 27 14:38:20 UTC 2011


Hi Seb,

On 27 Apr 2011, at 10:41, Seb Bacon wrote:
> Perhaps my working-name terminology needs expanding to cover this
> distinction.  We could consider a collection which has been made
> meaningful a "Data Story", which has a "Workbench" view for Data
> Wranglers.

“Data Story” appeals a bit more to me, because it puts the end product (metadata about data) into the foreground, and not the tools used to build it.

A part of the story is still missing from the picture you're painting: ckan.net is a wiki-like site. Many people can make small contributions towards making a good package. This, I think, is a useful and important part of the story.

I see a package as something a bit like a Wikipedia article: If well done, it's a carefully crafted and curated account that tells what one needs to know about a particular collection of resources. And while this collection of resources should have a primary source (a publisher from whom the data originated), several individuals may have contributed to shaping the package.

That's why I also don't really have a problem with having resources made by third parties attached to a package. The package comes from A, but B wrote some conversion tool, C uploaded the converted data somewhere, and D made a nice API available for it -- why not list those as resources for the same package?
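
To make the A/B/C/D example concrete, here is a minimal sketch of a package that keeps one primary publisher while listing resources contributed by others. The class and field names (Package, Resource, contributor, and so on) are made up for illustration and are not CKAN's actual model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Resource:
    description: str
    contributor: str  # hypothetical field: who made this resource available


@dataclass
class Package:
    name: str
    publisher: str  # the primary source the data originated from
    resources: List[Resource] = field(default_factory=list)

    def contributors(self) -> Set[str]:
        """Everyone who contributed a resource, the publisher included."""
        return {r.contributor for r in self.resources}

    def third_party_resources(self) -> List[Resource]:
        """Resources attached by someone other than the publisher."""
        return [r for r in self.resources if r.contributor != self.publisher]


# The example from the paragraph above: data from A, contributions by B, C, D.
package = Package(
    name="example-package",
    publisher="A",
    resources=[
        Resource("original data release", "A"),
        Resource("conversion tool", "B"),
        Resource("converted data", "C"),
        Resource("API for the data", "D"),
    ],
)
```

Keeping the publisher on the package and the contributor on each resource means provenance stays visible even when many people have shaped the package.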

Best,
Richard




> 
> Thus, a Data Story is primarily a collection of data which is *useful
> for the Visitor*; with supporting information, on the Workbench, of
> interest to Researchers and Wranglers, showing the collections it's
> derived from, and the transformations that were performed on those
> collections.  The more interested, geeky visitor would be able to
> click something analogous to the "fork me on github" ribbon in the
> Data Story, which would then take them to a Workbench for that Story.
> 
> Importantly, the idea of a Story which can be forked would then not
> just be for Data Wranglers but also for other, less technical users;
> at its simplest, a new Story is just a collection of Resources that
> make sense together in some way.  Right now, a Resource can only be in
> a single Package.
> 
> You talk about new visitors getting the wrong idea; I'd be really
> interested to understand what you think the right idea should be for
> such visitors?  Right now, I think they have No Idea :)
> 
> Thanks,
> 
> Seb
> 
> 
> 
> 
>> On 26 Apr 2011, at 15:19, Seb Bacon wrote:
>> 
>>> Hi,
>>> 
>>> On 26 April 2011 10:02, Friedrich Lindenberg
>>> <friedrich.lindenberg at okfn.org> wrote:
>>>> Hi Seb,
>>>> 
>>>> On Fri, Apr 22, 2011 at 11:39 AM, Seb Bacon <seb.bacon at okfn.org> wrote:
>>>>> Something I meant to follow up with you was your dream about sorting
>>>>> the publisher model.
>>>>> 
>>>>> I'm not entirely clear what you mean by this.  Could you explain?
>>>> 
>>>> By now, I'm actually a proponent of "going Github": have a single
>>>> domain entity "Publisher" from which both users and institutional
>>>> publishers are derived and an n-1 relation between datasets and
>>>> publishers
>>> 
>>> So publisher has many datasets, and a dataset has only one publisher.
>>> I presume by "going Github" you mean to preserve the provenance of a
>>> dataset through maintaining a graph of publishers or datasets?
>>> 
>>> <snip>
>>>> I think at the moment, CKAN embodies a false notion of shared
>>>> ownership which is neither true for institutional environments, nor
>>>> for data wranglers (listen to what we actually say: "I have this
>>>> dataset that I worked on"). We want CKAN to become a public data
>>>> workbench, but at the same time workbenches are things that are very
>>>> specific to their owners (everything else is an assembly line).
>>> 
>>> Yes.  I suspect we may have to change the terminology here:
>>> 
>>> (1) "Publisher" has several different and specific meanings depending
>>> on the metadata standard or other context.  Perhaps we could invent
>>> our own term to disambiguate, e.g. "Foundry" or "Workshop" (following
>>> your workbench analogy).  (We also need to preserve the original
>>> author somehow; which term do we currently use for this?  Author?)
>>> 
>>> (2) To me, "Dataset" has some implication of a package of resources
>>> that were originally released together, somehow -- it implies intent
>>> by the original author.  I understand that you are talking about a
>>> collection of resources for the purposes of data wrangling;
>>> personally, I like "Workbench" for this.
>>> 
>>> So we could have something like:
>>> 
>>> - Resource: a CSV file or TXT file or similar
>>>   - e.g. lat/lon of fire incidents in England
>>> - Workbench: a collection of Resources which a user has gathered
>>> together to answer some data question
>>>   - example question: "what are the top five administrative areas in
>>> the UK for fire incidents?"
>>>   - example resources:
>>>      - lat/lon of fire incidents in England / Wales / Scotland /
>>> Northern Ireland
>>>      - UK local administrative boundary shapefiles
>>>      - UK local administrative area names
>>> - Workshop: corresponds to a user account or an institutional account
>>>   - e.g. "UK Cabinet Office" or "Joe Smith"
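
The Resource / Workbench / Workshop list above could be sketched roughly as follows. This is only an illustration of the proposed terminology, not existing CKAN code; the single-owner rule and the fork() method are assumptions based on the "fork Workbenches" idea raised elsewhere in this thread, and every class, field, and method name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Workshop:
    """A user account or an institutional account."""
    name: str


@dataclass(frozen=True)
class Resource:
    """A single file, e.g. a CSV or TXT file."""
    title: str
    owner: Workshop


@dataclass
class Workbench:
    """Resources gathered by one Workshop to answer a data question."""
    question: str
    owner: Workshop
    resources: List[Resource] = field(default_factory=list)
    forked_from: Optional["Workbench"] = None

    def can_edit(self, workshop: Workshop) -> bool:
        # Assumption: exactly one owner, and only the owner may edit.
        return workshop == self.owner

    def fork(self, new_owner: Workshop) -> "Workbench":
        # The fork shares the same Resource objects but has its own list,
        # and remembers where it came from to preserve provenance.
        return Workbench(
            question=self.question,
            owner=new_owner,
            resources=list(self.resources),
            forked_from=self,
        )


cabinet_office = Workshop("UK Cabinet Office")
joe = Workshop("Joe Smith")
fires = Resource("lat/lon of fire incidents in England", cabinet_office)
bench = Workbench(
    "what are the top five administrative areas in the UK for fire incidents?",
    cabinet_office,
    [fires],
)
joes_bench = bench.fork(joe)
```

Under these assumptions Joe cannot edit the Cabinet Office's Workbench, but he can edit his fork, which still points to the same Resource and to the Workbench it was derived from.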
>>> 
>>> Question: for an institutional user, would a primary source release
>>> (e.g. http://data.gov.uk/dataset/financial-transactions-data-whittington-nhs-trust)
>>> still be a Workbench, albeit a specially flagged one?
>>> 
>>>> (this starts to make sense with resources that are
>>>> independent of datasets, so my dataset and your dataset may share a
>>>> resource; plus it makes authz a lot simpler).
>>> 
>>> I believe there's general agreement that this is the right direction.
>>> 
>>> Exactly how does it make authz simpler?  Something like: a Resource
>>> and a Workbench would only have one owner (Workshop), and people would
>>> fork Workbenches or make brand new ones if they wanted to edit them?
>>> 
>>> We also, of course, need to preserve some notions of authz groups,
>>> etc.  For example, institutional environments I've worked with want to
>>> be able to assert some of the following statements:
>>> 
>>> - The official originator of this data is Foo Department
>>> - Only Sue and Fred of Foo Department can change this data
>>> 
>>> If we moved to a workshop / workbench type model, which are
>>> collections of Resources as first-class citizens
>>> 
>>>> Hope this makes some sense,
>>> 
>>> I think so -- does my (re)interpretation above match your sense?
>>> 
>>>> [OT]
>>>> re data wrangling:
>>>> 
>>>> https://bitbucket.org/pudo/iati/src
>>>> https://bitbucket.org/okfn/ukgov-25k-spending/src
>>> 
>>> These are very good use cases for the kinds of data wrangling we want
>>> users to be able to do easily, I think.
>>> 
>>> Seb
>>> 
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>> 
>> 




