[ckan-dev] publisher model

Wed Apr 27 17:08:20 UTC 2011

On 27 April 2011 15:38, Richard Cyganiak <richard at cyganiak.de> wrote:
> Hi Seb,
>
> On 27 Apr 2011, at 10:41, Seb Bacon wrote:
>> Perhaps my working-name terminology needs expanding to cover this
>> distinction.  We could consider a collection which has been made
>> meaningful a "Data Story", which has a "Workbench" view for Data
>> Wranglers.
>
> “Data Story” appeals a bit more to me, because it puts the end product (metadata about data) into the foreground, and not the tools used to build them.
>
> A part of the story is still missing from the picture you're painting: ckan.net is a wiki-like site. Many people can make small contributions towards making a good package. This, I think, is a useful and important part of the story.

I agree.  On the other hand, curating the data, wiki-style, is quite a
specialist role.  I feel we are too optimised for that case when it's
not very mainstream.

Also, the majority of institutional CKAN users don't want to allow lots of
people to edit the original metadata.

> I see a package as something a bit like a Wikipedia article: If well done, it's a carefully crafted and curated account that tells what one needs to know about a particular collection of resources. And while this collection of resources should have a primary source (a publisher from whom the data originated), several individuals may have contributed to shaping the package.
>
> That's why I also don't really have a problem with having resources made by third parties attached to a package. The package comes from A, but B wrote some conversion tool, C uploaded the converted data somewhere, and D made a nice API for it available -- why not list those as resources for the same package?

I agree, but also think we need to make it easy for users to put C's
resources in many different packages, and conversely to list all the
packages that C's resources are used in.  I suspect the
copy-and-edit-this-data-story workflow is a more common case than the
directly-edit-this-data-story, and therefore one that it would be
better to optimise for in the UI.  It allows very similar yet
importantly different packages to be grouped together -- data
stories around "UK spending in the last 12 months", "Wales spending in
the last 12 months", "Wales spending over £10k in the last 12 months"
would all have their own pages yet have nearly identical sets of
resources.
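
To make the shape of that concrete, here is a rough sketch of resources
shared between stories, with copy-and-edit and the reverse lookup.  Every
name in it (Resource, Story, Catalogue, fork, stories_using_resources_of)
is made up for illustration; it is not how CKAN models things today, just
one way the idea could look.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    """A single file or endpoint (hypothetical model, not CKAN's)."""
    id: str
    owner: str  # the account that supplied it, e.g. "C"


@dataclass
class Story:
    """A titled collection of resource ids with exactly one owner."""
    title: str
    owner: str
    resource_ids: List[str] = field(default_factory=list)
    forked_from: Optional[str] = None  # title of the parent Story, if any


class Catalogue:
    """In-memory registry, assumed purely for the sake of the example."""

    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}
        self.stories: Dict[str, Story] = {}

    def add_resource(self, resource: Resource) -> Resource:
        self.resources[resource.id] = resource
        return resource

    def add_story(self, story: Story) -> Story:
        self.stories[story.title] = story
        return story

    def fork(self, title: str, new_title: str, new_owner: str) -> Story:
        """Copy-and-edit: the new Story starts with the parent's resources."""
        parent = self.stories[title]
        child = Story(new_title, new_owner, list(parent.resource_ids),
                      forked_from=parent.title)
        return self.add_story(child)

    def stories_using_resources_of(self, owner: str) -> List[str]:
        """Reverse lookup: titles of Stories that use any resource of `owner`."""
        owned = {r.id for r in self.resources.values() if r.owner == owner}
        return sorted(s.title for s in self.stories.values()
                      if owned.intersection(s.resource_ids))
```

The point of the sketch is that a Story only holds references to resources,
so forking is cheap and the same resource can appear in any number of
stories, while each Story still has a single owner.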

The tools / Workbench part would be some easy way to take a resource
and just pick out, say, the Wales entries, to build the new Story.
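
As a very rough idea of what such a tool might do under the hood, assuming
the resource is a CSV file with a column naming the region (the function
name, column names and sample data below are all invented for the example):

```python
import csv
import io


def filter_rows(csv_text, column, value):
    """Return CSV text keeping only the rows where `column` equals `value`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue()


# Hypothetical spending data; "region" and "amount" are assumed column names.
sample = "region,amount\nWales,12000\nEngland,500\nWales,9000\n"
wales_only = filter_rows(sample, "region", "Wales")
```

The filtered output would then be saved as a new resource in the new Story,
with a note of which resource and which filter it was derived from.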

Seb

>> Thus, a Data Story is primarily a collection of data which is *useful
>> for the Visitor*; with supporting information, on the Workbench, of
>> interest to Researchers and Wranglers, showing the collections it's
>> derived from, and the transformations that were performed on those
>> collections.  The more interested, geeky visitor would be able to
>> click something analogous to the "fork me on github" ribbon in the
>> Data Story, which would then take them to a Workbench for that Story.
>>
>> Importantly, the idea of a Story which can be forked would then not
>> just be for Data Wranglers but also for other, less technical users;
>> at its simplest, a new Story is just a collection of Resources that
>> make sense together in some way.  Right now, a Resource can only be in
>> a single Package.
>>
>> You talk about new visitors getting the wrong idea; I'd be really
>> interested to understand what you think the right idea should be for
>> such visitors?  Right now, I think they have No Idea :)
>>
>> Thanks,
>>
>> Seb
>>
>>> On 26 Apr 2011, at 15:19, Seb Bacon wrote:
>>>
>>>> Hi,
>>>>
>>>> On 26 April 2011 10:02, Friedrich Lindenberg
>>>> <friedrich.lindenberg at okfn.org> wrote:
>>>>> Hi Seb,
>>>>>
>>>>> On Fri, Apr 22, 2011 at 11:39 AM, Seb Bacon <seb.bacon at okfn.org> wrote:
>>>>>> Something I meant to follow up with you was your dream about sorting
>>>>>> the publisher model.
>>>>>>
>>>>>> I'm not entirely clear what you mean by this.  Could you explain?
>>>>>
>>>>> By now, I'm actually a proponent of "going Github": have a single
>>>>> domain entity "Publisher" from which both users and institutional
>>>>> publishers are derived and an n-1 relation between datasets and
>>>>> publishers
>>>>
>>>> So publisher has many datasets, and a dataset has only one publisher.
>>>> I presume by "going Github" you mean to preserve the provenance of a
>>>> dataset through maintaining a graph of publishers or datasets?
>>>>
>>>> <snip>
>>>>> I think at the moment, CKAN embodies a false notion of shared
>>>>> ownership which is neither true for institutional environments, nor
>>>>> for data wranglers (listen to what we actually say: "I have this
>>>>> dataset that I worked on"). We want CKAN to become a public data
>>>>> workbench, but at the same time workbenches are things that are very
>>>>> specific to their owners (everything else is an assembly line).
>>>>
>>>> Yes.  I suspect we may have to change the terminology here:
>>>>
>>>> (1) "Publisher" has several different and specific meanings depending
>>>> on the metadata standard or other context.  Perhaps we could invent
>>>> our own term to disambiguate, e.g. "Foundry" or "Workshop" (following
>>>> your workbench analogy).  (We also need to preserve the original
>>>> author somehow; which term do we currently use for this?  Author?)
>>>>
>>>> (2) To me, "Dataset" has some implication of a package of resources
>>>> that were originally released together, somehow -- it implies intent
>>>> by the original author.  I understand that you are talking about a
>>>> collection of resources for the purposes of data wrangling;
>>>> personally, I like "Workbench" for this.
>>>>
>>>> So we could have something like:
>>>>
>>>> - Resource: a CSV file or TXT file or similar
>>>>   - e.g. lat/lon of fire incidents in England
>>>> - Workbench: a collection of Resources which a user has gathered
>>>> together to answer some data question
>>>>   - example question: "what are the top five administrative areas in
>>>> the UK for fire incidents?"
>>>>   - example resources:
>>>>      - lat/lon of fire incidents in England / Wales / Scotland /
>>>> Northern Ireland
>>>>      - UK local administrative boundary shapefiles
>>>>      - UK local administrative area names
>>>> - Workshop: corresponds to a user account or an institutional account
>>>>   - e.g. "UK Cabinet Office" or "Joe Smith"
>>>>
>>>> Question: for an institutional user, would a primary source release
>>>> (e.g. http://data.gov.uk/dataset/financial-transactions-data-whittington-nhs-trust)
>>>> still be a Workbench, albeit a specially flagged one?
>>>>
>>>>> (this starts to make sense with resources that are
>>>>> independent of datasets, so my dataset and your dataset may share a
>>>>> resource; plus it makes authz a lot simpler).
>>>>
>>>> I believe there's general agreement that this is the right direction.
>>>>
>>>> Exactly how does it make authz simpler?  Something like: a Resource
>>>> and a Workbench would only have one owner (Workshop), and people would
>>>> fork Workbenches or make brand new ones if they wanted to edit them?
>>>>
>>>> We also, of course, need to preserve some notions of authz groups,
>>>> etc.  For example, institutional environments I've worked with want to
>>>> be able to assert some of the following statements:
>>>>
>>>> - The official originator of this data is Foo Department
>>>> - Only Sue and Fred of Foo Department can change this data
>>>>
>>>> If we moved to a workshop / workbench type model, which are
>>>> collections of Resources as first-class citizens
>>>>
>>>>> Hope this makes some sense,
>>>>
>>>> I think so -- does my (re)interpretation above match your sense?
>>>>
>>>>> [OT]
>>>>> re data wrangling:
>>>>>
>>>>> https://bitbucket.org/pudo/iati/src
>>>>> https://bitbucket.org/okfn/ukgov-25k-spending/src
>>>>
>>>> These are very good use cases for the kinds of data wrangling we want
>>>> users to be able to do easily, I think.
>>>>
>>>> Seb
>>>>
>>>> _______________________________________________
>>>> ckan-dev mailing list
>>>> ckan-dev at lists.okfn.org
>>>> http://lists.okfn.org/mailman/listinfo/ckan-dev