[ckan-dev] publisher model

Seb Bacon seb.bacon at okfn.org
Wed Apr 27 09:41:36 UTC 2011


Hi Richard,

This is a very useful discussion :)

On 26 April 2011 21:41, Richard Cyganiak <richard at cyganiak.de> wrote:
> The workbench/workshop metaphor doesn't work for me at all.

As a side issue, to be clear, my intent with "Workbench" and
"Workshop" was that they are quite different things.  Re. Workshop, I
was trying to come up with a term for a user account, which can
represent either an organisation or an individual user (as per
Friedrich's idea).  Perhaps "Foundry" is better, so at least the words
actually *sound* different :)

The point about a Foundry is that we are making the role of a user
explicit.  At the moment, a user is a type of librarian.  In this
alternative vision, a user is someone who not only catalogs the data,
but usually (but not necessarily) also does stuff with it to make it
interesting or useful.

A Visitor, IMO, is a separate role we need to be clear about.  I think
they're pretty much always someone who has googled for something like
"fire incidents in the UK" and wants to know the number of fire
incidents in their area in the past year; or someone who is checking
the source of a journalist's report about local council budgeting.

> The first-class items on CKAN should be, broadly speaking, some sort of
> collections of data. A workbench/workshop, on the other hand, is a
> tool for manipulating these first-class items.

No, I see it as primarily a collection of data, with some
supplementary tools for making it useful / meaningful.  Perhaps the
metaphor is flawed and we should just call it something neutral like a
Dataset, but I'm worried that this term is overloaded.  We need a more
precise definition of what our collections are *for*.  Right now,
Packages are things (i.e. "resources") that the uploader considers
related, for *any* value of "related"; but that relationship is then
more-or-less set, for better or worse, in that Package, because a
resource can only ever be in one Package.

> It is something where I take my dataset, do some stuff with it,
> and bring it back in some modified form (at which point it may still
> be the same or a new artifact -- that's a separate question). If I
> were a new visitor of ckan.net and heard that it's about
> workshops/workbenches for data, I'd totally get the wrong idea.
> It's kind of confusing the tools and their inputs/outputs, IMHO.

As per the other thread today, casual users *already* find CKAN
confusing.  They don't understand what it's for.  Even people somewhat
connected with the Open Data scene get confused.  Re-imagining what
CKAN is for is (I think) part of the solution for actually clarifying
its purpose.

There are three roles that I think currently understand (to varying
degrees) ckan.net: what we could call the Researcher, the Author, and
the Data Wrangler.  The Researcher is typically an academic; the
Author is typically a representative of an organisation that wants to
"do open data"; the Wrangler is typically a hacktivist.

My argument is that by providing a means for Wranglers to make their
own collections which are first-class citizens, which will go
alongside the collections made by Researchers, we would be creating an
opportunity to generate collections that are actually meaningful for a
wider, more casual audience.

Perhaps my working-name terminology needs expanding to cover this
distinction.  We could consider a collection which has been made
meaningful a "Data Story", which has a "Workbench" view for Data
Wranglers.

Thus, a Data Story is primarily a collection of data which is *useful
for the Visitor*; with supporting information, on the Workbench, of
interest to Researchers and Wranglers, showing the collections it's
derived from, and the transformations that were performed on those
collections.  The more interested, geeky visitor would be able to
click something analogous to the "fork me on github" ribbon in the
Data Story, which would then take them to a Workbench for that Story.

Importantly, the idea of a Story which can be forked would then not
just be for Data Wranglers but also for other, less technical users;
at its simplest, a new Story is just a collection of Resources that
make sense together in some way.  Right now, a Resource can only be in
a single Package.
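
To make that concrete, here is a minimal sketch of the idea in Python.
It is only an illustration under my own assumptions, not how CKAN
models anything today: the class names, the "owner" and "forked_from"
fields, and the example.org URLs are all hypothetical.  It shows a
Resource living in more than one Story, and a fork keeping a link back
to the Story it came from.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """A single file-like item, e.g. a CSV of fire incidents."""
    name: str
    url: str


@dataclass
class Story:
    """A collection of Resources that make sense together (hypothetical model)."""
    title: str
    owner: str  # assumption: name of the owning account
    resources: List[Resource] = field(default_factory=list)
    forked_from: Optional["Story"] = None  # provenance link to the parent Story

    def fork(self, new_owner: str, title: Optional[str] = None) -> "Story":
        """Copy the collection for a new owner, remembering where it came from."""
        return Story(
            title=title or self.title,
            owner=new_owner,
            resources=list(self.resources),  # new list, same Resource objects
            forked_from=self,
        )


# Placeholder data; the URLs are made up for illustration.
incidents = Resource("fire-incidents-england", "http://example.org/fires.csv")
boundaries = Resource("uk-admin-boundaries", "http://example.org/boundaries.shp")

original = Story("Fire incidents by area", owner="seb",
                 resources=[incidents, boundaries])

# A fork shares the same Resources but can then diverge on its own.
mine = original.fork(new_owner="richard")
mine.resources.append(Resource("uk-admin-names", "http://example.org/names.csv"))
```

The fork copies the list but not the Resources themselves, so the same
Resource ends up in both Stories, which is exactly what a Package
cannot do at the moment.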

You talk about new visitors getting the wrong idea; I'd be really
interested to understand what you think the right idea should be for
such visitors?  Right now, I think they have No Idea :)

Thanks,

Seb




> On 26 Apr 2011, at 15:19, Seb Bacon wrote:
>
>> Hi,
>>
>> On 26 April 2011 10:02, Friedrich Lindenberg
>> <friedrich.lindenberg at okfn.org> wrote:
>>> Hi Seb,
>>>
>>> On Fri, Apr 22, 2011 at 11:39 AM, Seb Bacon <seb.bacon at okfn.org> wrote:
>>>> Something I meant to follow up with you was your dream about sorting
>>>> the publisher model.
>>>>
>>>> I'm not entirely clear what you mean by this.  Could you explain?
>>>
>>> By now, I'm actually a proponent of "going Github": have a single
>>> domain entity "Publisher" from which both users and institutional
>>> publishers are derived and an n-1 relation between datasets and
>>> publishers
>>
>> So publisher has many datasets, and a dataset has only one publisher.
>> I presume by "going Github" you mean to preserve the provenance of a
>> dataset through maintaining a graph of publishers or datasets?
>>
>> <snip>
>>> I think at the moment, CKAN embodies a false notion of shared
>>> ownership which is neither true for institutional environments, nor
>>> for data wranglers (listen to what we actually say: "I have this
>>> dataset that I worked on"). We want CKAN to become a public data
>>> workbench, but at the same time workbenches are things that are very
>>> specific to their owners (everything else is an assembly line).
>>
>> Yes.  I suspect we may have to change the terminology here:
>>
>> (1) "Publisher" has several different and specific meanings depending
>> on the metadata standard or other context.  Perhaps we could invent
>> our own term to disambiguate, e.g. "Foundry" or "Workshop" (following
>> your workbench analogy).  (We also need to preserve the original
>> author somehow; which term do we currently use for this?  Author?)
>>
>> (2) To me, "Dataset" has some implication of a package of resources
>> that were originally released together, somehow -- it implies intent
>> by the original author.  I understand that you are talking about a
>> collection of resources for the purposes of data wrangling;
>> personally, I like "Workbench" for this.
>>
>> So we could have something like:
>>
>> - Resource: a CSV file or TXT file or similar
>>   - e.g. lat/lon of fire incidents in England
>> - Workbench: a collection of Resources which a user has gathered
>> together to answer some data question
>>   - example question: "what are the top five administrative areas in
>> the UK for fire incidents?"
>>   - example resources:
>>      - lat/lon of fire incidents in England / Wales / Scotland /
>> Northern Ireland
>>      - UK local administrative boundary shapefiles
>>      - UK local administrative area names
>> - Workshop: corresponds to a user account or an institutional account
>>   - e.g. "UK Cabinet Office" or "Joe Smith"
>>
>> Question: for an institutional user, would a primary source release
>> (e.g. http://data.gov.uk/dataset/financial-transactions-data-whittington-nhs-trust)
>> still be a Workbench, albeit a specially flagged one?
>>
>>> (this starts to make sense with resources that are
>>> independent of datasets, so my dataset and your dataset may share a
>>> resource; plus it makes authz a lot simpler).
>>
>> I believe there's general agreement that this is the right direction.
>>
>> Exactly how does it make authz simpler?  Something like: a Resource
>> and a Workbench would only have one owner (Workshop), and people would
>> fork Workbenches or make brand new ones if they wanted to edit them?
>>
>> We also, of course, need to preserve some notions of authz groups,
>> etc.  For example, institutional environments I've worked with want to
>> be able to assert some of the following statements:
>>
>> - The official originator of this data is Foo Department
>> - Only Sue and Fred of Foo Department can change this data
>>
>> If we moved to a workshop / workbench type model, which are
>> collections of  Resources as first-class citizens
>>
>>> Hope this makes some sense,
>>
>> I think so -- does my (re)interpretation above match your sense?
>>
>>> [OT]
>>> re data wrangling:
>>>
>>> https://bitbucket.org/pudo/iati/src
>>> https://bitbucket.org/okfn/ukgov-25k-spending/src
>>
>> These are very good use cases for the kinds of data wrangling we want
>> users to be able to do easily, I think.
>>
>> Seb
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev