[ckan-discuss] ckan.net linking open data group and lod cloud

Mon Apr 26 18:13:58 BST 2010

On 25 April 2010 21:24, Richard Cyganiak <richard at cyganiak.de> wrote:
[...]
>> Going forward we do anticipate a time when packages start having
>> dedicated maintainers who may wish to switch off "anyone can edit"
>> capability (that facility is in there currently but atm we discourage
>> people from using that capability).
>
> Ok, so one option would be for Anja and me to create the packages that
> represent the LOD Cloud as locked-down access controlled packages, and
> create a group that contains all of them.
>
> The problem with this is that many of these packages already exist in
> ckan.net/group/lod and we would either have to duplicate those, or lock down
> packages that are currently editable by anyone. Would any of those options
> be acceptable to CKAN? They don't seem too much in the spirit of the site.

Right, I think it would be better at the moment to see if you could
live with allowing packages to be editable by others. If that is
abused then you could start locking them down. You seem to suggest
this option yourself below, so it looks like a workable option :)

>>> ckan.net would need to have at least a discussion page for each dataset.
>>
>> This sounds very like a comments feature we've been discussing for a
>> while.
>
> By "discussion page" I mean something like the "talk pages" on Wikipedia,
> where meta-discussions around shaping the entry could take place. I'm
> thinking of discussion like:
>
> A: "I think the license is stated incorrectly here, according to
> http://foobar this dataset is in the public domain"
> B: "The source for the currently stated license is here: http://xyz"
> A: "That page is clearly outdated, and here's a post from John Doe that
> confirms public domain license ..."
> B: "Ok, I updated the record"
>
> A Comments feature is not good for this IMO, it would just add a lot of
> noise to the CKAN catalog pages.

Very good point. There is a big difference between comments and
discussion pages. I think you are right that a discussion page is
probably more valuable. I've created a ticket here:

<http://knowledgeforge.net/ckan/trac/ticket/301>

> The key idea at Wikipedia is that they separate the artifact (the Wikipedia
> article) from the discussion that shapes the artifact (the separate talk
> pages). This keeps the artifact page free from process discussion, and
> reduces resistance that users might have against posting on the highly
> visible artifact page.

Agreed.

>> The other thing we've thought about is the ability to have
>> pending changes -- i.e. someone makes a change to a package and it
>> doesn't get applied immediately but put on a stack waiting for admin
>> approval (this is rather like a patch queue).
>
> This is probably the right model for an environment where strict quality
> control is necessary (e.g., program code, where a bad change breaks the
> build, causes bugs, and is a security problem). In a wiki, a spirit of "just
> do it, it can always be reverted" is more appropriate. Now, CKAN is
> somewhere in the middle -- currently probably closer to a wiki with imposed
> structure, but in the long term you probably want much more automated
> processing to happen around your data. So the patch queue model might be the
> right one.

Sounds sensible. We already have quite a bit of the infrastructure
for this in place with the Changeset/Revision model, but work would
be needed to expose it in the user interface.
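
To make the "patch queue" idea concrete, here is a minimal sketch of
pending changes waiting for admin approval. It is only an illustration:
the class and method names are hypothetical and are not part of CKAN or
of its Changeset/Revision model.

```python
from collections import deque


class PendingChanges:
    """Hypothetical queue of proposed edits to a single package."""

    def __init__(self, package):
        # Current, approved state of the package (field name -> value).
        self.package = dict(package)
        # Proposed edits, oldest first; nothing here is applied yet.
        self.queue = deque()

    def submit(self, author, changes):
        """Record a proposed edit without applying it."""
        self.queue.append((author, dict(changes)))

    def approve_next(self):
        """Apply the oldest pending edit and return its author."""
        author, changes = self.queue.popleft()
        self.package.update(changes)
        return author

    def reject_next(self):
        """Discard the oldest pending edit and return its author."""
        author, _changes = self.queue.popleft()
        return author
```

A submitted change leaves the package untouched until an admin calls
approve_next(), which is the difference from the wiki-style "just do it,
it can always be reverted" model.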

[...]

>> We've also just added ability to get package specific feeds ...
>
> I don't want to manually subscribe to an additional feed whenever I add a
> package to a group.
>
> Feeds for groups, please! This is a tool that group admins really need, IMO.

We already have a ticket: <http://knowledgeforge.net/ckan/trac/ticket/272>

I think this should be easy to implement so your request definitely
ups its priority! I think it could be done very soon ...

> I wonder whether I can filter only the stuff for one group from the
> all-changes feed with Yahoo Pipes...

You probably can.
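
If Pipes turns out to be awkward, the same filtering could be done in a
few lines of Python. This is a sketch under stated assumptions: that the
all-changes feed is Atom, and that each entry has a link whose last path
segment is the package name. Neither assumption has been checked against
the real feed, and the function name and sample feed are made up for
illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entries_for_group(feed_xml, group_packages):
    """Return titles of feed entries that link to a package in the group.

    Assumes (unverified) that each entry carries a link whose final
    path segment is the package name.
    """
    root = ET.fromstring(feed_xml)
    kept = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.iter(ATOM + "link"):
            href = link.get("href", "")
            name = href.rstrip("/").rsplit("/", 1)[-1]
            if name in group_packages:
                kept.append(title)
                break
    return kept


# Made-up sample feed; the URLs and titles are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Edit to dbpedia</title>
    <link href="http://example.org/package/dbpedia"/>
  </entry>
  <entry>
    <title>Edit to other</title>
    <link href="http://example.org/package/other"/>
  </entry>
</feed>"""

group_titles = entries_for_group(SAMPLE_FEED, {"dbpedia"})
```

The set of package names for the group would have to be maintained by
hand or fetched separately, which is the main drawback compared with a
native per-group feed.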

>>> Or do you think my worries are unfounded here?
>>
>> I think to start with you could see what happens :) So far vandalism and
>> spam have been kept very effectively in check and I think, in general,
>> most edits to packages you were curating would be useful. At the same
>> time see comments above for features that may already do some of what
>> you want ...
>
> Well. So let's say: If I can get a per-group feed, and a separate lod-cloud
> group (with Anja and me as initial group managers, more welcome), then I'm
> in, using unrestricted editing for the packages. So the per-group feed is
> the one extra thing I need to be confident that the curated data is
> reasonably safe.
>
> I'll try building that feed with Yahoo Pipes. If that doesn't work, then
> I'll probably have to wait until you implement group feeds natively.

OK, that's great -- I think we can add a per-group feed very easily :)

>> I've already thought that we could start using agreed prefixes in
>> ckan.net extras fields as a way of storing RDF info (which can then be
>> proper RDF on semantic.ckan.net) --
>
> Are extra fields in the RDF output at all? If they are, then I don't worry
> too much about this. I'd be willing to code something that runs CONSTRUCT
> queries or some other processing to get voiD data out of the extra fields.
>
> A related question. In the LOD Cloud data, we track links between datasets.
> This could be done in CKAN using an extra field "links to", where the value
> is some identifier for the target dataset, e.g., its CKAN page. Now the
> problem is, sometimes we also want to keep track of the number of links
> between two given datasets. For example, dataset5 links to dataset23 and
> there are 50k links between them. Do you have an idea how to represent this
> in CKAN? Could this be recorded using some convention with extra fields?
> Again, I don't have a problem doing post-processing that turns this into the
> final format -- just don't want to abuse the CKAN schema too much.

I don't think this is abusing the CKAN schema at all. I think the
probable convention is to use the "package:" prefix when an extra
field value is another package, e.g. "package:xxx" would indicate
package xxx. So you could have:

Extra key: Number of links
Extra value: 500000
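
A minimal sketch of how post-processing could read such extras back,
assuming the "package:" value prefix described here, the "links to" key
from the question, and the "Number of links" key from this example. The
helper names are hypothetical and nothing here is an existing CKAN API.

```python
PACKAGE_PREFIX = "package:"


def package_reference(value):
    """Return the package name if value uses the 'package:' prefix, else None."""
    if isinstance(value, str) and value.startswith(PACKAGE_PREFIX):
        name = value[len(PACKAGE_PREFIX):].strip()
        return name or None
    return None


def read_link(extras):
    """Return (target package, link count) from a package's extras, or None.

    The count is None when no "Number of links" extra is present.
    """
    target = package_reference(extras.get("links to"))
    if target is None:
        return None
    raw_count = extras.get("Number of links")
    count = int(raw_count) if raw_count is not None else None
    return (target, count)


# Illustrative extras for the dataset5 -> dataset23 case from the question.
dataset5_extras = {
    "links to": "package:dataset23",
    "Number of links": "50000",
}
link = read_link(dataset5_extras)
```

As written this only handles one link target per package; several
targets would need some further convention for telling the keys apart.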

I'm also thinking it might be nice to add a small RDF hack to CKAN
extra fields: a set of existing RDF prefixes (dc, dct, void etc.)
which could be used in Extra Keys.
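
As a rough sketch of what that could look like on the processing side,
an extra key such as "void:triples" could be expanded to a full property
URI. The namespace URIs are the standard ones for dc, dct and void; the
function name and the example keys are only illustrative.

```python
# Standard namespace URIs for the prefixes mentioned.
RDF_PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
}


def expand_extra_key(key):
    """Expand 'prefix:local' to a full URI, or return None for ordinary keys."""
    prefix, sep, local = key.partition(":")
    if sep and local and prefix in RDF_PREFIXES:
        return RDF_PREFIXES[prefix] + local
    return None


triples_uri = expand_extra_key("void:triples")
plain_key = expand_extra_key("Number of links")
```

Keys without a known prefix are left alone, so ordinary free-text extras
would keep working as they do now.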

>> also, before you ask, we have been
>> thinking hard about moving ckan.net to a full RDF store backend
>
> I don't really care what you use under the hood as long as there's some RDF
> on the surface. If you have a fixed schema, there's limited value in moving
> the backend to an RDF store.

Right :) Well, there's definitely RDF on the surface!

> So, let me know if we can get our own group, and I'll try the Pipes thing,
> and if both work out then I'd be happy to migrate the LOD Cloud database
> into CKAN.

OK on both of these. You can register the group right now (you'll just
need to log in):

<http://www.ckan.net/group/>

Let me know how you get on with Pipes, but it will be very easy for us
to add per-group feeds.

Rufus