[ckan-discuss] ckan.net linking open data group and lod cloud
Richard Cyganiak
richard at cyganiak.de
Sun Apr 25 21:24:29 BST 2010
Hi Rufus,
(Adding ckan-discuss to cc list. Context: We are discussing whether it
would be feasible to curate the dataset information that goes into
making the LOD Cloud diagram [1] within ckan.net.)
On 20 Apr 2010, at 09:42, Rufus Pollock wrote:
>> Your message made me ponder whether we could actually maintain the
>> data
>> behind the LOD cloud inside ckan.net. It's an intriguing idea. But
>> I'm
>> afraid that the completely open wiki-style nature of CKAN wouldn't be
>> compatible with the way we work.
>
> You understand that groups are curated? That is, only group admins can
> add/remove packages from their group. (This is unlike a tag that
> anyone can add/remove)
Yes, I've seen the groups feature, and it would be useful here.
> That of course still leaves the packages themselves. I do note that
> ckan (and hence ckan.net) has full acl support:
>
> <http://wiki.okfn.org/ckan/doc/accesscontrol/>
Oh, nice. I hadn't seen this.
> Going forward we do anticipate a time when packages start having
> dedicated maintainers who may wish to switch off "anyone can edit"
> capability (that facility is in there currently but atm we discourage
> people from using that capability).
Ok, so one option would be for Anja and me to create the packages that
represent the LOD Cloud as locked-down access controlled packages, and
create a group that contains all of them.
The problem with this is that many of these packages already exist in
ckan.net/group/lod and we would either have to duplicate those, or
lock down packages that are currently editable by anyone. Would any of
those options be acceptable to CKAN? They don't seem too much in the
spirit of the site ...
>> ckan.net would need to have at least a discussion page for each
>> dataset.
>
> This sounds very like a comments feature we've been discussing for a
> while.
By "discussion page" I mean something like the "talk pages" on
Wikipedia, where meta-discussions around shaping the entry could take
place. I'm thinking of discussion like:
A: "I think the license is stated incorrectly here, according to http://foobar
this dataset is in the public domain"
B: "The source for the currently stated license is here: http://xyz"
A: "That page is clearly outdated, and here's a post from John Doe
that confirms public domain license ..."
B: "Ok, I updated the record"
A comments feature is not good for this IMO; it would just add a lot
of noise to the CKAN catalog pages.
The key idea at Wikipedia is that they separate the artifact (the
Wikipedia article) from the discussion that shapes the artifact (the
separate talk pages). This keeps the artifact page free from process
discussion, and reduces resistance that users might have against
posting on the highly visible artifact page.
> The other thing we've thought about is the ability to have
> pending changes -- i.e. someone makes a change to a package and it
> doesn't get applied immediately but is put on a stack waiting for admin
> approval (this is rather like a patch queue).
This is probably the right model for an environment where strict
quality control is necessary (e.g., program code, where a bad change
breaks the build, causes bugs, and is a security problem). In a wiki,
a spirit of "just do it, it can always be reverted" is more
appropriate. Now, CKAN is somewhere in the middle -- currently
probably closer to a wiki with imposed structure, but in the long term
you probably want much more automated processing to happen around your
data. So the patch queue model might be the right one.
>> Without this, I'd be afraid that anyone could just come in and mess
>> things
>> up, and I'd have to chase them down out-of-band. The site would
>> also need
>> group- and dataset-level watchlists. This would give me reassurance
>> that I
>> myself -- and hopefully a few other folks -- would look over the
>> changes and
>> revert or fix/improve anything that's not good enough.
>
> There's already an atom feed (and API) with all changes.
The feed with all changes does not work for me, because only a small
fraction of those changes will be relevant to me, and I already
have too much noise in my inbox.
> We've also just added ability to get package specific feeds ...
I don't want to manually subscribe to an additional feed whenever I
add a package to a group.
Feeds for groups, please! This is a tool that group admins really
need, IMO.
I wonder whether I can filter only the stuff for one group from the
all-changes feed with Yahoo Pipes...
>> Or do you think my worries are unfounded here?
>
> I think at a start you could see what happened :) So far vandalism and
> spam have been kept very effectively in check and I think, in general,
> most edits to packages you were curating would be useful. At the same
> time see comments above for features that may already do some of what
> you want ...
Well. So let's say: If I can get a per-group feed, and a separate lod-
cloud group (with Anja and me as initial group managers, more
welcome), then I'm in, using unrestricted editing for the packages. So
the per-group feed is the one extra thing I need to be confident that
the curated data is reasonably safe.
I'll try building that feed with Yahoo Pipes. If that doesn't work,
then I'll probably have to wait until you implement group feeds
natively.
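As a rough illustration of the filtering idea (not of how Yahoo Pipes or
CKAN actually do it), here is a minimal Python sketch. It assumes that each
entry in the all-changes Atom feed carries a link whose last path segment is
the package name, and that the list of packages in the group is known in
advance. The feed layout, the example.org URLs, and the names filter_feed
and SAMPLE_FEED are all hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def filter_feed(feed_xml, group_packages):
    """Return (package, title) pairs for entries about packages in the group.

    Assumption: the package name is the last path segment of the entry link.
    """
    root = ET.fromstring(feed_xml)
    kept = []
    for entry in root.findall(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        href = link.get("href", "") if link is not None else ""
        name = href.rstrip("/").rsplit("/", 1)[-1]
        if name in group_packages:
            kept.append((name, entry.findtext(ATOM + "title", default="")))
    return kept


# Hypothetical feed with one relevant and one irrelevant change.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample all-changes feed</title>
  <entry>
    <title>Edited license field</title>
    <link href="http://example.org/package/dataset5"/>
  </entry>
  <entry>
    <title>Added a tag</title>
    <link href="http://example.org/package/unrelated"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    for name, title in filter_feed(SAMPLE_FEED, {"dataset5", "dataset23"}):
        print(name, "-", title)
```

The weak point of this approach is the same one noted above: the set of
group packages has to be kept in sync by hand, which a native group feed
would avoid.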
> I've already thought that we could start using agreed prefixes in
> ckan.net extras fields as a way of storing RDF info (which can then be
> proper RDF on semantic.ckan.net) --
Are extra fields included in the RDF output at all? If they are, then I don't
worry too much about this. I'd be willing to code something that runs
CONSTRUCT queries or some other processing to get voiD data out of the
extra fields.
A related question. In the LOD Cloud data, we track links between
datasets. This could be done in CKAN using an extra field "links to",
where the value is some identifier for the target dataset, e.g., its
CKAN page. Now the problem is, sometimes we also want to keep track of
the number of links between two given datasets. For example, dataset5
links to dataset23 and there are 50k links between them. Do you have
an idea how to represent this in CKAN? Could this be recorded using
some convention with extra fields? Again, I don't have a problem doing
post-processing that turns this into the final format -- just don't
want to abuse the CKAN schema too much.
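One possible convention, sketched here purely as an assumption and not as
an existing CKAN feature: use one extra field per target dataset, with a key
of the form "links:<target>" and the link count as the value (left empty
when the count is unknown). The "links:" prefix and the function name
parse_links are hypothetical. The post-processing step would then look
roughly like this:

```python
PREFIX = "links:"  # hypothetical key prefix for link extras


def parse_links(source, extras):
    """Turn extras like {"links:dataset23": "50000"} into link records.

    Returns (source, target, count) tuples sorted by target name;
    count is None when the extra's value is empty.
    """
    links = []
    for key, value in extras.items():
        if not key.startswith(PREFIX):
            continue  # ordinary extra field, not a link
        target = key[len(PREFIX):]
        count = int(value) if value.strip() else None
        links.append((source, target, count))
    return sorted(links, key=lambda link: link[1])


if __name__ == "__main__":
    extras = {
        "links:dataset23": "50000",  # 50k links to dataset23
        "links:dataset7": "",        # links exist, count unknown
        "license_note": "see homepage",
    }
    for record in parse_links("dataset5", extras):
        print(record)
```

Keeping the target in the key and the count in the value means each link
stays a single flat key/value pair, so nothing beyond plain extras is
needed on the CKAN side.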
> also, before you ask, we have been
> thinking hard about moving ckan.net to a full RDF store backend
I don't really care what you use under the hood as long as there's
some RDF on the surface. If you have a fixed schema, there's limited
value in moving the backend to an RDF store.
So, let me know if we can get our own group, and I'll try the Pipes
thing, and if both work out then I'd be happy to migrate the LOD Cloud
database into CKAN.
All the best,
Richard
--
Linked Data Technologist • Linked Data Research Centre
Digital Enterprise Research Institute (DERI), NUI Galway, Ireland
http://linkeddata.deri.ie/
skype:richard.cyganiak
tel:+353-91-49-5711