[ckan-discuss] Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

Mon Sep 6 14:08:14 BST 2010

(cc'ing ckan-discuss)

Yes - I think the front page used to say:

"CKAN is a registry of open data and content packages (and some closed ones)"

We should probably revert to something like this wording to avoid
confusion. The main focus of CKAN is, of course, data which is open as
in opendefinition.org as a baseline (though at the OKF we also promote
different standards in different domains - such as pantonprinciples.org
for science). In my opinion a major reason for adding non-open data,
or data where licensing is not clear, is to highlight this to a
broader community of prospective users, to use in combination with
services to clarify legal status (like isitopen.org which Chris
mentioned) and ultimately to encourage the adoption of an open
license.

Perhaps a good analogy is main, universe and multiverse repositories
in free/open source software package management?

https://help.ubuntu.com/community/Repositories/Ubuntu

On Sunday, September 5, 2010, Kingsley Idehen <kidehen at openlinksw.com> wrote:
>  On 9/5/10 11:00 AM, Alan Ruttenberg wrote:
>
> On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizer <chris at bizer.de> wrote:
>
> Hi Alan,
>
>
> I have just spent some time evaluating one source and reported to you
> the result. Perhaps you might act on this investment in time and thank
> me for doing so. You might find that the result was myself and more
> people doing such quality control.
>
> Sorry that my reply yesterday might have been a bit too harsh.
>
> I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
> and added a reference to the description of the CAS dataset at
>
> http://ckan.net/package/bio2rdf-cas
>
> Please also note that CKAN provides a rating function for the datasets and
> also provides for commenting and discussing the datasets.
>
> Maybe people could use these features as a start to collect quality-related
> meta-information about the datasets.
>
> CKAN also provides a link to the http://www.isitopendata.org/ service, which
> might be used for license inquiries.
>
> Dear Chris,
>
> As I said, the first line on the CKAN home page says: "CKAN is a
> registry of open data and content packages.". Therefore I think there
> is a reasonable expectation that the packages registered there are
> open. I maintain that CKAN should either change how it explains itself
> to make clear that it is a registry of packages that may or may not be
> open, or it should remove the packages that are not known to be open.
> I'm not taking a position one way or another which they should do
> (that's their business), but they should say what they do, and do what
> they say.
>
> Thank you for your pointers to further information on how to find
> licenses. I'm fairly familiar with this area given that I work for
> Creative Commons.
>
>
> Chris,
>
> The critical point here is that CKAN should simply make the correction Alan is suggesting. As you know, we don't need misleading headlines in the LOD realm; they ultimately cause problems.
>
> Anyway, this is maybe more of a CKAN issue, so I am hoping that Jonathan is reading this thread and takes this as a cue to fix the title, that's all. Basically, this is about publicly available structured data that may or may not be "Open". Making something available to the public still doesn't imply that it's actually "Open".
>
>
> I think we can fix this little issue.
>
> Kingsley
>
>
> I agree with you that the quality of Linked Data published on the Web is
> crucial, but we also have to take into account that much of the data in the
> LOD cloud is currently still published by research projects in order to
> demonstrate the technologies.
>
> As the Web of Data is evolving and more and more actual owners of the
> datasets start to provide them as Linked Data, I hope that the quality will
> also increase and the datasets will be kept current. Encouraging
> developments in this direction currently happen in the libraries,
> eGovernment, and eCommerce domains.
>
> I agree that these are good examples. I would suggest that you focus
> on including the good examples in the LOD cloud, or at a minimum
> remove those, like CAS, that fall below the minimal standard of
> supplying *some* data and being *open*, so that "linked open data"
> means something coherent.
>
>
> On the other hand, the Web is an open system and we will thus always see
> people publishing low-quality, wrong and misleading data. Google handles
> this fact rather successfully using PageRank. As the Web of Data provides
> more structure than the classic Web, I think we might even be able to apply
> more sophisticated data-quality assessment heuristics to decide which data
> we want to use in our applications and which to ignore. Some of these
> methods are listed in [1].
>
> Look, Chris, I just did a "manual page rank" on the CAS dataset. It is
> meaningless.  This is a high quality assessment. If the movement can't
> act on known good quality information I (and others) will doubt that
> automatic algorithms will be credible.
>
> Moreover, the LOD cloud diagram is an advertisement. There are enough
> data sets now that inclusion in the diagram can become a reward for
> good work. It's not good advertising for Google when junk sites come
> up at the top of search results and they do their best to minimize
> this occurrence. The LOD cloud is your front page, and to a certain
> extent mine as well as I invest all my time in doing work towards
> building the web of data in the Sciences.
>
> Regards,
> Alan
>
>
> Best,
>
> Chris
>
> [1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
> using the WIQA policy framework. Journal of Web Semantics: Science, Services
> and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
> 1-10.
> http://dx.doi.org/10.1016/j.websem.2008.02.005
>
>
> -----Original Message-----
> From: Alan Ruttenberg [mailto:alanruttenberg at gmail.com]
> Sent: Saturday, 4 September 2010 22:20
> To: Chris Bizer
> Cc: Anja Jentzsch; public-lod at w3.org; Leigh Dodds; Jonathan Gray
> Subject: Re: Next version of the LOD cloud diagram. Please provide input, so
> that your dataset is included.
>
> On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer <chris at bizer.de> wrote:
>
> So rather than to criticize the work that other people do on collecting
> meta-information about the datasets in the LOD cloud
>
> Did you read what I wrote? I made no comment on the adequacy of
> metainformation. In fact I *used* that metainformation to point out
> that the data source in question did not satisfy the "open" provision
> of linked *open* data. In addition I criticized the *inclusion* of the
> data set in the *lod cloud diagram* because of this lack of openness
> and because the actual content of that resource didn't resemble any
> data in the resource that it was derived from (a registry of
> information about chemical compounds), suggesting that it would hurt
> the LOD effort as inclusion would be a kind of "false advertising".
>
> -Alan
>
>
>
>
>
>
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
>
>
>
>
>

-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://blog.okfn.org

http://twitter.com/jwyg
http://identi.ca/jwyg