[okfn-coord] Draft proposal for CKAN

Mon Nov 5 19:08:10 UTC 2007

Jonathan wrote:
> Hi guys,
> 
> I've just posted a draft letter of inquiry (and associated notes) to
> the Mellon Foundation for CKAN here:
> 
> http://www.okfn.org/board/wiki/CKAN-Mellon
> 
> (It's rather long, but I can post the whole thing, or just the letter,
> inline if that would be useful.)
> 
> As the programme leader is Ira Fuchs, who (as indicated in my brief 
> biographical summary) has a technical background, we can afford to
> flesh out the technical aspects. Input here (particularly with
> respect to the development process, features list, timeline, budget,
> etc.) would be very much appreciated, as I'm not so technical. :-)

I think this looks a good start but there is quite a bit we could do to
improve it. I've posted the main part of your draft inline below and
will comment there.

[snip]

> Finally, as Mellon places an emphasis on a high degree of collaboration
> and the inclusion of wide communities in technologies they fund, in
> the final paragraph I've attempted to summarise the more experienced
> organisations and 'constituencies' we will be able to solicit
> feedback from. If anyone's got any good ideas for this...
> 
> I've thought we might leave out a project team for this (as they
> don't ask for one), but nevertheless it could be good to start
> contacting

Definitely at this point.

> people. I understand a current list would include Jo, Rufus, John and
> possibly Aaron Straup Cope? Can anyone think of any suitable
> 'official'

Hmm, I think we'd want to be cautious about who was actually going to do
coding. The whole idea of having funds would be that we were able to pay
for the coding here. I think it would be better to focus on the project
and the organization itself as guarantees that things were going to be
run properly rather than specifically listing who would do what.

> advisors, if we need any?

I don't know whether these would be needed here. As 'advisors' we could
pick people from relevant communities (but I think we'd want to keep
things reasonably focused).

~rufus

Proposal
========

> Dear Mr. Fuchs,
> 
> I am writing to enquire whether the Mellon Foundation would consider
> funding the Comprehensive Knowledge Archive Network (CKAN) under its

I think one would want to start a little stronger and less cautious, e.g.

At present, in respect of knowledge 'development', we stand where 
software stood almost 30 years ago. Tools and techniques are crude, and 
methodologies are limited. When we distribute material openly, such as a 
database, a learning module or a scientific paper (if we distribute it 
at all), we do so in forms that are hard to reuse and work with (often 
significant effort must be expended to get the data back into a usable 
form).

... segue into componentization ...

   * atomization
   * packaging

See the:

   * XTech slides
   * XTech summary
<http://blog.okfn.org/2006/05/09/the-four-principles-of-open-knowledge-development/>
<http://blog.okfn.org/2007/04/30/what-do-we-mean-by-componentization-for-knowledge/>

At present we are at the early stages of developing a project 
entitled 'CKAN' (Comprehensive Knowledge Archive Network), named by 
analogy with CPAN. It is conceived as part of a wider movement towards 
'componentization' in (open) knowledge development, whether such 
knowledge comes as a database, a learning module or a collection of RDF 
statements. Developing good knowledge APIs will be hard and must 
necessarily proceed discipline by discipline -- it is clearly not 
something a single project should aim, or would be able, to do. However, 
automated discovery, indexing and 'installation' of open knowledge 
resources -- which, by analogy with current practice in software, one 
might term packages -- is very much feasible with existing 
technology.

> Research in Information Technology Programme. CKAN is a registry for
> open knowledge 'packages'. We estimate that it will cost $50,000 to
> take CKAN forward into its next phase of development.

I'd keep discussion of costs until later when the project has already 
been introduced.

> The 'Comprehensive Knowledge Archive Network' (CKAN)
> 
> The last few years have seen a considerable growth of interest in the
> social, scholarly and commercial benefits of 'open knowledge'.
> Knowledge producers and users - including those in government,
> education, business and the media - have been exploring new ways of
> facilitating the exchange and re-use of data, documents and media
> through a combination of internet technologies and liberal licensing
> practices. However, finding out about what is available is difficult
> due to the sheer volume of material, the diversity of groups and
> sectors involved in its production and distribution, the
> proliferation of licenses, and uncertainties surrounding the legal
> conditions of re-use.

We've got to be careful. We're not going to address license 
proliferation, or solve the search problem in a big way -- Freshmeat 
isn't Google, nor is CPAN. We're going to provide something pretty 
focused. Obviously we've still got to blow our own trumpet, but we should 
be pretty tight in terms of what we are delivering. Big ideas can go in 
the intro, with a gradual tightening as the letter progresses.

> While several organisations have attempted to create directories and
> search tools for open content, these attempts have often been limited
> in scope to certain types of license, or certain types of content.
> There remains a need for a registry that includes all types of
> material (notably datasets as well as texts, images and multimedia)

This is where I'd be cautious. We're presenting ourselves as a 
universal panacea -- all those other registries were too limited, etc., 
etc. I don't think this is the right way to go here. I think we could 
use the CKAN FAQs to good effect in delineat

> from across the broad spectrum of knowledge production. Additionally,
> there is a need for a service which documents the availability of
> material that has passed into the public domain or is exempt
> from copyright as well as that which is available under an open
> license.

No! This is definitely not the focus. The focus is on discovery and 
reuse, etc., not just on creating a 'registry'. In that sense I would say 
we are more narrow and long than broad and shallow. In some sense we 
might want to de-emphasize the 'comprehensive' (which was always a bit 
of a joke for Perl).

> CKAN or the 'Comprehensive Knowledge Archive Network' strives to meet
> both of these needs by providing a fully open, 'comprehensive'
> registry of knowledge 'packages' that others are free to access,
> distribute, modify and build upon. It is named in the manner of
> several free/open source software archives - such as CPAN for Perl,
> CTAN for TeX, and CRAN for R. It is currently in beta release, and

Probably say alpha.

> contains user contributed details of over 100 collections of open
> content and data - including license details, tags, links and
> comments. So far it has been developed through the efforts of
> volunteers, and with the input of specialists - from data experts and
> semantic web developers to researchers and academic publishers. The
> beta version will provide the core for future developments and has
> served as a proof-of-concept model with which to solicit feedback
> from the wide community of potential users. We require funding to
> develop a more sophisticated domain model based on use cases in
> different fields, and to significantly refine and extend the
> codebase.

> We are particularly enthusiastic to have the Mellon Foundation as a

'are'?

> partner and a benefactor of CKAN because of its history of funding
> innovative discovery and archival tools for information resources -
> such as JSTOR, ARTstor and OCW. We strongly share the Foundation's
> belief in the widespread benefits of open resources. Furthermore, we

I think there is a danger that this is too blandly in agreement. Of 
course we agree with stuff about openness -- that's more than apparent 
from the site and the project.

> are confident that CKAN will be of significant benefit to all of the
> parties mentioned in the programme - from those involved in online
> teaching and learning to those in the cultural and heritage sector,
> from scholarly communities to those involved in the provision of
> library and information resources. It seems likely that the registry

This is important, but perhaps we could give one or two specific 
examples, e.g. how someone looking for an open source package goes to 
Freshmeat, CPAN or PyPI.

> will provide cost savings for organisations that would otherwise
> purchase knowledge packages, and will be of value to commercial
> organisations as well as the general public.

"purchasing knowledge packages"? I think this is perhaps pushing it, but 
I'd be open to comments (perhaps I'm getting too English now).

> CKAN will be modular and well documented, so that it can be easily
> modified, built upon and integrated with existing applications and
> institutional resources. We hope it could be used as an

Do we really say this? We could definitely say that it would be designed 
for customization for different areas -- e.g. geodata, chemistry etc.

> integral component for automated knowledge-acquisition - such that
> content and data packages could be located and downloaded as can be
> currently done for software. Another vision would be one in which
> material discovered using CKAN could be dynamically explored and
> manipulated using, e.g., best-of-breed visualisation, text mining and
> statistical applications.
> 
> CKAN, or something like it, will constitute an important part of the
> infrastructure for open knowledge producers and users. We hope it
> will stimulate innovation and growth in the ecology of open content
> and data by facilitating re-use, re-combination and encouraging the
> creation of derivative works.

This is just a bit bland, I feel: 'the creation of derivative works'. We 
need to be a bit punchier. We also need a really good ending para.