[okfn-coord] Draft proposal for CKAN

Thu Nov 29 00:25:48 UTC 2007

Hi,

Here's a newer version of the draft letter of inquiry from CKAN for the 
Mellon Foundation (inline below). I've tried to work in/adjust according 
to Rufus' responses. There's a much stronger emphasis on parallels with 
software development.

More comments would be appreciated so we can get this thing sent off asap!

Best,

J.

Dear Mr. Fuchs,

I am writing to introduce you to the Comprehensive Knowledge Archive 
Network (CKAN) and to enquire whether it is eligible for funding under 
your Research in Information Technology Programme.

'''The 'Comprehensive Knowledge Archive Network' (CKAN)'''

In knowledge development, we stand where software developers stood 
almost 30 years ago. Tools and techniques are crude, and methodologies 
are limited. If we do freely distribute our work - whether this is a 
database, a learning module or a scientific paper - we often do so in a 
manner which impedes re-use. Significant effort must be expended to 
extract and re-format material such that it is useful for the purposes 
of others.

In software this problem was addressed by an increased focus on 
'componentization'. In the past most programmers would focus exclusively 
on writing a piece of code that would fulfil the purpose at hand - with 
little regard for its readability, documentation, and future 
reusability. With the rise of free and open source software, programmers 
increasingly showed an interest in developing discrete 'packages' of 
code that could be easily re-combined in a multiplicity of different 
contexts for a multiplicity of different purposes. In free and open 
source circles, unwieldy and impenetrable code was replaced by a host 
of flexible and recyclable components. 'Advanced packaging tools' like 
'apt-get' and 'Synaptic Package Manager' exemplify the enormous 
potential of packaging in software: thousands of interdependent 
components can be easily located, acquired and installed.

The Comprehensive Knowledge Archive Network (CKAN) is a key part of our 
strategy to provide support for 'componentization' in knowledge 
development. Named after several free/open source software archives - 
such as CPAN, CTAN and CRAN - it aims to provide a fully open registry 
of knowledge 'packages' that others are free to access, distribute, 
modify and build upon. While good knowledge APIs will be developed 
discipline by discipline, CKAN focuses on automated discovery, indexing 
and 'installation' of open knowledge packages - including datasets, 
documents, images, multimedia and source files.

It is currently in alpha release, and contains user contributed details 
for over 100 collections of open content and data - including license 
details, tags, links and comments. So far it has been developed through 
the efforts of volunteers, and with the input of specialists - from data 
experts and semantic web developers to researchers and academic 
publishers. The alpha version will provide the core for future 
developments and has served as a proof of concept with which to 
solicit feedback from the wider community of potential users. We 
require funding to develop a more sophisticated domain model based on 
use cases in different fields, and to significantly refine and extend 
the codebase.

We would be particularly enthusiastic to have the Mellon Foundation as a 
partner and a benefactor of CKAN because of its experience in funding 
successful, innovative discovery and archival tools for digital 
resources - such as JSTOR, ARTstor and OCW. We are confident CKAN will 
be of significant benefit to many parties traditionally served by Mellon 
programs - including scholars, educators, librarians and information 
professionals. For example, climate scientists and environmental 
information providers alike will be able to explore and automatically 
acquire journal articles, datasets, and graphical representations 
pertaining to atmospheric thermodynamics - based on detailed domain 
specific metadata.

A combination of liberal licensing policies and new technologies is 
creating a 'perfect storm of opportunity' for the growth of an ecology 
of recombinable knowledge 'packages'. We believe that CKAN, or something 
like it, will be essential to encourage and support this growth.

'''Budget, timeline and projected developments'''

We anticipate that it will take one year of development before CKAN is 
ready for its full 1.0 release. $50,000 will enable us to conduct more 
detailed user research and domain analysis and to employ a part-time 
software developer.

Projected developments include:
 * support for more detailed and domain-specific metadata, including 
facilities to create and import specialised structured metadata (e.g. 
for scientific data, geodata, or official statistics) and for metadata 
standards such as machine tagging;
 * support for mirroring and distributed storage;
 * support for machine automated retrieval of knowledge packages;
 * improving search functionality and querying;
 * improvements in the user interface;
 * browser plugins to enable on-the-fly registration of new CKAN entries;
 * detailed and comprehensive documentation for the project and its code.

'''The Open Knowledge Foundation'''

The Open Knowledge Foundation is a not-for-profit organisation founded 
in 2004 and dedicated to protecting and promoting open knowledge in all 
its forms. It is a European leader in this field and prominent on the 
international stage.

Its work includes:
  * doing research and making recommendations on issues related to open 
knowledge at UK, EU and international levels;
  * organising events to bring together groups and individuals with an 
interest in open knowledge - from academics and government 
representatives to data experts and media groups;
  * working on open knowledge 'standards' such as the Open Knowledge 
Definition and the Open Service Definition;
  * helping to support and advise those interested in using an open 
license, such as in our 'Guide to Open Data Licensing';
  * contributing to the 'infrastructure' for those who produce and use 
open knowledge, which (in addition to CKAN) includes:
    - KForge - an open source suite of tools for managing software and 
knowledge projects;
    - KnowledgeForge - a free service, based on KForge, for open 
knowledge projects;
  * initiating open knowledge 'exemplar' projects, such as Open 
Shakespeare and Open Economics.

Its advisory board includes:
  * Dr Tim Hubbard (Sanger Institute)
  * Paula Le Dieu (Magic Lantern)
  * Dr Peter Murray-Rust (Cambridge University)
  * Professor John Naughton (Open University and Cambridge University)
  * Professor Peter Suber (Earlham College and SPARC)
  * Benjamin Mako Hill (MIT)
  * John Wilbanks (Science Commons)
  * Dr Sören Auer (Universität Leipzig)

The Open Knowledge Foundation is extremely well placed to develop an 
open knowledge registry such as CKAN. In addition to our experience of 
open source software development, we have close contact with many 
individuals, organizations and communities with an interest in such a 
registry, and from whom it will be vital to solicit input and 
testing. In addition to gathering feedback from experts in different 
fields and from information professionals, we have key supportive 
contacts in organisations such as JISC, EPSI, SPARC and W3C. We also 
have contact with emerging networks such as iCommons, the Open Content 
Alliance, and COMMUNIA, 'The European Thematic Network on the Digital 
Public Domain'. The Mellon Foundation's extensive contacts in the fields 
of scholarly digital resources, telecommunications and infrastructure 
would complement the contacts we already have.

I sincerely hope that you will consider joining us to develop CKAN.

Rufus Pollock wrote:
> Jonathan wrote:
>> Hi guys,
>>
>> I've just posted a draft letter of inquiry (and associated notes) to
>> the Mellon Foundation for CKAN here:
>>
>> http://www.okfn.org/board/wiki/CKAN-Mellon
>>
>> (It's rather long, but I can post the whole thing, or just the letter, inline
>> if that would be useful.)
>>
>> As the programme leader is Ira Fuchs, who (as indicated in my brief 
>> biographical summary) has a technical background, we can afford to
>> flesh out the technical aspects. Input here (particularly with
>> respect to the development process, features list, timeline, budget,
>> etc.) would be very much appreciated, as I'm not so technical.. :-)
>
> I think this looks a good start but there is quite a bit we could do to
> improve it. I've posted the main part of your draft inline below and
> will comment there.
>
> [snip]
>
>> Finally, as Mellon places an emphasis on a high degree of collaboration
>> and the inclusion of wide communities in technologies they fund, in
>> the final paragraph I've attempted to summarise the more experienced
>>  organisations and 'constituencies' we will be able to solicit
>> feedback from. If anyone's got any good ideas for this...
>>
>> I've thought we might leave out a project team for this (as they
>> don't ask for one), but nevertheless it could be good to start
>> contacting
>
> Definitely at this point.
>
>> people. I understand a current list would include Jo, Rufus, John and
>>  possibly Aaron Straup Cope? Can anyone think of any suitable
>> 'official'
>
> Hmm, I think we'd want to be cautious about who was actually going to do
> coding. The whole idea of having funds would be that we were able to pay
> for the coding here. I think it would be better to focus on the project
> and the organization itself as guarantees that things were going to be
> run properly rather than specifically listing who would do what.
>
>> advisors, if we need any?
>
> I don't know whether these would be needed here. As 'advisors' we could
> pick people from relevant communities (but I think we'd want to keep
> things reasonably focused).
>
> ~rufus
>
> Proposal
> ========
>
>> Dear Mr. Fuchs,
>>
>> I am writing to enquire whether the Mellon Foundation would consider
>> funding the Comprehensive Knowledge Archive Network (CKAN) under its
>
> I think one would want to start a little stronger and less cautious, e.g.
>
> At present, in respect of knowledge 'development' we stand where 
> software stood almost 30 years ago. Tools and techniques are crude, 
> and methodologies are limited. When we distribute material openly such 
> as a database, a learning module or a scientific paper (if we 
> distribute it at all) we do so in forms that are hard to reuse and 
> work with (often significant effort must be expended to get the data 
> back into a usable form).
>
> ... segue in componentization ...
>
>   * atomization
>   * packaging
>
> See the:
>
>   * XTech slides
>   * XTech summary
> <http://blog.okfn.org/2006/05/09/the-four-principles-of-open-knowledge-development/> 
>
> <http://blog.okfn.org/2007/04/30/what-do-we-mean-by-componentization-for-knowledge/> 
>
>
> At present we are at the early stages of the development of a project 
> entitled 'CKAN' (Comprehensive Knowledge Archive Network) named in 
> analogy with CPAN. It is conceived as part of a wider movement towards 
> 'componentization' in (open) knowledge development, whether such 
> knowledge comes as a database, a learning module or collections of 
> RDF statements. Developing good knowledge APIs will be hard and must 
> necessarily proceed discipline by discipline -- it is clearly not 
> something a single project should aim, or would be able, to do. 
> However automated discovery, indexing and 'installation' of open 
> knowledge resources -- which in analogy with the current practice in 
> software one might term packages -- is something very much feasible 
> with given technology.
>
>> Research in Information Technology Programme. CKAN is a registry for
>> open knowledge 'packages'. We estimate that it will cost $50,000 to
>> take CKAN forward into its next phase of development.
>
> I'd keep discussion of costs until later when the project has already 
> been introduced.
>
>> The 'Comprehensive Knowledge Archive Network' (CKAN)
>>
>> The last few years have seen a considerable growth of interest in the
>> social, scholarly and commercial benefits of 'open knowledge'.
>> Knowledge producers and users - including those in government,
>> education, business and the media - have been exploring new ways of
>> facilitating the exchange and re-use of data, documents and media
>> through a combination of internet technologies and liberal licensing
>> practices. However, finding out about what is available is difficult
>> due to the sheer volume of material, the diversity of groups and
>> sectors involved in its production and distribution, the
>> proliferation of licenses, and uncertainties surrounding the legal
>> conditions of re-use.
>
> We've got to be careful. We're not going to address license 
> proliferation, or solve the search problem in a big way -- freshmeat 
> isn't google, nor is CPAN. We're going to provide something pretty 
> focused. Obviously we've still got to blow our trumpet but we should 
> be pretty tight in terms of what we are delivering. Big ideas can go 
> in the intro with a gradual tightening as the letter progresses.
>
>> While several organisations have attempted to create directories and
>> search tools for open content, these attempts have often been limited
>> in scope to certain types of license, or certain types of content.
>> There remains a need for a registry that includes all types of
>> material (notably datasets as well as texts, images and multimedia)
>
> This is where I'd be cautious. We're presenting ourselves as the 
> universal panacea -- all those other registries were too limited etc 
> etc. I don't think this is the right way to go here. I think we could 
> use the CKAN FAQs to good effect in delineat
>
>> from across the broad spectrum of knowledge production. Additionally
>> there is a need for a service which documents the availability of
>> material that has passed into the public domain or is exempt
>> from copyright as well as that which is available under an open
>> license.
>
> No! This is definitely not the focus. The focus is on discovery and 
> reuse etc Not on just creating a 'registry'. In that sense I would say 
> we are more narrow and long than broad and shallow. In some sense we 
> might want to destress the 'comprehensive' (which was always a bit of 
> a joke for Perl).
>
>> CKAN or the 'Comprehensive Knowledge Archive Network' strives to meet
>> both of these needs by providing a fully open, 'comprehensive'
>> registry of knowledge 'packages' that others are free to access,
>> distribute, modify and build upon. It is named in the manner of
>> several free/open source software archives - such as CPAN for Perl,
>> CTAN for TeX, and CRAN for R. It is currently in beta release, and
>
> Probably say alpha.
>
>> contains user contributed details of over 100 collections of open
>> content and data - including license details, tags, links and
>> comments. So far it has been developed through the efforts of
>> volunteers, and with the input of specialists - from data experts and
>> semantic web developers to researchers and academic publishers. The
>> beta version will provide the core for future developments and has
>> served as a proof of concept with which to solicit feedback
>> from the wider community of potential users. We require funding to
>> develop a more sophisticated domain model based on use cases in
>> different fields, and to significantly refine and extend the
>> codebase.
>
>> We are particularly enthusiastic to have the Mellon Foundation as a
>
> 'are'?
>
>> partner and a benefactor of CKAN because of its history of funding
>> innovative discovery and archival tools for information resources -
>> such as JSTOR, ARTstor and OCW. We strongly share the Foundation's
>> belief in the widespread benefits of open resources. Furthermore, we
>
> I think there is a danger that this is too blandly in agreement. Of 
> course we agree with stuff about openness -- that's more than apparent 
> from the site and the project.
>
>> are confident that CKAN will be of significant benefit to all of the
>> parties mentioned in the programme - from those involved in online
>> teaching and learning to those in the cultural and heritage sector,
>> from scholarly communities to those involved in the provision of
>> library and information resources. It seems likely that the registry
>
> This is important but perhaps we could give one or two specific 
> examples. Perhaps give example of how someone looking for an open 
> source package goes to freshmeat or goes to CPAN or to PyPI.
>
>> will provide cost savings for organisations that would otherwise
>> purchase knowledge packages, and will be of value to commercial
>> organisations as well as the general public.
>
> "purchasing knowledge packages."? I think this is perhaps pushing it 
> but I'd be open to comments (perhaps I'm getting too English now).
>
>> CKAN will be modular and well documented, so that it can be easily
>> modified, built upon and integrated with existing applications and
>> institutional resources. We hope it would be able to be used as an
>
> Do we really say this? We could definitely say that it would be 
> designed for customization for different areas -- e.g. geodata, 
> chemistry etc.
>
>> integral component for automated knowledge-acquisition - such that
>> content and data packages could be located and downloaded as can be
>> currently done for software. Another vision would be one in which
>> material discovered using CKAN could be dynamically explored and
>> manipulated using, e.g., best-of-breed visualisation, text mining and
>> statistical applications.
>>
>> CKAN, or something like it, will constitute an important part of the
>> infrastructure for open knowledge producers and users. We hope it
>> will stimulate innovation and growth in the ecology of open content
>> and data by facilitating re-use, re-combination and encouraging the
>> creation of derivative works.
>
> This is just a bit bland I feel. The creation of 'derivative' works. 
> We need to be a bit punchier. We also need a really good ending para.
>
>
>