[okfn-discuss] What Do We Mean by Componentization (for Knowledge)?

Wed May 2 09:57:38 UTC 2007

I think that Rufus's essay is particularly important for informing the way 
in which we design solutions to support OK artefacts. All I would like to 
add is that we may want to think of adding a fifth principle (or make a 
variation of the incremental principle) so that we focus more on the 
properties an OK artefact itself should have:

For instance, I would add that the Open Knowledge artefact should always be 
potentially unfinished or open.

In the first version of the essay it is not clear to me whether Rufus 
refers to 'open' under incremental in terms of open participation or 'open' 
in terms of an artefact that is susceptible to continuous improvements/ 
additions. I guess that the more 'completed' or 'self-standing' an artefact 
is, the less it is possible to be an OK artefact. I would even go to the 
extent of arguing that the way the artefact is structured frames subsequent 
OK development interactions: for instance, the way in which the artefact is 
atomized (not all artefacts are atomized in the same way, or are even 
susceptible to atomization) to a great extent influences the development 
routines.

To state it differently, is it enough to focus on the development process as 
if all artefacts are capable of being subjected to OK principles? Is the 
development process going to 'contaminate' the artefact and make it open or 
are there properties in an artefact that make it a bad candidate for OK 
development? And if there are such properties, are they essential or 
accidental?

Overall, I would be very interested if someone disagrees or has a 
different take on the whole issue or even thinks that the point is so 
obvious that there is no need to discuss it at all.

thnx
pRo

----- Original Message ----- 
From: "Rufus Pollock" <rufus.pollock at okfn.org>
To: "okfn-discuss" <okfn-discuss at lists.okfn.org>
Sent: Tuesday, May 01, 2007 2:53 PM
Subject: [okfn-discuss] What Do We Mean by Componentization (for Knowledge)?

> Also at:
>
> <http://blog.okfn.org/2007/04/30/what-do-we-mean-by-componentization-for-knowledge/>
>
> ~rufus
>
> ## Background
>
> Nearly a year ago I wrote a short essay entitled [The Four Principles of 
> (Open) Knowledge 
> Development](http://blog.okfn.org/2006/05/09/the-four-principles-of-open-knowledge-development/) 
> in which I proposed that the four key features of a successful 
> (open) knowledge development process were that it was:
>
>   1. Incremental
>   2. Decentralized
>   3. Collaborative
>   4. Componentized
>
> As I emphasized at the time the most important feature -- and currently 
> least advanced -- was the last: Componentization. Since then I've had the 
> chance to discuss the issue further, most recently and extensively at [Open 
> Knowledge 1.0](http://www.okfn.org/okcon/) and this has prompted me to 
> re-evaluate and extend the ideas I put forward in the original essay.
>
> ## What Do We Mean By Componentization?
>
> > Componentization is the process of **atomizing** (breaking down)
> > resources into separate reusable **packages** that can be easily 
> > recombined.
>
> Componentization is the most important feature of (open) knowledge 
> development as well as the one which is, at present, least advanced. If 
> you look at the way software has evolved it is now highly componentized into 
> packages/libraries. Doing this allows one to 'divide and conquer' the 
> organizational and conceptual problems of highly complex systems. Even 
> more importantly it allows for greatly increased levels of reuse.
>
> The power and significance of componentization really comes home to one 
> when using a package manager (e.g. apt-get for debian) on a modern 
> operating system. A request to install a single given package can result 
> in the automatic discovery and installation of all packages on which that 
> one depends. The result may be a list of tens -- or even hundreds -- of 
> packages in a graphic demonstration of the way in which computer programs 
> have been broken down into interdependent components.
>
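As a rough illustration of the package-manager behaviour described above, here is a minimal sketch of transitive dependency resolution. It is not how apt-get is implemented; the package names and the `DEPENDS` table are hypothetical, and the function simply walks the dependency graph depth-first so that each package appears after everything it depends on.

```python
from typing import Dict, List, Set

# Hypothetical dependency table: package name -> packages it depends on directly.
DEPENDS: Dict[str, List[str]] = {
    "photo-viewer": ["image-lib", "gui-toolkit"],
    "image-lib": ["compression-lib"],
    "gui-toolkit": ["font-lib", "compression-lib"],
    "compression-lib": [],
    "font-lib": [],
}


def install_order(package: str, depends: Dict[str, List[str]]) -> List[str]:
    """Return the package and all its transitive dependencies, dependencies first."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled for installation
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in depends[name]:
            visit(dep)
        visiting.remove(name)
        order.append(name)

    visit(package)
    return order


if __name__ == "__main__":
    # One request pulls in every package the requested one depends on.
    print(install_order("photo-viewer", DEPENDS))
```

A single request for `photo-viewer` yields a list of five packages, which is the small-scale version of the "tens or even hundreds" mentioned in the essay.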
> ## Atomization
>
> Atomization denotes the breaking down of a resource such as a piece of 
> software or collection of data into smaller parts (though the word atomic 
> connotes irreducibility it is never clear what the exact irreducible, or 
> optimal, size for a given part is). For example a given software 
> application may be divided up into several components or libraries. 
> Atomization can happen on many levels.
>
> At a very low level when writing software we break things down into 
> functions and classes, into different files (modules) and even group 
> together different files. Similarly when creating a dataset in a database 
> we divide things into columns, tables, and groups of inter-related tables.
>
> But such divisions are only visible to the members of that specific 
> project. Anyone else has to get the entire application or entire database 
> to use one particular part of it. Furthermore anyone working on any given 
> part of the application or database needs to be aware of, and 
> interact with, anyone else working on it -- decentralization is impossible 
> or extremely limited.
>
> Thus, atomization at such a low level is not what we are really concerned 
> with; instead we are concerned with atomization into **Packages**:
>
>
> ## Packaging
>
> By packaging we mean the process by which a resource is made reusable by 
> the addition of an external interface. The package is therefore the 
> logical unit of distribution and reuse and it is only with packaging that 
> the full power of atomization's "divide and conquer" comes into play --  
> without it there is still tight coupling between different parts of a 
> given set of resources.
>
> Developing packages is a non-trivial exercise precisely because developing 
> good *stable* interfaces (usually in the form of a code or knowledge API) 
> is hard. One way to manage this need to provide stability but still remain 
> flexible in terms of future development is to employ versioning. By 
> versioning the package and providing 'releases' those who reuse the 
> packaged resource can use a specific (and stable) release while 
> development and changes are made in the 'trunk' and become available in 
> later releases. This practice of versioning and releasing is already 
> ubiquitous in software development -- so ubiquitous it is practically 
> taken for granted -- but is almost unknown in the area of knowledge.
>
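The versioning idea above can be sketched for a knowledge resource as follows. This is only an illustration under stated assumptions: the `KnowledgePackage` class, the `(major, minor)` version tuples and the sample entries are all hypothetical, not an existing API. The point it shows is that a release is a frozen snapshot, so reusers of release 1.0 are unaffected by later changes in the trunk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Version = Tuple[int, int]  # (major, minor); an assumed, simplified scheme


@dataclass
class KnowledgePackage:
    """A resource with an editable 'trunk' and immutable numbered releases."""
    name: str
    trunk: List[str] = field(default_factory=list)
    releases: Dict[Version, Tuple[str, ...]] = field(default_factory=dict)

    def release(self, version: Version) -> None:
        if version in self.releases:
            raise ValueError(f"version {version} already released")
        # Freeze a copy so later trunk edits cannot alter this release.
        self.releases[version] = tuple(self.trunk)

    def get(self, version: Version) -> Tuple[str, ...]:
        """Return the frozen contents of a specific release."""
        return self.releases[version]


if __name__ == "__main__":
    pkg = KnowledgePackage("animal-photo-tags")
    pkg.trunk.append("photo-001: lion")
    pkg.release((1, 0))
    pkg.trunk.append("photo-002: zebra")  # development continues in the trunk
    print(pkg.get((1, 0)))  # still only the first entry
    pkg.release((1, 1))
    print(pkg.get((1, 1)))  # the later release includes both entries
```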
>
> ## A Basic Example: A Photo Collection
>
> Imagine we had a large store of photos, say more than 100k individual 
> pictures (~50GB of data at 500k per picture). Suppose that initially this 
> data is just sitting as a large set of files on disk somewhere. Consider 
> several possibilities for how we could make them available:
>
> 1. Bundle all the photos together (zip/tgz) and post them for download. 
> Comment: this is a very crude approach to componentization. There is 
> little atomization and the 'knowledge-API' is practically non-existent (it 
> consists solely of the filenames and directory structure).
>
> 2. In addition tag or categorize the photos and make this database 
> available as part of the download. Comment: By adding some structured 
> metadata we have started to develop a 'knowledge-API' for the underlying 
> resource that makes it more useful. One could now write a screensaver 
> program which showed photos from a particular category or auto-import 
> photos by their area.
>
> 3. In addition suppose the photos fall into several well-defined and 
> distinct classes (e.g. photos of animals, of buildings and of works of 
> art). Divide the photo collection into these three categories and make 
> each of them available as a separate download. Comment: An initial step in atomizing 
> the resource to make it more useful, after all 50GB is rather a lot to 
> download for one photo.
>
> 4. In addition to dividing them up allow different people to maintain the 
> tags for different categories (one might imagine those knowledgeable about 
> animals are different from those knowledgeable about art). Comment: 
> Atomization assists the development of good knowledge-APIs (the human mind 
> is limited and divide and conquer helps us deal with the complexity).
>
> 5. Standardize the ids for each photo (if this hasn't been done already) 
> and separate the tags/categories data from the underlying photo data. This 
> way multiple (independent) groups can provide tags/categorization data for 
> the photos. Comment: Repackaging -- along with the development of a better 
> knowledge-API for the basic resource -- allows a dramatic decrease in the 
> level of coupling and increases the scope for independent development of 
> complementary libraries (the tags). This in turn will increase the utility 
> to end users.
>
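Option 5 in the list above can be sketched in a few lines. Everything here is hypothetical (the ids, file paths, tag sets and function names are invented for illustration): the base package maps standardized photo ids to files, and two independent groups publish tag packages keyed by the same ids, which a consumer can combine without either group coordinating with the other.

```python
from typing import Dict, List, Set

# Hypothetical base package: standardized photo id -> file path.
PHOTOS: Dict[str, str] = {
    "p0001": "animals/lion.jpg",
    "p0002": "buildings/tower.jpg",
    "p0003": "art/fresco.jpg",
}

# Hypothetical tag packages from two independent groups, keyed by the same ids.
TAGS_GROUP_A: Dict[str, Set[str]] = {"p0001": {"animal", "lion"}, "p0003": {"art"}}
TAGS_GROUP_B: Dict[str, Set[str]] = {"p0001": {"africa"}, "p0002": {"building"}}


def merge_tags(*sources: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Combine any number of tag packages into one id -> tags mapping."""
    merged: Dict[str, Set[str]] = {}
    for source in sources:
        for photo_id, tags in source.items():
            merged.setdefault(photo_id, set()).update(tags)
    return merged


def photos_with_tag(
    tag: str, photos: Dict[str, str], tags: Dict[str, Set[str]]
) -> List[str]:
    """Return the sorted file paths of all photos carrying the given tag."""
    return sorted(
        path for photo_id, path in photos.items() if tag in tags.get(photo_id, set())
    )


if __name__ == "__main__":
    all_tags = merge_tags(TAGS_GROUP_A, TAGS_GROUP_B)
    print(photos_with_tag("africa", PHOTOS, all_tags))
```

The shared ids are the whole of the interface here: neither tag package needs to ship, or even know the location of, the photo files themselves.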
>
> ## Conclusion
>
> In the early days of software there was also little arms-length reuse 
> because there was little packaging. Hardware was so expensive, and so 
> limited, that it made sense for all software to be bespoke and little 
> effort to be put into building libraries or packages. Only gradually did 
> the modern complex, though still crude, system develop.
>
> The same evolution can be expected for knowledge. At present knowledge 
> development displays very little componentization but as the underlying 
> pool of raw, 'unpackaged', information continues to increase there will be 
> increasing emphasis on componentization and the reuse it supports. (One can 
> conceptualize this as a question of interface vs. the content. Currently 
> 90% of effort goes into the content and 10% goes into the interface. With 
> components this will change to 90% on the interface and 10% on the content).
>
> The change to a componentized architecture will be complex but, once 
> achieved, will revolutionize the production and development of open 
> knowledge.
>
>
>
> -- 
> Executive Director, Open Knowledge Foundation
> m: +44 (0)7795 176 976
> www: http://www.okfn.org/ | blog: http://blog.okfn.org/
