[open-science] first steps with KNB

Tom Roche Tom_Roche at pobox.com
Thu Dec 13 05:22:33 UTC 2012


http://lists.okfn.org/pipermail/open-science/2012-December/001943.html
>> Much science informatics involves manipulation of input data, e.g.,
>> to produce visualizations, or just refined data ("analyses," in
>> [meteorological jargon]) for another stage in a pipeline. It's
>> therefore useful for open-science projects to host not only code
>> but [also] its associated inputs and outputs (here called "I+O").

http://lists.okfn.org/pipermail/open-science/2012-December/001948.html
> the KNB

http://knb.ecoinformatics.org/

> allows any data to be uploaded that is relevant to science and
> is legal to redistribute.

So for the benefit of the next open-science newbie, some observations
and questions:

* Initial contact with KNB is fairly intuitive: from the homepage above,
  one can see how to create an account, and that one should install and
  use Morpho

http://knb.ecoinformatics.org/morphoportal.jsp

  to get data into KNB.

* Being a jar, Morpho is pretty easy to install and use (at least on
  Linux), though with two caveats:

** There is nothing on the homepage to suggest that its installer is 84
   MB, so one may be surprised by the time required to download it.
   (More on progress UI below.)

** One should have some Java background: at least, one should know
   how to define one's JAVA_HOME.
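
For the JAVA_HOME point, a minimal sketch of a pre-flight check one could
run before launching the installer. It assumes only the usual convention
that the Java launcher lives at $JAVA_HOME/bin/java (java.exe on Windows);
the function name `find_java` is made up for illustration and is not part
of Morpho.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_java(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the java launcher under JAVA_HOME, or None if it is not usable."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None  # JAVA_HOME is unset or empty
    # Assumption: a JDK/JRE keeps its launcher in $JAVA_HOME/bin.
    for name in ("java", "java.exe"):
        candidate = Path(java_home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("JAVA_HOME is unset or does not point at a Java installation")
    else:
        print(f"Found Java launcher at {java}")
```

If the check fails on Linux, the usual fix is an `export JAVA_HOME=...`
line in one's shell startup file, pointing at wherever Java is installed.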

* On running Morpho, one must input a lot of metadata, both for the "data
  package" (or project, i.e., data on oneself and one's collaborators)
  and for the actual data (or "data table"). I suspect this is much less
  onerous if one's data is text, since Morpho seems able to extract
  metadata from text files when they carry any.

** (caveat: I may overestimate the global importance of my own concerns
   :-) One hopes Morpho will gain the ability to extract metadata from
   widely-used, explicitly-metadata-supporting (aka "self-describing")
   binary formats such as netCDF.

http://en.wikipedia.org/wiki/Netcdf#Format_description

** IMHO, a salient weakness of the Morpho UI is its lack of progress
   feedback. E.g., on choosing to save my data "to the network" (which
   seems to mean saving to KNB; I confess to not Reading The Fine
   Documentation that installs with Morpho before using it), I hit a
   button on a dialog ... which just sat there for quite a while before
   completing (hopefully :-)
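
On the netCDF wish above: part of what makes such formats "self-describing"
is that a file announces what it is in its first few bytes. The sketch
below only identifies the on-disk variant from those leading bytes; it says
nothing about how Morpho works, and the name `sniff_netcdf` is invented
here. Reading the actual dimensions and attributes would be a job for a
netCDF library.

```python
from typing import Optional

# Leading bytes ("magic numbers") of the classic netCDF variants:
# the ASCII letters "CDF" followed by a one-byte version number.
_CLASSIC_VERSIONS = {
    1: "netCDF classic (CDF-1)",
    2: "netCDF 64-bit offset (CDF-2)",
    5: "netCDF 64-bit data (CDF-5)",
}
# netCDF-4 files are HDF5 files, so they start with the HDF5 signature.
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_netcdf(path: str) -> Optional[str]:
    """Guess the netCDF variant of a file from its first bytes, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(_HDF5_SIGNATURE):
        # Any HDF5 file matches here, not only netCDF-4 ones.
        return "HDF5 container (possibly netCDF-4)"
    if len(head) >= 4 and head[:3] == b"CDF":
        return _CLASSIC_VERSIONS.get(head[3])
    return None
```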

? Having (apparently) gotten the data into KNB, I'm now wondering how
  to get it out programmatically. If I search Metacat for my data
  package, I can browse to the file. Should I just fetch that (ginormous)
  URI via, e.g., `wget`? Or is there an API for this use case?

** I see a lot more information on the KNB site about how to get data
   in, and how the infrastructure works, than about accessing data. The
   search interface seems pretty good, though.
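
For concreteness, here is a minimal sketch of the `wget`-style approach
using only the Python standard library. It assumes the URI copied from the
browser can be fetched directly, without a login or session; whether KNB
permits that, or offers a proper API instead, is exactly the open question.
The names `fetch` and `print_progress` and the URI in the comment are
hypothetical. The callback is there so a long transfer reports progress
rather than just sitting there.

```python
import urllib.request
from typing import Callable, Optional


def fetch(url: str, dest: str,
          on_progress: Optional[Callable[[int, Optional[int]], None]] = None,
          chunk_size: int = 64 * 1024) -> int:
    """Stream `url` into the file `dest`; return the number of bytes written."""
    written = 0
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Servers may omit Content-Length, in which case the total is unknown.
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of the response body
            out.write(chunk)
            written += len(chunk)
            if on_progress is not None:
                on_progress(written, total)
    return written


def print_progress(done: int, total: Optional[int]) -> None:
    """Overwrite one terminal line with the running byte count."""
    suffix = f" of {total}" if total is not None else ""
    print(f"\r{done}{suffix} bytes", end="", flush=True)


# Example call (hypothetical URI; substitute the one Metacat shows):
# fetch("http://knb.example.org/path/to/data-file", "data.nc", print_progress)
```

Streaming in chunks keeps memory use flat for large files, and the total
is passed as None whenever the server does not report a size.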

? Where should one take more detailed questions, bug reports, feature
  requests, etc.?

?? Are there different user-level communities for DataONE and KNB? If
   so, how to determine which to access for a given problem/question/
   request?

HTH, Tom Roche <Tom_Roche at pobox.com>



