[open-science] first steps with KNB
Tom Roche
Tom_Roche at pobox.com
Thu Dec 13 05:22:33 UTC 2012
http://lists.okfn.org/pipermail/open-science/2012-December/001943.html
>> Much science informatics involves manipulation of input data, e.g.,
>> to produce visualizations, or just refined data ("analyses," in
>> [meteorological jargon]) for another stage in a pipeline. It's
>> therefore useful for open-science projects to host not only code
>> but [also] its associated inputs and outputs (here called "I+O").
http://lists.okfn.org/pipermail/open-science/2012-December/001948.html
> the KNB
http://knb.ecoinformatics.org/
> allows any data to be uploaded that is relevant to science and
> is legal to redistribute.
So for the benefit of the next open-science newbie, some observations
and questions:
* Initial contact with KNB is fairly intuitive: from the homepage above,
one can see how to create an account, and that one should install and
use Morpho
http://knb.ecoinformatics.org/morphoportal.jsp
to get data into KNB.
* Being a jar, Morpho is pretty easy to install and use (at least on
Linux), though
** There is nothing on the homepage to suggest that its installer is 84
MB, so one may be surprised by the time required to download it.
(More on progress UI below.)
** One should have some Java background: at least, one should know
how to define one's JAVA_HOME.
* On running Morpho, one must input a lotta metadata, both for the "data
package" (or project, i.e., data on oneself and one's collaborators)
and for the actual data (or "data table"). I suspect this is much less
onerous if one's data is text, since Morpho seems to have the ability
to extract metadata from that, if available.
** (caveat: I may overestimate the global importance of my own concerns
:-) One hopes Morpho will gain the ability to extract metadata from
widely-used, explicitly-metadata-supporting (aka "self-describing")
binary formats such as netCDF
http://en.wikipedia.org/wiki/Netcdf#Format_description
** IMHO, a salient weakness of the Morpho UI is its lack of progress
feedback. E.g., on choosing to save my data "to the network" (which
seems to mean, to save to KNB--I confess to not Reading The Fine
Documentation which installs with Morpho before using it), I hit a
button on a dialog ... which just sat there for quite a while, until
completing (hopefully :-)
? Having (apparently) gotten the data into KNB, I'm now wondering, how
to get it out programmatically? If I search Metacat for my data
package, I can browse to the file. Should I just call that (ginormous)
URI via, e.g., `wget`? Or is there an API for this use case?
** I see a lot more information on the KNB site about how to get data
in, and how the infrastructure works, than about accessing data. The
search interface seems pretty good, though.
? Where should one go with more detailed questions, bug reports, feature
requests, etc.?
?? Are there different user-level communities for DataONE and KNB? If
so, how to determine which to access for a given problem/question/
request?
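
For concreteness, here is a minimal sketch of the "just call the URI" option
from the retrieval question above, written in Python rather than `wget`. It
is not a KNB or Metacat API (whether one exists is still the open question);
it only streams whatever URI the Metacat search results show for the file.
The names `fetch` and `report` are hypothetical, and the URI in the usage
comment is a placeholder, not a real KNB address. The optional callback is
there because of the progress-feedback complaint above.

```python
import urllib.request


def fetch(url, dest_path, chunk_size=64 * 1024, on_progress=None):
    """Stream `url` to `dest_path` in chunks; return the bytes written.

    `on_progress`, if given, is called as on_progress(written, total)
    after each chunk. `total` is None when the server sends no
    Content-Length header.
    """
    written = 0
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of stream
            out.write(chunk)
            written += len(chunk)
            if on_progress is not None:
                on_progress(written, total)
    return written


def report(written, total):
    """Print a one-line progress message (hypothetical helper)."""
    if total:
        print(f"{written} of {total} bytes")
    else:
        print(f"{written} bytes")


# Usage (placeholder URI; substitute the one copied from the search results):
#   fetch("http://example.org/path/to/my-data.nc", "my-data.nc",
#         on_progress=report)
```

Reading in chunks keeps memory use flat for large binary files, and the
callback gives the kind of feedback that the Morpho dialog did not.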
HTH, Tom Roche <Tom_Roche at pobox.com>