[open-science] first steps with KNB

Matt Jones jones at nceas.ucsb.edu
Thu Dec 13 21:26:48 UTC 2012


Hi Tom --

To answer a couple of your followup questions...

> that one should install and use Morpho

Actually, you can use Morpho or one of several other options: the Web form
linked from the KNB home page as 'Register your dataset online', or the
REST API, which lets you upload data and metadata from your own scripts.
Our metadata tools are built to encourage good metadata, so tools like
Morpho will prompt you for a fairly complete set of information.  But that
is not required -- using the REST API you could upload bare files without
much metadata, though we try to encourage people to document their stuff
more thoroughly.

You are right that Morpho is targeted at text files, and particularly csv
files, so those are the easiest to enter.  You can attach netCDF and other
binary files, but Morpho can't parse those to help provide metadata.  I
agree, it would be a great addition to Morpho to support netCDF more
fully.  Alas, open source... if you have an itch, scratch it... or at
least file a feature request so we know there is a desire for such a
feature.

Also note that Morpho is targeted at users who have never developed
metadata and might need to upload a few data sets a year -- we step people
through, explaining each of the fields in fair detail.  It can get fairly
tedious if you have a lot of data to upload.  Once they are familiar with
the metadata, most people who process a lot of data transition to using
scripts that call the REST API.

>  ? Having (apparently) gotten the data into KNB, I'm now wondering, how
>  to get it out programmatically? If I search Metacat for my data
>  package, I can browse to the file. Should I just call that (ginormous)
>  URI via, e.g., `wget`? Or is there API for this usecase?

You can get it from the DataONE REST API, which would be of the form:
    https://knb.ecoinformatics.org/knb/d1/mn/v1/object/{pid}
where {pid} is the permanent identifier for the object you uploaded.  The
API is documented at the link I sent earlier, and this particular method is
documented here:

http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html#MNRead.get
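
For anyone scripting this, a minimal Python sketch of the retrieval might
look like the following.  It assumes the pid needs to be percent-encoded
when placed in the URL path and that the object is publicly readable (no
authentication is handled).  The helper names object_url and fetch_object
are hypothetical, not part of any DataONE client library, and the pid in
the usage comment is made up.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as given in the message above.
KNB_MN_BASE = "https://knb.ecoinformatics.org/knb/d1/mn/v1"


def object_url(pid, base=KNB_MN_BASE):
    """Build the object URL for a pid.

    Assumption: reserved characters in the pid (':', '/', spaces, ...)
    are percent-encoded so the pid stays a single path segment.
    """
    return f"{base}/object/{quote(pid, safe='')}"


def fetch_object(pid, dest_path, base=KNB_MN_BASE, chunk_size=64 * 1024):
    """Stream the object's bytes to dest_path; return the byte count."""
    written = 0
    with urlopen(object_url(pid, base)) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Hypothetical usage (the pid below is invented for illustration):
#   fetch_object("example.1.1", "mydata.nc")
```

Streaming in chunks rather than reading the whole response at once keeps
memory use flat for large files such as netCDF output.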

> ? Where should one go for more detailed questions, bug reports, feature
>  requests, etc?

For KNB, you can write to knb-help at nceas.ucsb.edu, or if you have
Morpho-specific questions, you can write to morpho-dev at ecoinformatics.org.
If you have general questions about DataONE, you can write to
support at dataone.org, but remember that DataONE is the federation -- most
support for getting data into and out of repositories probably belongs at
the repository's own support channels.  We also have IRC channels if you
want to talk to people more in real time -- connect to
irc.ecoinformatics.org and join #kdi for Morpho/Metacat questions or
#dataone for DataONE discussions.  I can see we need to add these support
links to our web site -- I'll be sure to get them there.


> ?? Are there different user-level communities for DataONE and KNB? If
>   so, how to determine which to access for a given problem/question/
>   request?

Yes.  DataONE has the DataONE Users Group, which is appropriate for
questions and discussions about the broader federation that DataONE
represents.  KNB is a repository (one of many in the DataONE federation),
and so its user community is somewhat more focused -- mainly ecological,
environmental and ocean sciences, although there are no hard boundaries --
there's an incredibly broad set of data there.  For DataONE, you can use
'developers at dataone.org' to discuss the DataONE APIs and other features
of the federation.

Let me know if you have further questions, or visit us on the KNB & DataONE
lists.

Regards,
Matt



On Wed, Dec 12, 2012 at 8:22 PM, Tom Roche <Tom_Roche at pobox.com> wrote:

>
> http://lists.okfn.org/pipermail/open-science/2012-December/001943.html
> >> Much science informatics involves manipulation of input data, e.g.,
> >> to produce visualizations, or just refined data ("analyses," in
> >> [meteorological jargon]) for another stage in a pipeline. It's
> >> therefore useful for open-science projects to host not only code
> >> but [also] its associated inputs and outputs (here called "I+O").
>
> http://lists.okfn.org/pipermail/open-science/2012-December/001948.html
> > the KNB
>
> http://knb.ecoinformatics.org/
>
> > allows any data to be uploaded that is relevant to science and
> > is legal to redistribute.
>
> So for the benefit of the next open-science newbie, some observations
> and questions:
>
> * Initial contact with KNB is fairly intuitive: from the homepage above,
>   one can see how to create an account, and that one should install and
>   use Morpho
>
> http://knb.ecoinformatics.org/morphoportal.jsp
>
>   to get data into KNB.
>
> * Being a jar, Morpho is pretty easy to install and use (at least on
>   linux), though
>
> ** There is nothing on the homepage to suggest that its installer is 84
>    MB, so one may be surprised by the time required to download it.
>    (More on progress UI below.)
>
> ** One should have some java background: at least, one should know
>    how to define one's JAVA_HOME
>
> * On running Morpho, one must input a lotta metadata, both for the "data
>   package" (or project, i.e., data on oneself and one's collaborators)
>   and for the actual data (or "data table"). I suspect this is much less
>   onerous if one's data is text, since Morpho seems to have the ability
>   to extract metadata from that, if available.
>
> ** (caveat: I may overestimate the global importance of my own concerns
>    :-) One hopes Morpho will gain the ability to extract metadata from
>    widely-used, explicitly-metadata-supporting (aka "self-describing")
>    binary formats such as netCDF
>
> http://en.wikipedia.org/wiki/Netcdf#Format_description
>
> ** IMHO, a salient weakness of the Morpho UI is its lack of progress
>    feedback. E.g., on choosing to save my data "to the network" (which
>    seems to mean, to save to KNB--I confess to not Reading The Fine
>    Documentation which installs with Morpho before using it), I hit a
>    button on a dialog ... which just sat there for quite awhile, until
>    completing (hopefully :-)
>
> ? Having (apparently) gotten the data into KNB, I'm now wondering, how
>   to get it out programmatically? If I search Metacat for my data
>   package, I can browse to the file. Should I just call that (ginormous)
>   URI via, e.g., `wget`? Or is there API for this usecase?
>
> ** I see a lot more information on the KNB site about how to get data
>    in, and how the infrastructure works, than about accessing data. The
>    search interface seems pretty good, though.
>
> ? Where should one go for more detailed questions, bug reports, feature
>   requests, etc?
>
> ?? Are there different user-level communities for DataONE and KNB? If
>    so, how to determine which to access for a given problem/question/
>    request?
>
> HTH, Tom Roche <Tom_Roche at pobox.com>