[ckan4rdm] catalogue of CKAN research data repositories

John Erickson erickj4 at rpi.edu
Fri Aug 8 13:54:49 UTC 2014


Semantic markup following http://schema.org/DataCatalog and
http://schema.org/Dataset can be very useful outside the context of
the big search engines, especially in supporting "human curated
search."
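
As a rough illustration of what such markup could look like, here is a minimal sketch that builds a DataCatalog description containing Dataset entries and wraps it for embedding in a page. It assumes the JSON-LD serialization of schema.org (microdata or RDFa would serve equally well); the function names and the example values are hypothetical and are not part of CKAN.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Describe one dataset with schema.org Dataset properties."""
    return {
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def catalog_jsonld(name, url, datasets):
    """Describe a repository as a schema.org DataCatalog holding datasets."""
    return {
        "@context": "http://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        "dataset": datasets,
    }


def to_script_tag(doc):
    """Serialize the description for embedding in an HTML page."""
    body = json.dumps(doc, indent=2)
    # "<\/" is a valid JSON escape; it keeps a stray "</script>" inside a
    # string value from closing the element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical repository and dataset, for illustration only.
    catalog = catalog_jsonld(
        "Example Research Data Repository",
        "http://data.example.org/",
        [dataset_jsonld("Sea surface temperatures",
                        "Monthly gridded means.",
                        "http://data.example.org/dataset/sst",
                        ["research data", "climate"])],
    )
    print(to_script_tag(catalog))
```

The keywords values are where a repository could signal "research data" and its domain, so that a curator or crawler can filter on them.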

Specific example: members of the Web Science Trust network of labs
<http://webscience.org/> are developing a Web Observatories vocabulary
<http://tw.rpi.edu/web/WOschema>, following the schema.org framework.
The WebObs vocabulary might never become an official extension, but
WSTnet members are already using it as the basis for a crawler that
can extract metadata from known sites, aggregating a
searchable/browsable graph of WebObs sites, projects, datasets, tools,
etc. The WebObs vocab inherits Dataset and DataCatalog (and other
schema.org) classes, so those remain indexable by the schema.org
partners even if the WebObs terms aren't included.
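
The extraction step of such a crawler might be sketched as follows. This is not the WSTnet crawler; it assumes the sites publish their descriptions as JSON-LD script blocks (sites using microdata or RDFa would need a different parser), and the names JsonLdExtractor, extract_records and WANTED_TYPES are hypothetical. Extension types could be added to WANTED_TYPES alongside the inherited schema.org ones.

```python
import json
from html.parser import HTMLParser

# schema.org types of interest; extension types could be added here.
WANTED_TYPES = {"Dataset", "DataCatalog"}


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of application/ld+json script blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except ValueError:
                pass  # skip malformed blocks rather than abort the crawl


def extract_records(html):
    """Return every JSON-LD node whose @type is one of WANTED_TYPES."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    records = []
    stack = list(parser.documents)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            types = node.get("@type")
            if isinstance(types, str):
                types = [types]
            elif not isinstance(types, list):
                types = []
            if WANTED_TYPES.intersection(t for t in types if isinstance(t, str)):
                records.append(node)
            # Descend into nested values, e.g. datasets inside a catalog.
            stack.extend(node.values())
    return records
```

Fetching the pages and merging the records into a searchable graph are left out; the point is only that a page's Dataset and DataCatalog descriptions can be pulled out with very little machinery.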

If someone were to use the vocabulary in a malicious way, a curator
would remove the target URL for the offending candidate site from the
crawl list. Meanwhile, the markup on the good sites is useful to
curators...
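
The curation step could be as simple as the sketch below: a crawl list keyed by host name, from which a curator removes an offending site so that it is neither crawled nor silently re-added. The CrawlList class and the example URLs are hypothetical, and keying by host is an assumption; a real list might track individual URLs instead.

```python
from urllib.parse import urlsplit


class CrawlList:
    """Curated list of candidate sites, keyed by host name."""

    def __init__(self, seeds=()):
        self._sites = {}    # host -> first URL submitted for that host
        self._removed = {}  # host -> curator's reason for removal
        for url in seeds:
            self.add(url)

    @staticmethod
    def _key(url):
        host = urlsplit(url).netloc.lower()
        if not host:
            raise ValueError("expected an absolute URL, got %r" % (url,))
        return host

    def add(self, url):
        """Add a candidate site; refuse hosts a curator has removed."""
        key = self._key(url)
        if key in self._removed:
            return False
        self._sites.setdefault(key, url)
        return True

    def remove(self, url, reason):
        """Drop a site from the crawl and remember why."""
        key = self._key(url)
        self._sites.pop(key, None)
        self._removed[key] = reason

    def targets(self):
        """URLs the crawler should visit on its next run."""
        return sorted(self._sites.values())


if __name__ == "__main__":
    crawl = CrawlList(["http://good.example.org/", "http://spam.example.com/"])
    crawl.remove("http://spam.example.com/", "misleading Dataset markup")
    print(crawl.targets())
```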

John

On Fri, Aug 8, 2014 at 8:46 AM, Ross Jones <ross at servercode.co.uk> wrote:
> It's worth pointing out that implementing a schema.org schema is only of use in a Google Custom Search, not currently in the main search index (afaict).
>
> I'd rather there was a human curated list than relying on something so easily gamed.
>
> Ross
>
>
> On 8 Aug 2014, at 13:41, John Erickson <erickj4 at rpi.edu> wrote:
>
>> Perhaps if CKAN repositories were to...
>>
>> * Use http://schema.org/DataCatalog to expose catalog metadata...
>> * Use http://schema.org/Dataset to expose dataset metadata...
>> * Adopt appropriate values in the metadata elements, to accurately
>> identify the data as research data, from particular domains...
>>
>> ...then a stand-alone "catalogue" or "good list" of research data
>> repositories would not be required; rather, "Big Search" (The Google
>> et al.) could be used to locate research data with good fidelity.
>> Alternatively (or in addition), crawlers could be used to extract and
>> aggregate such metadata to automagically create such a "good list..."
>>
>> John
>>
>>
>>
>> On Fri, Aug 8, 2014 at 7:17 AM, Heinrich Widmann <widmann at dkrz.de> wrote:
>>> Hi Vasily,
>>> Good question; I only know the official list of CKAN portals at
>>> http://ckan.org/instances/#
>>> but of course a list 'limited to research data repos' would be helpful.
>>> Actually I started to list some interesting CKAN portals at the bottom of
>>> the communities table in the EUDAT wiki.
>>>
>>> Best,
>>> Heinrich
>>>
>>> On 08/08/2014 12:05, vasily.bunakov at stfc.ac.uk wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there a catalogue or a good list of research data repositories
>>>> based on CKAN? There was a nice compilation made by St Andrews some time ago
>>>> http://research-computing.wp.st-andrews.ac.uk/2013/11/27/using-ckan-for-research-data-management/
>>>> but it had its own scope (technology evaluation). The list of CKAN
>>>> research implementers in the UK might have expanded since then, and
>>>> there may also be success stories beyond the UK.
>>>>
>>>> With kind regards,
>>>> Vasily Bunakov
>>>> STFC Scientific Computing
>>>
>>>
>>> --
>>> -----------------------------\\---------------------------------------
>>> Heinrich Widmann              \\ Deutsches Klimarechenzentrum GmbH
>>> Phone: +49 40 460094 282       \\   Abteilung Datenmanagement
>>> FAX:   +49 40 460094 270        \\    Bundesstr. 45a
>>> Email: widmann at dkrz.de           \\   D-20146 Hamburg
>>> http://www.dkrz.de                \\  Germany
>>> -----------------------------------\\---------------------------------
>>>
>>>
>>> _______________________________________________
>>> ckan4rdm mailing list
>>> ckan4rdm at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/ckan4rdm
>>>
>>
>>
>>
>> --
>> John S. Erickson, Ph.D.
>> Deputy Director, Web Science Research Center
>> Tetherless World Constellation (RPI)
>> <http://tw.rpi.edu> <erickj4 at rpi.edu>
>> Twitter & Skype: olyerickson



-- 
John S. Erickson, Ph.D.
Deputy Director, Web Science Research Center
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <erickj4 at rpi.edu>
Twitter & Skype: olyerickson


