[datahub-discuss] Introducing some metadata and myself

Wed Jun 3 15:05:59 UTC 2015

Thanks for the hackday list link.
This might not be the right list for me, but I'll see.

Ideally, the sites would turn the harvester on themselves and provide their metadata not only embedded as JSON in their pages but also as a full list of each and every dataset, preferably in a database that can also be queried remotely.
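
For illustration only, here is a minimal Python sketch of the "embedded JSON" side. It assumes, hypothetically, that a page carries its dataset metadata in `<script type="application/ld+json">` tags (the thread doesn't say how HSCIC actually embeds it), and uses only the standard-library `html.parser`. `EmbeddedJSONParser` and `extract_embedded_json` are made-up names.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONParser(HTMLParser):
    """Collects and parses the contents of <script type="application/ld+json"> tags.

    Assumption: metadata is embedded as JSON-LD script blocks; other sites may differ.
    """

    def __init__(self):
        super().__init__()
        self._in_json = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Only start buffering for script tags explicitly typed as JSON-LD.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self._in_json = False
            text = "".join(self._buf).strip()
            if text:
                self.blocks.append(json.loads(text))


def extract_embedded_json(html):
    """Return a list of the JSON objects embedded in the given HTML page."""
    parser = EmbeddedJSONParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    page = ('<html><head><script type="application/ld+json">'
            '{"@type": "Dataset", "name": "Example"}</script></head></html>')
    print(extract_embedded_json(page))  # [{'@type': 'Dataset', 'name': 'Example'}]
```

A published full list or queryable database would make this kind of page-by-page extraction unnecessary.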

Yes, TRUD is not ideal: you have to provide a 'business case' for most subscriptions and check all the terms boxes for each subscription (100+).
There's an FTP site too, which again needs a business case (they recently refused mine, so I have to try again). I'd argue the FTP site should be open.

I have seen your scrapers, from which I learned about the embedded JSON at HSCIC ;^)
There's not really a need for you to host the data yourselves if you can provide links that persist (for a while).
I pull it all down because I load some of it into my DBs and reorganise it to enable cross-referencing, and (from experience) I want a safe copy when I use it for analysis, so I don't get caught out by unannounced modifications.
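
One way to keep such a safe copy honest, sketched under assumptions: hash each downloaded file with SHA-256 and record the hash in a small JSON manifest, so a later download that differs from the recorded one gets flagged as modified. The manifest layout and the `sha256_of` / `record_snapshot` names are hypothetical, not part of any NHS tooling.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def record_snapshot(path, manifest_path):
    """Compare a downloaded file with its last recorded hash (hypothetical manifest format).

    Returns True if the file is new or has changed since the previous snapshot,
    in which case the manifest is updated with the new hash and a UTC timestamp.
    """
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    digest = sha256_of(path)
    key = str(path)
    previous = manifest.get(key)
    changed = previous is None or previous["sha256"] != digest
    if changed:
        manifest[key] = {
            "sha256": digest,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }
        manifest_file.write_text(json.dumps(manifest, indent=2))
    return changed
```

Running `record_snapshot` after every download gives a cheap audit trail: a `True` on a file you already had means the publisher changed it without telling anyone.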

Strangely, health data tends to start morphing just from people looking at it, even becoming slightly schizoid when accommodating different groups; for example, a few trusts do a HES refresh before the HES extraction date and a 'defresh' after it.

Cheers

-----Original Message-----
From: Ross Jones [mailto:ross at servercode.co.uk] 
Sent: 03 June 2015 15:15
To: P. Harry E. Coenen
Cc: datahub-discuss at lists.okfn.org
Subject: Re: [datahub-discuss] Introducing some metadata and myself

Hi Harry,

Whilst the health data stuff might be useful on datahub.io (someone should turn on the harvester ;) ) it might be best to discuss this type of thing at the NHSHackDay list (https://groups.google.com/forum/?fromgroups#!forum/nhshackday) , which is more focussed on health - this list is really for conversations specifically about datahub.io ;)

> On 3 Jun 2015, at 14:58, P. Harry E. Coenen <pharryecoenen at aol.com> wrote:
> First, anyone can subscribe to TRUD at 
> http://www.uktcregistration.nss.cfh.nhs.uk/
> and get access to most NHS metadata

Whilst this is true, I've had a long-running moan, I mean discussion, with some people about the process. 
It turns out the entire registration faffing about is *solely* so that they can email you when data is updated so you're not out of date. I understand the reasoning I guess, but there must be a better way.

> Second, I’ve seen some of your data scraping attempts of old and new 
> NHS sites. Did the same myself for:
> http://www.hscic.gov.uk/
> https://data.england.nhs.uk/
> http://data.gov.uk/
> https://indicators.ic.nhs.uk/webview/
> and probably with better results

Most of the indicators and HSCIC datasets should be on https://data.england.nhs.uk/ now - the scrapers for them are at https://github.com/nhsengland/publish-o-matic if you're interested. Getting the data out of data.england.nhs.uk into datahub.io should be doable as well, as I'm not sure whether the nhsengland CKAN instance is actually live or not.
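
For the data.england.nhs.uk route, a rough sketch against the standard CKAN action API (`/api/3/action/package_list` and `package_show`). It assumes that instance exposes the usual CKAN API, which may not hold if it isn't live; `action_url`, `ckan_action` and `resource_urls` are hypothetical helper names, and pushing the results into datahub.io is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def action_url(base_url, action, **params):
    """Build a CKAN action API URL, e.g. .../api/3/action/package_show?id=x."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def ckan_action(base_url, action, **params):
    """Call a CKAN action and return its 'result' field (needs network access)."""
    with urlopen(action_url(base_url, action, **params), timeout=30) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"CKAN action {action} failed: {body.get('error')}")
    return body["result"]


def resource_urls(package):
    """Return the download URLs listed in a package_show result."""
    return [r["url"] for r in package.get("resources", []) if r.get("url")]


# Example (not run here; assumes the instance is reachable):
# base = "https://data.england.nhs.uk"
# for name in ckan_action(base, "package_list")[:5]:
#     print(name, resource_urls(ckan_action(base, "package_show", id=name)))
```

The same calls against datahub.io (also CKAN) would let you check which datasets are already there before copying anything across.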

Ross.