[ddj] 5811 datasets in beta EU data portal

Benjamin Ooghe-Tabanou b.ooghe at gmail.com
Mon Jan 14 09:11:47 UTC 2013


And as promised, here's my scraper (very basic curl/grep/awk/sed shell
script) with the resulting data
https://github.com/RouxRC/various_scrapers/tree/master/euopendata

Benjamin


On Mon, Jan 14, 2013 at 9:41 AM, Michael Bauer <michael.bauer at okfn.org> wrote:
> Yes, the API is really well hidden... (The CKAN documentation says /api, so I
> tried both URLs that Benjamin tried, and failed too...)
>
> Michael
>
> On Fri, Jan 11, 2013 at 05:53:49PM +0100, Benjamin Ooghe-Tabanou wrote:
>> Hey Rufus,
>>
>> They should link to it. I looked for it and tried
>> open-data.europa.eu/open-data/api and open-data.europa.eu/api, but didn't
>> think of open-data.europa.eu/open-data/data/api ...
>> So I ended up crawling the website...
>> By the way, now that I know about the URL:
>> http://open-data.europa.eu/open-data/data/api/rest/licenses says that
>> Eurostat's license is OKD compliant.
>>
>> Benjamin
>>
>> PS: Sorry Jonathan for answering on ddj's ML but Rufus started ;)
>>
>>
>> On Fri, Jan 11, 2013 at 4:15 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> > You know this is a CKAN-based data catalog, so there's an API from which
>> > you can get all of this directly without needing to scrape ;-)
>> >
>> > http://open-data.europa.eu/open-data/data/api
>> >
>> > http://open-data.europa.eu/open-data/data/api/search/dataset?limit=2&all_fields=1
>> >
>> > Rufus
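
For anyone who wants to script against the API Rufus points to, here is a minimal Python sketch. It uses only the base URL and the limit/all_fields parameters quoted above. The offset parameter and the {"count": ..., "results": [...]} response shape are assumptions based on how the legacy CKAN search API usually behaves, and the function names are made up for illustration. The URL dates from 2013 and may no longer resolve.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as quoted in the thread (2013); it may have moved since.
API_BASE = "http://open-data.europa.eu/open-data/data/api"


def search_url(limit=2, offset=0, all_fields=True):
    """Build a dataset search URL like the one quoted in the thread.

    `offset` is an assumed paging parameter and is only added when non-zero.
    """
    params = {"limit": limit}
    if offset:
        params["offset"] = offset
    if all_fields:
        params["all_fields"] = 1
    return API_BASE + "/search/dataset?" + urlencode(params)


def parse_search(payload):
    """Parse a search response.

    Assumed shape: {"count": <total matches>, "results": [<dataset>, ...]}.
    """
    data = json.loads(payload)
    return data["count"], data["results"]


def fetch_page(limit=2, offset=0):
    """Fetch one page of results over HTTP (needs network access)."""
    with urlopen(search_url(limit, offset)) as resp:
        return parse_search(resp.read().decode("utf-8"))


# Building the URL needs no network access.
print(search_url(limit=2))
```

Calling fetch_page(limit=100, offset=0), then offset=100 and so on, would walk the whole catalog if the server honours those parameters.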
>> >
>> > On 11 January 2013 10:25, Michael Bauer <michael.bauer at okfn.org> wrote:
>> >> The metadata:
>> >>
>> >> https://scraperwiki.com/scrapers/metadata_european_open_data_catalog/
>> >>
>> >> (hope this runs through...)
>> >>
>> >> You want to use:
>> >>
>> >> https://scraperwiki.com/docs/api?name=metadata_european_open_data_catalog#sqlite
>> >>
>> >> with the query:
>> >>
>> >> select license,count(dataset_url) from `swdata` group by license
>> >>
>> >> Michael
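
The query above can also be tried locally with Python's sqlite3 module. This is a minimal sketch, assuming the table has at least the license and dataset_url columns named in the query. The rows are invented placeholders, not real portal data, and license_counts is a made-up helper name.

```python
import sqlite3

# The query from the message above, unchanged apart from spacing.
QUERY = "select license, count(dataset_url) from `swdata` group by license"


def license_counts(conn):
    """Return {license: number of datasets} for the swdata table."""
    return dict(conn.execute(QUERY).fetchall())


# In-memory database with placeholder rows (hypothetical, for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table swdata (dataset_url text, license text)")
conn.executemany(
    "insert into swdata values (?, ?)",
    [
        ("http://example.org/dataset/1", "license-a"),
        ("http://example.org/dataset/2", "license-b"),
        ("http://example.org/dataset/3", "license-b"),
    ],
)

for license_name, n in sorted(license_counts(conn).items()):
    print(license_name, n)
```

Dividing each count by the total number of rows gives the share of datasets per licence, which is the statistic asked for earlier in the thread.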
>> >>
>> >> On Fri, Jan 11, 2013 at 08:56:47AM +0100, Michael Bauer wrote:
>> >>> Can we get the metadata out and compute some statistics on this? This would
>> >>> mean that the majority of the data in the open data portal is not open (as
>> >>> defined at opendefinition.org).
>> >>>
>> >>> Michael
>> >>>
>> >>> On Thu, Jan 10, 2013 at 09:29:28PM +0100, Benjamin Ooghe-Tabanou wrote:
>> >>> > Like on most data catalogs, each dataset is released under a specific licence.
>> >>> > In the EU Commission's case, three different ones are used:
>> >>> > cc-by for about 100 datasets; the Europa Legal Notice for a few dozen,
>> >>> > which seems complex but opendata-compatible; and all the others,
>> >>> > meaning more than 95%, are released under the Eurostat
>> >>> > Copyright/Licence Policy
>> >>> > <http://epp.eurostat.ec.europa.eu/portal/page/portal/about_eurostat/policies/copyright_licence_policy>
>> >>> > This last one explicitly says that commercial reuse is possible except
>> >>> > in a broad variety of situations, which users obviously have to
>> >>> > identify for themselves: this is the exact opposite of an opendata
>> >>> > licence, which, as defined in the opendefinition, would remove any
>> >>> > legal uncertainty that could put the reuser at risk.
>> >>> >
>> >>> > Benjamin
>> >>> >
>> >>> > On Thu, Jan 10, 2013 at 6:04 PM, Ahmed ElAmin <elamin.ahmed at gmail.com> wrote:
>> >>> > > I don't know if you are correct Benjamin. Where did you get the info
>> >>> > > that it is for non-commercial use? Even if it says that, media can
>> >>> > > analyse and publish the data (eg see UK gov transparency data
>> >>> > > publishing, which is what this exercise by the Commission is based
>> >>> > > on). Non-commercial use normally means companies cannot package and
>> >>> > > resell the data. That's my understanding. I can always contact the
>> >>> > > Commission to clarify if people are still unsure. Here is what the
>> >>> > > site says (though it is badly put):
>> >>> > >
>> >>> > > 'This portal is about transparency, open government and innovation.
>> >>> > > The European Commission Data Portal provides access to open public
>> >>> > > data from the European Commission. It also provides access to data of
>> >>> > > other Union institutions, bodies, offices and agencies at their
>> >>> > > request. The published data can be downloaded by everyone interested
>> >>> > > to facilitate reuse, linking and the creation of innovative services.
>> >>> > > Moreover, this Data Portal promotes and builds literacy around
>> >>> > > Europe’s data. The data publishers, application developers and the
>> >>> > > general public can also use new functionalities enabled by the
>> >>> > > semantic technologies.'
>> >>> > >
>> >>> > > On 10 January 2013 15:23, Benjamin Ooghe-Tabanou <b.ooghe at gmail.com> wrote:
>> >>> > >> That is the issue... Does a media outlet reusing data count as
>> >>> > >> commercial reuse? No one knows, and everyone is left in a situation
>> >>> > >> of high legal uncertainty...
>> >>> > >>
>> >>> > >> Benjamin
>> >>> > >>
>> >>> > >>
>> >>> > >> On Thu, Jan 10, 2013 at 3:10 PM, Robin Linderborg
>> >>> > >> <robin.linderborg at gmail.com> wrote:
>> >>> > >>> Just out of curiosity, where does one draw the line with non-commercial data?
>> >>> > >>> Surely it must be allowed to reference the results, although not in great
>> >>> > >>> detail.
>> >>> > >>>
>> >>> > >>> /Robin Linderborg
>> >>> > >>>
>> >>> > >>>
>> >>> > >>> 2013/1/10 Benjamin Ooghe-Tabanou <b.ooghe at gmail.com>
>> >>> > >>>>
>> >>> > >>>> Warning: most of this data cannot be used by media outlets, as it is
>> >>> > >>>> released under a non-commercial licence, which is quite surprising
>> >>> > >>>> coming from the European Commission, which keeps repeating how
>> >>> > >>>> much OpenData matters for economic purposes...
>> >>> > >>>>
>> >>> > >>>> Benjamin
>> >>> > >>>>
>> >>> > >>>>
>> >>> > >>>> On Thu, Jan 10, 2013 at 11:53 AM, Ahmed ElAmin <elamin.ahmed at gmail.com>
>> >>> > >>>> wrote:
>> >>> > >>>> > Some very interesting and sometimes bizarre microdata is being put
>> >>> > >>>> > online as part of a Commission transparency exercise. Here is a sample
>> >>> > >>>> > of what can be found, begging for infographic treatment.
>> >>> > >>>> > http://open-data.europa.eu/open-data/data/
>> >>> > >>>> >
>> >>> > >>>> >
>> >>> > >>>> >     Psychiatric care beds in hospitals was updated on 09/01/13.
>> >>> > >>>> >     Total length of railway lines was updated on 09/01/13.
>> >>> > >>>> >     Public electronic procurement systems was updated on 09/01/13.
>> >>> > >>>> >     Enterprises using Internet for interaction with public authorities
>> >>> > >>>> > (NACE Rev. 1.1) was updated on 09/01/13.
>> >>> > >>>> >     Public services - Individuals was updated on 09/01/13.
>> >>> > >>>> >     Enterprises using the Internet for submitting a proposal in a
>> >>> > >>>> > public electronic tender system to public authorities was updated on
>> >>> > >>>> > 09/01/13.
>> >>> > >>>> >     Population connected to independent wastewater collecting systems:
>> >>> > >>>> > with treatment was updated on 09/01/13.
>> >>> > >>>> >     Gross value added - NACE Rev.1: L-P - current prices was updated
>> >>> > >>>> > on 09/01/13.
>> >>> > >>>> >     Water abstracted for public water supply was updated on 09/01/13.
>> >>> > >>>> >     Compensation of employees by NACE Rev.1 was updated on 09/01/13.
>> >>> > >>>> >
>> >>> > >>>> > Cheers
>> >>> > >>>> > Ahmed
>> >>> > >>>> >
>> >>> > >>>> > _______________________________________________
>> >>> > >>>> > data-driven-journalism mailing list
>> >>> > >>>> > data-driven-journalism at lists.okfn.org
>> >>> > >>>> > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>> >>> > >>>> > Unsubscribe:
>> >>> > >>>> > http://lists.okfn.org/mailman/options/data-driven-journalism
>> >>> > >>>>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>
>> >>> >
>> >>>
>> >>> --
>> >>> Data Wrangler with the Open Knowledge Foundation (OKFN.org)
>> >>> GPG/PGP key: http://tentacleriot.eu/mihi.asc
>> >>> Twitter: @mihi_tr Skype: mihi_tr
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Co-Founder, Open Knowledge Foundation
>> > Promoting Open Knowledge in a Digital Age
>> > http://www.okfn.org/ - http://blog.okfn.org/
>> >
>>
>



