[School-of-data] Help Defining Dataset Definition and Quality Parameters

Michael Bauer michael.bauer at okfn.org
Tue Apr 30 07:06:08 UTC 2013


Everyone,

There are several things that I think are crucial to the quality of open
data:

1) What kind of datasets are released? Is it hot and spicy stuff like
company registers, election data, budget data, transport data, etc., or is it
something like the dog register, the locations of public toilets, etc.? This
can easily be checked manually. If you do this, please submit your results to
the OpenData Census: http://census.okfn.org/country/

2) Is the data actually downloadable? Someone in Austria checks this
automatically for Austrian portals: some datasets are listed, but the link is
outdated and leads nowhere useful.

3) Is it in a machine-readable format, or just .pdf or .doc files? (Yes, some
open data portals count these as data.)

4) How fine-grained is the data? E.g., the City of Vienna publishes its
budget as open data, but only the highly aggregated version: each dataset
has about 5 data points. So while the budget is there, it's pretty useless
for analysis.

5) Licenses: is the license actually open according to the Open
Definition (http://opendefinition.org/)? German data portals, for example,
like to use non-open licenses, as does the EU.
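Criteria 2 to 5 can be partly automated. Below is a minimal Python sketch under stated assumptions: all helper names are hypothetical, the extension whitelist and license set are small illustrative subsets (not the official Open Definition list of conformant licenses), and the 20-row cutoff for "fine-grained" is an arbitrary placeholder. Criterion 1 is left out because it is a judgement call best made by hand.

```python
import csv
import http.client
import io
import os
import urllib.request

# Illustrative subsets only (assumption): extend them for a real survey.
MACHINE_READABLE_EXTENSIONS = {".csv", ".json", ".xml", ".xls", ".xlsx", ".ods"}
OPEN_LICENSE_IDS = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0",
                    "ODC-PDDL-1.0", "ODC-BY-1.0", "ODbL-1.0"}


def is_downloadable(url: str, timeout: float = 10.0) -> bool:
    """Criterion 2: does a HEAD request to the resource URL succeed?"""
    try:
        request = urllib.request.Request(url, method="HEAD")
        # urlopen follows redirects and raises HTTPError for 4xx/5xx answers.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (ValueError, OSError, http.client.HTTPException):
        # ValueError: malformed URL; OSError covers URLError, HTTPError, timeouts.
        return False


def is_machine_readable(filename: str) -> bool:
    """Criterion 3: judged by file extension only, so .pdf and .doc fail."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in MACHINE_READABLE_EXTENSIONS


def count_data_rows(csv_text: str) -> int:
    """Number of non-empty rows after the header row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - 1, 0)


def is_fine_grained(csv_text: str, min_rows: int = 20) -> bool:
    """Criterion 4: a crude proxy; min_rows=20 is an arbitrary placeholder."""
    return count_data_rows(csv_text) >= min_rows


def is_open_license(license_id: str) -> bool:
    """Criterion 5: membership in the illustrative OPEN_LICENSE_IDS set."""
    return license_id in OPEN_LICENSE_IDS


if __name__ == "__main__":
    # A Vienna-style aggregated budget with only 5 data points.
    budget = "item,amount\n" + "".join(f"item{i},{i * 100}\n" for i in range(5))
    print(count_data_rows(budget), is_fine_grained(budget))  # 5 False
```

A HEAD request is cheap, but some servers reject it, so a failed check is a prompt to look manually rather than proof that a link is dead. Likewise, an extension says nothing about whether a .csv file actually parses.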

I'd work through these five criteria, explain why each of them is
important, and base my analysis on them.
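One way to base an analysis on the five criteria is to record a yes/no per criterion for each dataset (by hand or with small scripts) and then summarize across the whole portal. A minimal sketch, assuming that record format; the criterion keys and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical keys, one per point in the list of five criteria.
CRITERIA = ("relevant", "downloadable", "machine_readable", "fine_grained", "open_license")


def failed_criteria(result: dict[str, bool]) -> list[str]:
    """Criteria a dataset does not meet; a missing key counts as a failure."""
    return [c for c in CRITERIA if not result.get(c, False)]


def pass_rates(results: list[dict[str, bool]]) -> dict[str, float]:
    """Share of datasets meeting each criterion (0.0 for an empty catalogue)."""
    if not results:
        return {c: 0.0 for c in CRITERIA}
    return {c: sum(r.get(c, False) for r in results) / len(results) for c in CRITERIA}


if __name__ == "__main__":
    catalogue = [
        {"relevant": True, "downloadable": True, "machine_readable": False,
         "fine_grained": False, "open_license": True},
        {"relevant": False, "downloadable": True, "machine_readable": True,
         "fine_grained": True, "open_license": False},
    ]
    print(pass_rates(catalogue))
    print(failed_criteria(catalogue[0]))  # ['machine_readable', 'fine_grained']
```

The pass rates give the portal-level picture, while the per-dataset failures supply concrete examples when explaining why each criterion matters.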

Michael

On Mon, Apr 29, 2013 at 01:15:17PM +0100, Tarek Amr wrote:
> As far as I understand, there are two possible approaches.
> 
> (A) You can do it manually, sort of. Let's say you set some rules to
> measure collaboration: number of edits for each file, number of people who
> edit it, maybe the quantity of discussions, process log files, you name it.
> 
> (B) Teach a computer to do this for you. In that case, you need some files
> or records that you know represent collaboration, and some that don't.
> Then you train a classifier on the attributes of those records or resources
> and use it to tell whether the data as a whole represent
> collaboration or not.
> 
> Regarding the quality of the data, I guess you should check the following:
> 
> - Is it easy to transform the data into an open format a computer can read
> and process?
> - Is it easy to extract features from the data (for example: number of
> edits on each file, their dates, who edited them, etc.)?
> - Is any data missing?
> 
> Anyway, those are my (not even) $0.002, so I'll wait for more experienced
> people to add their input here.
> 
> 
> On Mon, Apr 29, 2013 at 12:49 PM, Matan Rotman <matan.rotman at gmail.com> wrote:
> 
> > Hello all,
> >
> > My name is Matan Rotman, and I'm a student at Hebrew University majoring
> > in Political Science. As part of my studies, I'm writing a paper that tries
> > to understand whether the Israeli open data program is efficient
> > (collaboration-wise), and if not, why not (i.e., why administrative
> > departments won't cooperate with the program). The first thing I
> > need to do for that, though, is to understand whether the datasets on
> > the website are of good quality. As I'm not a technical guy, I could use
> > some help understanding what would be considered the definition of a
> > dataset (hopefully as specific as possible), and more importantly, I could
> > really use some help defining quality parameters so I can
> > measure the quality of the different files and sets uploaded.
> >
> > The website is at http://data.gov.il (all in Hebrew, though), and I'd love
> > any help possible on the subject.
> >
> > P.S.
> > I hope this is the right place to ask. I'm also going to ask on the Open
> > Government mailing list, so I apologize if you get my message twice.
> >
> > Best Regards,
> > Matan
> >
> > _______________________________________________
> > School-of-data mailing list
> > School-of-data at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/school-of-data
> > Unsubscribe: http://lists.okfn.org/mailman/options/school-of-data
> >
> >
> 
> 
> -- 
> Best Regards
> Tarek Amr
> 
> http://about.me/tarekamr

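Tarek's two approaches can be sketched in a few lines of Python. This assumes, hypothetically, that a portal's edit history can be exported as (file, editor) pairs, which data.gov.il may not offer. Approach (A) turns that log into per-file counts of edits and distinct editors. For approach (B), a nearest-centroid classifier stands in for whatever classifier one would actually train; the labelled examples would have to come from files judged by hand. All function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def collaboration_features(edit_log):
    """(A) Map each file to (number of edits, number of distinct editors).

    edit_log is an iterable of (file_name, editor) pairs (assumed format).
    """
    edits = defaultdict(int)
    editors = defaultdict(set)
    for file_name, editor in edit_log:
        edits[file_name] += 1
        editors[file_name].add(editor)
    return {name: (edits[name], len(editors[name])) for name in edits}


def train_centroids(labelled):
    """(B) Average the feature vectors of each label.

    labelled is a list of (features, label) pairs judged by hand.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in labelled:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] += 1
    return {label: tuple(t / counts[label] for t in totals)
            for label, totals in sums.items()}


def classify(features, centroids):
    """(B) Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


if __name__ == "__main__":
    log = [("budget.csv", "alice"), ("budget.csv", "bob"),
           ("budget.csv", "alice"), ("roads.csv", "carol")]
    features = collaboration_features(log)
    print(features)  # {'budget.csv': (3, 2), 'roads.csv': (1, 1)}
    centroids = train_centroids([((10, 5), "collaborative"), ((8, 4), "collaborative"),
                                 ((1, 1), "solo"), ((2, 1), "solo")])
    print({name: classify(f, centroids) for name, f in features.items()})
```

Edit counts and editor counts live on different scales, so in practice the features should be normalized before measuring distances; a library such as scikit-learn offers scaling and many stronger classifiers.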

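Tarek's last quality check, missing data, is the easiest to automate for tabular files. A minimal sketch, assuming the dataset is a CSV file with a header row (the function name is hypothetical): it counts, per column, the cells that are empty or absent because a row is too short.

```python
import csv
import io


def missing_values_per_column(csv_text):
    """Count empty or absent cells in each header column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name in missing:
            # DictReader fills columns missing from short rows with None.
            value = row.get(name)
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


if __name__ == "__main__":
    sample = "city,budget,year\nVienna,100,2013\nGraz,,2013\nLinz\n"
    print(missing_values_per_column(sample))  # {'city': 0, 'budget': 2, 'year': 1}
```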
-- 
Data Wrangler with the Open Knowledge Foundation (OKFN.org)
GPG/PGP key: http://tentacleriot.eu/mihi.asc
Twitter: @mihi_tr Skype: mihi_tr