[OpenSpending] Ecosystem of tools: Working with Spending Data

Lucy Chambers lucy.chambers at okfn.org
Thu May 2 02:07:24 UTC 2013


Hi All,

A quick question: I'm trying to draw up an 'ideal' ecosystem of tools for
working with spending data, so that we can work out how to teach them more
effectively and, hopefully, get more exciting spending-data projects out
there.

I've attached my current thoughts in a draft and would be grateful for
input from the group:

https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc

Bear in mind that the audience for this is NGOs, who probably cannot
code, so I would prefer to keep this list relatively short and to the
point.

(I have also pasted the text of the document below for your
convenience, although the structure may not make it through the mailing
list!)

Lucy







*Spending Data: The Tool Ecosystem*


There is a set of staple tools that can be used to tackle the issues
highlighted by the organisations in this report. For each one, we've
outlined the tool, what it's useful for and what the barrier to entry is.


Key:


*Basic* = An off-the-shelf tool that can be learned, and used
independently for the first time, within one day. No installation on
servers etc. required.

*Intermediate* = Takes between one day and one week to master basic
functionality. May require tweaking existing code, but not writing new
code.

*Advanced* = Requires writing code.


*Extracting and Getting Data*


Issue: Data not available
Tools: Freedom of Information portals
Level: Basic, though some education may be required to inform people that
they have the right to ask, how to phrase an FOI request, whether requests
can be submitted electronically, etc.

*Case in point: The group of people assembled in Romania said that they
never submitted requests electronically because they could not prove the
date the request was submitted (better to have a stamp on paper).*


Issue: Data available online but not downloadable (e.g. in HTML tables on
webpages)
Tools: For simple sites (information on an individual webpage), Google
Spreadsheets and its ImportHTML function, or the Google scraper extension
(basic). For more complex webpages (information spread across numerous
pages), a scraper will be required. Scrapers are programs that extract
structured information from websites. There is a useful online tool that
makes writing them easier: Scraperwiki <http://scraperwiki.org/>
(advanced).
Level: At the basic level, anyone who can use a spreadsheet and its
functions can use ImportHTML. It is not, however, a well-known command,
and awareness must be spread about how it can be used (people are often
daunted because they presume scraping involves code). (See School of Data
course: ...)
Scraping using code is advanced, and requires knowledge of at least one
programming language.


Issue: Data available only in PDFs
Tools: A variety of tools are available; many require knowledge of code to
operate. The most promising non-code options are ABBYY FineReader (not
free) and Tabula (new software, still a bit buggy, and people need to be
able to host it themselves to use it). For more info, see the School of
Data course:
http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
Note: these tools are still imperfect, and it is still vastly preferable
to advocate for data in the correct formats rather than teach people how
to extract it.
Level: Most require knowledge of coding; some progress is being made on
non-technical ones.
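For readers curious what "scraping using code" involves when data is published as HTML tables, here is a minimal sketch in Python using only the standard library's html.parser. It assumes the data sits in a simple, well-formed <table> (every <td> and <tr> explicitly closed, no nested tables); the class TableRowParser, the helper extract_table_rows and the sample HTML are hypothetical illustrations, not taken from Scraperwiki or any real site. A real scraper would first download each page (e.g. with urllib.request) and, for data spread across many pages, loop over their URLs.

```python
import csv
import sys
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row.

    Assumes well-formed HTML: cells and rows are explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell; ignore whitespace between tags.
        if self._cell is not None:
            self._cell.append(data)


def extract_table_rows(html):
    """Hypothetical helper: return all table rows in `html` as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Illustrative sample page; real pages would be fetched over the network.
SAMPLE = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1,200.00</td></tr>
  <tr><td>Example Co</td><td>350.50</td></tr>
</table>
"""

if __name__ == "__main__":
    # Write the extracted rows as CSV, ready to open in a spreadsheet.
    csv.writer(sys.stdout).writerows(extract_table_rows(SAMPLE))
```

The output is plain CSV, so the result can go straight into the spreadsheet and cleaning tools listed in the next section.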



*Cleaning, Working with and Analyzing Data*


Note: One obvious omission in this section is statistical software such as
SPSS. The omission is deliberate: most interviewees appeared to have
already been trained to use such software, whereas the other tools in this
list were more likely to be outside the comfort zone of the general
practitioner.


Issue: Messy data, typos, blanks (various)
Tools: Spreadsheets, Google Refine
Level: Spreadsheets: Basic (but not very powerful, particularly on big
datasets). Google Refine: Intermediate.


Issue: Need to reconcile entities against one another (e.g. to establish
which records refer to company X)
Tools: Nomenklatura, OpenCorporates, PublicBodies.org
Level: Advanced


Issue: Need to be able to conceptualize networks and relationships between
entities
Tools: Gephi
Level: Intermediate to advanced

*Note: Gephi may not perform all of the functions that tools such as
K-Monitor's 'the Network' are intended to (it has no link to a database of
articles); however, data can be structured quite simply to visualise
networks of interaction.*


Issue: Need to be able to work with very many lines of data (too many to
fit in Excel)

*Note: Because few countries currently release transaction-level data,
this is not yet a frequent problem, but it already is in places such as
Brazil, the US and the UK. As we push for greater disclosure, this
capability will be needed more and more.*

Tools: OpenSpending.org, other database software
Level: OpenSpending.org: easy for basic upload, search and interrogation;
some advanced queries may require knowledge of coding. Databases:
Intermediate.


Issue: Repetitive tasks or modelling
Tools: Excel macros
Level: Basic to intermediate


Issue: Entity extraction (e.g. from large bodies of documents)
Tools: Open Calais
Level: Intermediate. This is also not a perfect method.
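To give a sense of what the "Intermediate" database option involves for data too large for Excel, here is a minimal sketch using Python's built-in sqlite3 module. The table name transactions, its columns, the helper functions and the sample rows are all hypothetical, invented for illustration. A real dataset would be loaded from the published CSV files (e.g. with the csv module and executemany) into an on-disk database file rather than an in-memory one.

```python
import sqlite3

# Hypothetical schema for transaction-level spending data.
SCHEMA = """
CREATE TABLE transactions (
    paid_on    TEXT,  -- ISO date, e.g. '2013-04-30'
    department TEXT,
    supplier   TEXT,
    amount     REAL
)
"""

# Invented sample rows; a real file might hold millions of these.
ROWS = [
    ("2013-01-15", "Health", "Acme Ltd", 1200.0),
    ("2013-01-20", "Health", "Example Co", 350.5),
    ("2013-02-03", "Education", "Acme Ltd", 800.0),
    ("2013-02-10", "Health", "Acme Ltd", 400.0),
]


def build_db(rows, path=":memory:"):
    """Create the table and bulk-insert the rows; pass a file path to persist."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def total_by_supplier(conn):
    """Return (supplier, total_amount, n_payments) tuples, largest total first."""
    query = """
        SELECT supplier, SUM(amount) AS total, COUNT(*) AS n
        FROM transactions
        GROUP BY supplier
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_db(ROWS)
    for supplier, total, n in total_by_supplier(conn):
        print(f"{supplier}: {total:.2f} across {n} payments")
    conn.close()
```

The same GROUP BY query runs unchanged whether the table holds four rows or several million, which is the main advantage over a spreadsheet here.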



*Presenting Data*


Issue: Basic visualisation, time series, bar charts
Tools: DataWrapper, Tableau Public, Many Eyes, Google Tools
Level: Basic


Issue: Mapping
Tools: TileMill, Google Fusion Tables, Kartograph, QGIS
Level: Basic to advanced


Issue: Creating a citizen's budget
Tools: OpenSpending.org, the off-the-shelf tools listed above, open source
tools from around the community
Level: OpenSpending.org: basic.



*Publishing Data*


Issue: Need a place online to store and manage raw data from Freedom of
Information requests
Tools: DataNest, CKAN, Socrata (various data portal software options)
Level: Basic to use, but can be tricky to set up and get running.



See also: http://openspending.org/resources/handbook/ch014_resources.html



-- 
*Project Coordinator*
School of Data <http://schoolofdata.org/> and
OpenSpending <http://openspending.org/>
Projects of the Open Knowledge Foundation <http://okfn.org/>
Support our work <http://okfn.org/support/>.