[OpenSpending] Ecosystem of tools: Working with Spending Data

Sam Smith s at msmith.net
Thu May 2 12:29:28 UTC 2013


This looks great.

One thing I especially like is that easy rebuttals to "we can't do that here"-type responses are built into the text, which is probably incredibly helpful for people who don't yet realise they need them.



Sam

On 2 May 2013, at 12:16, Lucy Chambers <lucy.chambers at okfn.org> wrote:

> Ah, yes - thanks Kathryn - I keep making that mistake!
> 
> @Pedro, as per your request - I've uploaded the rtf version to GDocs - here
> it is:
> 
> https://docs.google.com/a/okfn.org/document/d/12Wyqif_uqX01NYgY9xZumpq-y_NmU2ipoT3_FkU8AFU/edit
> 
> Lucy
> 
> 
> On 2 May 2013 05:54, Kathryn.Corrick <kathryn.corrick at googlemail.com> wrote:
> 
>> Hi Lucy,
>> This is a great list. No additional tools that I can think of that meet
>> your criteria but I'll ask the ODI team just in case.
>> 
>> However, you may want to be aware that Google Refine is now Open Refine.
>> The explanatory video for Google Refine, which still applies to Open Refine
>> for the moment, is a good one, so it might be worth adding to your list if
>> you are including intro videos or how-tos for the tools listed.
>> 
>> Kathryn
>> 
>> 
>> On 2 May 2013, at 03:07, Lucy Chambers <lucy.chambers at okfn.org> wrote:
>> 
>> Hi All,
>> 
>> A quick question: I'm trying to draw up an 'ideal' ecosystem of tools for
>> working with spending data so that we can work out how to teach them more
>> effectively and, hopefully, get some more exciting spending-data-related
>> projects out there.
>> 
>> I've attached my current thoughts in a draft and would be grateful for
>> input from the group:
>> 
>> 
>> https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc
>> 
>> It's important that the audience for this is NGOs, who probably cannot
>> code, so I would prefer to keep this list relatively short and to the
>> point.
>> 
>> (I have also copy-pasted the text from the document below for your
>> convenience, although the structure may not make it through the mailing
>> list!)
>> 
>> Lucy
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> *Spending Data: The Tool Ecosystem *
>> 
>> 
>> There is a set of staple tools that can be used to tackle the issues
>> highlighted by the organisations in this report. For each one, we’ve
>> outlined the tool, what it’s useful for and what the barrier to entry is.
>> 
>> 
>> Key:
>> 
>> 
>> *Basic* = An off-the-shelf tool that can be learned and used independently
>> within 1 day. No installation on servers etc. required.
>> 
>> *Intermediate* = Takes between 1 day and 1 week to master basic
>> functionality. May require tweaking existing code, but not writing new code.
>> 
>> *Advanced* = Requires writing code.
>> 
>> 
>> *Extracting and Getting Data *
>> 
>> 
>> 
>>  *Issue*
>> 
>> *Tools*
>> 
>> *Level*
>> 
>> Issue: Data not available
>> 
>> Tools: Freedom of Information portals
>> 
>> Level: Basic, though some education may be required to inform people that
>> they have the right to ask, how to phrase an FOI request, whether it is
>> possible to submit these requests electronically, etc.
>> 
>> *Case in point: The group of people assembled in Romania said that they
>> never submitted requests electronically because they could not prove the
>> date the request was submitted (better to have a stamp on paper).*
>> 
>> Issue: Data available online but not downloadable (e.g. in HTML tables on
>> webpages)
>> 
>> Tools: For simple sites (information on an individual webpage), Google
>> Spreadsheets and the ImportHTML function, or the Google scraper extension
>> (basic).
>> 
>> For more complex webpages (information spread across numerous pages), a
>> scraper will be required. Scrapers are ways to extract structured
>> information from websites using code. There is a useful online tool that
>> makes this easier: Scraperwiki <http://scraperwiki.org/> (advanced).
>> 
>> Level: At the basic level, anyone who can use a spreadsheet and functions
>> can use ImportHTML. It is not, however, a well-known command, and awareness
>> must be spread about how it can be used. (People are often daunted because
>> they presume scraping involves code.)
>> 
>> (See School of Data course:
>> 
>> Scraping using code is advanced, and requires knowledge of at least one
>> programming language.
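
To give a feel for what "scraping using code" involves, here is a minimal sketch in Python, using only the standard library. It pulls the rows out of an HTML table. The page content, the supplier names and the function name `extract_rows` are all made up for illustration; a real scraper would also need to download each page and cope with messier markup.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page content, standing in for a downloaded webpage.
SAMPLE_PAGE = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1,200.00</td></tr>
  <tr><td>Widgets plc</td><td>350.50</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_PAGE)
for row in rows:
    print(row)
```

For data spread across numerous pages, the same function would be called once per downloaded page and the rows appended to one list.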
>> 
>> 
>> Issue: Data available only in PDFs
>> 
>> Tools: A variety of tools are available; many require knowledge of code to
>> operate. The most promising non-code options are ABBYY FineReader (not free)
>> and Tabula (new software, still a bit buggy, and users must be able to host
>> it themselves).
>> 
>> For more info, see the School of Data course:
>> http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
>> 
>> Note: these tools are still imperfect, and it is still vastly preferable to
>> advocate for data in the correct formats rather than teach people how to
>> extract it.
>> 
>> Level: Most require knowledge of coding; some progress is being made on
>> non-technical ones.
>> 
>> 
>> 
>> *Cleaning, Working with and Analyzing Data *
>> 
>> 
>> Note: One obvious omission in this section is statistical software such as
>> SPSS. The reason for the omission is that interviewees seemed largely to
>> have been trained to use such software. The other software in this list was
>> more likely to be outside the comfort zone of the general practitioner.
>> 
>> 
>> 
>>  *Issue*
>> 
>> *Tools*
>> 
>> *Level*
>> 
>> Issue: Messy data, typos, blanks (various)
>> 
>> Tools: Spreadsheets, Google Refine
>> 
>> Level: Spreadsheets - Basic (but not too powerful, particularly on big
>> datasets). Google Refine - Intermediate.
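
For readers curious what this kind of cleaning looks like once it is automated, here is a small Python sketch. It only illustrates the idea (stray spaces, currency symbols, thousands separators, blanks); it is not how Google Refine works internally, and the column names `supplier` and `amount` are assumptions made up for the example.

```python
def clean_amount(raw):
    """Turn text like '1,200.00' or ' £350.50 ' into a number.

    Returns None for blanks and for anything that is not a number,
    so that problem cells can be reviewed by hand.
    """
    text = raw.strip().replace(",", "").lstrip("£$€")
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_row(row):
    """Tidy one spreadsheet row (hypothetical columns: supplier, amount)."""
    return {
        # Collapse repeated spaces and trim the ends of the name.
        "supplier": " ".join(row["supplier"].split()),
        "amount": clean_amount(row["amount"]),
    }


messy = [
    {"supplier": "  Acme   Ltd ", "amount": "1,200"},
    {"supplier": "Widgets plc", "amount": " £350.50 "},
    {"supplier": "Unknown", "amount": "n/a"},
]
cleaned = [clean_row(r) for r in messy]
```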
>> 
>> Issue: Need to reconcile entities against one another (to answer questions
>> such as "what is company X?")
>> 
>> Tools: Nomenklatura, OpenCorporates, PublicBodies.org
>> 
>> Level: Advanced
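
As a rough illustration of what reconciliation means in practice, the sketch below matches a messy company name against a short list of known names using fuzzy string matching from Python's standard library. This is a stand-in technique for illustration only, not a description of how Nomenklatura or OpenCorporates work; the company names and the 0.8 similarity cutoff are assumptions.

```python
import difflib


def normalise(name):
    """Lower-case a name, replace punctuation with spaces, collapse spaces."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def reconcile(name, canonical_names, cutoff=0.8):
    """Return the best-matching canonical name, or None if nothing is close."""
    lookup = {normalise(c): c for c in canonical_names}
    matches = difflib.get_close_matches(
        normalise(name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical list of known entities.
CANONICAL = ["Acme Limited", "Widgets plc", "Northern Water Company"]

print(reconcile("ACME  Limited.", CANONICAL))
print(reconcile("Acme Limted", CANONICAL))
print(reconcile("Totally Different Org", CANONICAL))
```

Fuzzy matching of this kind produces false matches as well as misses, so the results still need a human check.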
>> 
>> Issue: Need to be able to conceptualize networks and relationships between
>> entities
>> 
>> Tools: Gephi
>> 
>> Level: Intermediate - Advanced
>> 
>> *Note: This may not perform all of the functions that tools such as ‘the
>> Network’ from K-Monitor are intended to (no link to a database of articles);
>> however, data can be structured quite simply to visualise networks of
>> interaction.*
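
To show how simply the data can be structured, here is a sketch that turns a list of payments into an edge list: one line per payer/payee pair, with a count of payments as the weight. To my knowledge Gephi's spreadsheet import recognises `Source`, `Target` and `Weight` column headers, but treat that as an assumption and check the import dialog; the payer and payee names are made up.

```python
import csv
import io
from collections import Counter


def build_edges(transactions):
    """Count payments per (payer, payee) pair and return edge-list rows."""
    counts = Counter((t["payer"], t["payee"]) for t in transactions)
    return [
        {"Source": payer, "Target": payee, "Weight": n}
        for (payer, payee), n in sorted(counts.items())
    ]


def edges_to_csv(edges):
    """Serialise edge rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Source", "Target", "Weight"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical payments.
SAMPLE = [
    {"payer": "Ministry of Health", "payee": "Acme Ltd"},
    {"payer": "Ministry of Health", "payee": "Acme Ltd"},
    {"payer": "Ministry of Education", "payee": "Widgets plc"},
]

edges = build_edges(SAMPLE)
csv_text = edges_to_csv(edges)
print(csv_text)
```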
>> 
>> Issue: Need to be able to work with a very large number of rows (too many
>> to fit in Excel).
>> 
>> *Note: As few countries currently release transaction-level data, this is
>> not a frequent problem, but it is already problematic in places such as
>> Brazil, the US and the UK. As we push for greater disclosure, this
>> capability will be needed ever more.*
>> 
>> Tools: OpenSpending.org, other database software
>> 
>> Level: OpenSpending.org - easy for basic upload, search and interrogation;
>> some advanced queries may require knowledge of coding.
>> Databases - Intermediate.
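
As an example of the "other database software" route, the sketch below loads transactions into SQLite (which ships with Python) and totals spending per supplier. The table layout, column names and figures are assumptions for illustration; with a real dataset the rows would be read from a file and the database stored on disk rather than in memory.

```python
import sqlite3


def load_transactions(conn, rows):
    """Create the (hypothetical) spending table and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spending "
        "(supplier TEXT, department TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_supplier(conn):
    """Return (supplier, total) pairs, largest total first."""
    cursor = conn.execute(
        "SELECT supplier, SUM(amount) FROM spending "
        "GROUP BY supplier ORDER BY SUM(amount) DESC"
    )
    return cursor.fetchall()


# Made-up transactions standing in for a large file.
SAMPLE_ROWS = [
    ("Acme Ltd", "Health", 1200.0),
    ("Widgets plc", "Education", 350.5),
    ("Acme Ltd", "Education", 800.0),
]

conn = sqlite3.connect(":memory:")
load_transactions(conn, SAMPLE_ROWS)
totals = total_by_supplier(conn)
for supplier, total in totals:
    print(supplier, total)
```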
>> 
>> Issue: Repetitive tasks or modelling
>> 
>> Tools: Excel macros
>> 
>> Level: Basic - Intermediate
>> 
>> Issue: Entity extraction (e.g. from large bodies of documents)
>> 
>> Tools: Open Calais
>> 
>> Level: Intermediate. This is also not a perfect method.
>> 
>> 
>> 
>> *Presenting Data *
>> 
>> 
>> 
>>  *Issue*
>> 
>> *Tools*
>> 
>> *Level*
>> 
>> Issue: Basic visualisation, time series, bar charts
>> 
>> Tools: DataWrapper, Tableau Public, Many Eyes, Google Tools
>> 
>> Level: Basic
>> 
>> Issue: Mapping issues
>> 
>> Tools: TileMill, Google Fusion Tables, Kartograph, QGIS
>> 
>> Level: Basic - Advanced
>> 
>> Issue: Creating a citizen’s budget
>> 
>> Tools: OpenSpending.org, off-the-shelf tools listed above, open source
>> tools from around the community
>> 
>> Level: OpenSpending.org - Basic
>> 
>> 
>> 
>> *Publishing Data *
>> 
>> 
>> 
>> 
>>  *Issue*
>> 
>> *Tools*
>> 
>> *Level*
>> 
>> Issue: Need a place online to store and manage raw data from Freedom of
>> Information requests
>> 
>> Tools: DataNest, CKAN, Socrata (various data portal software options)
>> 
>> Level: Basic to use, but can be tricky to set up and get running.
>> 
>> 
>> 
>> See also: http://openspending.org/resources/handbook/ch014_resources.html
>> 
>> 
>> 
>> --
>> *Project Coordinator*
>> School of Data <http://schoolofdata.org/> and
>> OpenSpending <http://openspending.org/>
>> Projects of the Open Knowledge Foundation <http://okfn.org/>
>> Support our work <http://okfn.org/support/>.
>> 
>> 
>> _______________________________________________
>> openspending mailing list
>> openspending at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/openspending
>> Unsubscribe: http://lists.okfn.org/mailman/options/openspending
>> 
>> 
>> 
>> 
> 
> 
> -- 
> *Project Coordinator*
> School of Data <http://schoolofdata.org/> and
> OpenSpending <http://openspending.org/>
> Projects of the Open Knowledge Foundation <http://okfn.org/>
> Support our work <http://okfn.org/support/>.

-- 
@smithsam









