[OpenSpending] Ecosystem of tools: Working with Spending Data

Kathryn.Corrick kathryn.corrick at googlemail.com
Thu May 2 05:54:11 UTC 2013


Hi Lucy,
This is a great list. No additional tools that I can think of that meet your criteria but I'll ask the ODI team just in case.

However, you may want to be aware that Google Refine is now OpenRefine. The explanatory video for Google Refine, which still applies to OpenRefine for the moment, is a good one, so it might be worth adding any intro videos or how-tos for the listed tools to your list.

Kathryn 


On 2 May 2013, at 03:07, Lucy Chambers <lucy.chambers at okfn.org> wrote:

> Hi All, 
> 
> A quick question: I'm trying to draw up an 'ideal' ecosystem of tools for working with spending data, so that we can work out how to teach them more effectively and, hopefully, get some more exciting spending-data projects out there.
>
> I've attached my current thoughts in a draft and would be grateful for input from the group:
> 
> https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc
> 
> It's important that the audience for this is NGOs, who probably cannot code, so I would prefer to keep this list relatively short and to the point. 
> 
> (I have also copy-pasted the text from the document below for your convenience, although the structure may not make it through the mailing list!)
> 
> Lucy 
> 
> 
> 
> 
> 
> 
> 
> Spending Data: The Tool Ecosystem 
> 
> 
> 
> There is a set of staple tools that can be used to tackle the issues highlighted by the organisations in this report. For each one, we’ve outlined the tool, what it’s useful for and what the barrier to entry is.
> 
> 
> 
> Key: 
> 
> 
> 
> Basic = An off-the-shelf tool that can be learned, and first used independently, within 1 day. No installation on servers etc. required.
>
> Intermediate = Between 1 day and 1 week to master basic functionality. May require tweaking existing code but not writing new code.
> 
> Advanced = Requires code
> 
> 
> 
> Extracting and Getting Data 
> 
> 
> 
> 
> 
> Issue: Data not available
> Tools: Freedom of Information portals
> Level: Basic, though some education may be required: that people have the right to ask, how to phrase an FOI request, whether requests can be submitted electronically, etc.
> Case in point: the group of people assembled in Romania said that they never submitted requests electronically because they could not prove the date on which the request was submitted (better to have a stamp on paper).
>
> Issue: Data available online but not downloadable (e.g. in HTML tables on webpages)
> Tools: For simple sites (information on an individual webpage): Google Spreadsheets and the ImportHTML function, or the Google scraper extension (basic).
> For more complex sites (information spread across numerous pages), a scraper will be required. Scrapers are ways to extract structured information from websites using code. There is a useful online tool that makes this easier: ScraperWiki (advanced).
> Level: At the basic level, anyone who can use a spreadsheet and functions can use ImportHTML. It is not, however, a well-known command, and awareness must be spread about how it can be used. (People are often daunted because they presume scraping involves code.)
> (See School of Data course:
> Scraping using code is advanced, and requires knowledge of at least one programming language.
>
> Issue: Data available only in PDFs
> Tools: A variety of tools are available; many require knowledge of code to operate. The most promising non-code options are ABBYY FineReader (not free) and Tabula (new software, still a bit buggy, and people need to be able to host it themselves to use it).
> For more info, see the School of Data course: http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
> Note: these tools are still imperfect, and it is still vastly preferable to advocate for data in the correct formats rather than teach people how to extract it.
> Level: Most require knowledge of coding; some progress is being made on non-technical ones.
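
As an illustration of what "scraping using code" can involve for the HTML-table case above, here is a minimal Python sketch that uses only the standard library. It is not taken from the School of Data course or from ScraperWiki: the class name TableScraper, the function table_to_csv and the sample page SAMPLE_HTML (including its supplier names and amounts) are all made up for this example. A real scraper would also have to download the pages and cope with much messier markup.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being built (None outside <tr>)
        self._cell = None   # text fragments of the current cell (None outside a cell)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page content; the figures are invented for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1,200.50</td></tr>
  <tr><td>Example Co</td><td>300</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

The CSV text it produces can be opened in any spreadsheet, which is the same end result the ImportHTML function gives for a single simple page without any code.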
> 
> 
> 
> 
> 
> Cleaning, Working with and Analyzing Data 
> 
> 
> 
> Note: One obvious omission in this section is statistical software such as SPSS. The reason for the omission is that interviewees seemed largely to have been trained to use such software. The other software in this list was more likely to be outside the comfort zone of the general practitioner. 
> 
> 
> 
> 
> 
> Issue: Messy data, typos, blanks (various)
> Tools: Spreadsheets, Google Refine
> Level: Basic for spreadsheets (but not too powerful, particularly on big datasets); Intermediate for Google Refine
>
> Issue: Need to reconcile entities against one another (to answer questions such as "what is company X?")
> Tools: Nomenklatura, OpenCorporates, PublicBodies.org
> Level: Advanced
>
> Issue: Need to be able to conceptualize networks and relationships between entities
> Tools: Gephi
> Level: Intermediate - Advanced
> Note: This may not perform all of the functions that tools such as ‘the Network’ from K-Monitor are intended to (no link to a database of articles); however, data can be structured quite simply to visualise networks of interaction.
>
> Issue: Need to be able to work with very many lines of data (too many to fit in Excel)
> Note: As few countries currently release transaction-level data, this is not a frequent problem, but it is already problematic in places such as Brazil, the US and the UK. As we push for greater disclosure, this will be needed ever more.
> Tools: OpenSpending.org, other database software
> Level: OpenSpending.org - easy for basic upload, search and interrogation; some advanced queries may require knowledge of coding. Databases - Intermediate.
>
> Issue: Repetitive tasks or modelling
> Tools: Macros (Excel)
> Level: Basic - Intermediate
>
> Issue: Entity extraction (e.g. from large bodies of documents)
> Tools: Open Calais
> Level: Intermediate. This is also not a perfect method.
> 
> 
> 
> 
> 
> Presenting Data 
> 
> 
> 
> 
> 
> Issue: Basic visualisation, time series, bar charts
> Tools: DataWrapper, Tableau Public, Many Eyes, Google Tools
> Level: Basic
>
> Issue: Mapping issues
> Tools: TileMill, Google Fusion Tables, Kartograph, QGIS
> Level: Basic - Advanced
>
> Issue: Creating a citizen’s budget
> Tools: OpenSpending.org, off-the-shelf tools listed above, open source tools from around the community
> Level: OpenSpending.org - Basic
> 
> 
> 
> 
> 
> Publishing Data 
> 
> 
> 
> 
> 
> 
> 
> Issue: Need a place online to store and manage data, raw, from Freedom of Information Requests
> Tools: DataNest, CKAN, Socrata - various Data Portal Software options
> Level: Basic to use, can be tricky to get running and set up.
> 
> 
> 
> 
> 
> See also: http://openspending.org/resources/handbook/ch014_resources.html 
> 
> 
>  
> 
> -- 
> Project Coordinator
> School of Data and
> OpenSpending 
> Projects of the Open Knowledge Foundation
> Support our work. 
> 
> 