[OpenSpending] Ecosystem of tools: Working with Spending Data

Kathryn.Corrick kathryn.corrick at googlemail.com
Thu May 2 19:32:56 UTC 2013


On that point Sam, all... 

We probably need to note which of these tools work on IE6.

With my ODI training hat on: the challenge we currently face with many of the government departments we work with is that Google tools, and anything that uses HTML5, won't work on their machines (which are locked down and run IE6), and getting new software installed can be a lengthy process.

This may also be the case for other organisations, including in the voluntary sector, particularly those that have to rely on older or second-hand machines.

Lucy - I'll see if I can get this info for you as we've already done some testing.

Kathryn


On 2 May 2013, at 13:29, Sam Smith <s at msmith.net> wrote:

> This looks great.
> 
> One thing I especially like is that easy rebuttals to "we can't do that here"-type responses are built into the text, which is probably incredibly helpful for people who don't yet realise that they need them.
> 
> Sam
> 
> On 2 May 2013, at 12:16, Lucy Chambers <lucy.chambers at okfn.org> wrote:
> 
>> Ah, yes - thanks Kathryn - I keep making that mistake!
>> 
>> @Pedro, as requested, I've uploaded the RTF version to Google Docs. Here
>> it is:
>> 
>> https://docs.google.com/a/okfn.org/document/d/12Wyqif_uqX01NYgY9xZumpq-y_NmU2ipoT3_FkU8AFU/edit
>> 
>> Lucy
>> 
>> 
>> On 2 May 2013 05:54, Kathryn.Corrick <kathryn.corrick at googlemail.com> wrote:
>> 
>>> Hi Lucy,
>>> This is a great list. No additional tools that I can think of that meet
>>> your criteria but I'll ask the ODI team just in case.
>>> 
>>> However, you may want to be aware that Google Refine is now OpenRefine.
>>> The explanatory video for Google Refine, which still applies to OpenRefine
>>> for the moment, is a good one, so it might be worth adding if your list
>>> is going to include intro videos or how-tos for the tools.
>>> 
>>> Kathryn
>>> 
>>> 
>>> On 2 May 2013, at 03:07, Lucy Chambers <lucy.chambers at okfn.org> wrote:
>>> 
>>> Hi All,
>>> 
>>> A quick question, I'm trying to draw up an 'ideal' ecosystem of tools for
>>> working with spending data so that we can work out how to teach them more
>>> effectively and hopefully, get some more exciting Spending Data related
>>> projects out there.
>>> 
>>> I've attached my current thoughts in a draft and would be grateful for
>>> input from the group:
>>> 
>>> 
>>> https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc
>>> 
>>> Note that the audience for this is NGOs, whose staff probably cannot
>>> code, so I would prefer to keep this list relatively short and to the
>>> point.
>>>
>>> (I have also copy-pasted the text from the document below for your
>>> convenience, although the structure may not make it through the mailing
>>> list!)
>>> 
>>> Lucy
>>> 
>>> *Spending Data: The Tool Ecosystem*
>>>
>>>
>>> There is a set of staple tools that can be used to tackle the issues
>>> highlighted by the organisations in this report. For each one, we’ve
>>> outlined what the tool is useful for and what the barrier to entry is.
>>> 
>>> 
>>> Key:
>>> 
>>> 
>>> *Basic* = An off-the-shelf tool that can be learned, and used
>>> independently for the first time, within 1 day. No installation on
>>> servers etc. required.
>>>
>>> *Intermediate* = Takes between 1 day and 1 week to master basic
>>> functionality. May require tweaking existing code, but not writing new
>>> code.
>>>
>>> *Advanced* = Requires writing code.
>>> 
>>> 
>>> *Extracting and Getting Data*
>>>
>>>
>>> Issue: Data not available
>>> Tools: Freedom of Information portals
>>> Level: Basic, though some education may be required to inform people
>>> that they have the right to ask, how to phrase an FOI request, whether
>>> requests can be submitted electronically, etc.
>>>
>>> *Case in point: The group of people assembled in Romania said that they
>>> never submitted requests electronically because they could not prove the
>>> date the request was submitted (a stamp on paper is better).*
>>>
>>>
>>> Issue: Data available online but not downloadable (e.g. in HTML tables
>>> on webpages)
>>> Tools: For simple sites (information on an individual webpage): Google
>>> Spreadsheets and the ImportHTML function, or the Google scraper
>>> extension (basic).
>>> For more complex websites (information spread across numerous pages), a
>>> scraper will be required. Scrapers extract structured information from
>>> websites using code. Scraperwiki <http://scraperwiki.org/> is a useful
>>> online tool that makes this easier (advanced).
>>> Level: At the basic level, anyone who can use a spreadsheet and its
>>> functions can use ImportHTML. It is not, however, a well-known function,
>>> and awareness needs to be spread about how it can be used (people are
>>> often daunted because they presume scraping involves code).
>>> (See School of Data course:
>>> Scraping using code is advanced, and requires knowledge of at least one
>>> programming language.
>>>
>>>
>>> Issue: Data available only in PDFs
>>> Tools: A variety of tools are available, many of which require knowledge
>>> of code to operate. The most promising non-code options are ABBYY
>>> FineReader (not free) and Tabula (new software, still a bit buggy, and
>>> users must be able to host it themselves).
>>> For more info, see the School of Data course:
>>> http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
>>> Note: these tools are still imperfect, and it is still vastly preferable
>>> to advocate for data in the correct formats rather than teach people how
>>> to extract it.
>>> Level: Most require knowledge of coding; some progress is being made on
>>> non-technical ones.
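The simple route in the scraping row relies on a spreadsheet formula of the form =IMPORTHTML(url, "table", index), which pulls one table from a page into the sheet. A code-based scraper does the same job in a program. The following is a minimal sketch in Python, using only the standard library's html.parser, of that core step: turning the rows of an HTML table into CSV. It assumes well-formed markup (explicit closing </td> and </tr> tags, no nested tables). The names TableScraper, scrape_tables, rows_to_csv and SAMPLE are illustrative only, and SAMPLE is invented data, not a real spending page. Downloading pages (e.g. with urllib.request) and following links across many pages are left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped into rows, per <table>."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of rows per <table> seen
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Collapse runs of whitespace so "  350 " becomes "350".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def scrape_tables(html):
    """Return every table in the page as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.tables


def rows_to_csv(rows):
    """Serialise rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Invented example page; a real scraper would download the HTML first.
SAMPLE = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1,200</td></tr>
  <tr><td>Example Co</td><td>  350 </td></tr>
</table>
"""

if __name__ == "__main__":
    for table in scrape_tables(SAMPLE):
        print(rows_to_csv(table))
```

Writing CSV rather than printing cells keeps values such as "1,200" correctly quoted, so the result opens cleanly in the same spreadsheet tools listed above.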
>>> 
>>> 
>>> 
>>> *Cleaning, Working with and Analyzing Data*
>>> 
>>> 
>>> Note: One obvious omission in this section is statistical software such as
>>> SPSS. The reason for the omission is that interviewees seemed largely to
>>> have been trained to use such software. The other software in this list was
>>> more likely to be outside the comfort zone of the general practitioner.
>>> 
>>> 
>>> 
>>> Issue: Messy data, typos, blanks (various)
>>> Tools: Spreadsheets, Google Refine
>>> Level: Spreadsheets: basic (but not very powerful, particularly on big
>>> datasets). Google Refine: intermediate.
>>>
>>> Issue: Need to reconcile entities against one another (to answer
>>> questions such as "what is company X?")
>>> Tools: Nomenklatura, OpenCorporates, PublicBodies.org
>>> Level: Advanced
>>>
>>> Issue: Need to be able to conceptualize networks and relationships
>>> between entities
>>> Tools: Gephi
>>> Level: Intermediate to advanced.
>>>
>>> *Note: Gephi may not perform all of the functions that tools such as
>>> ‘the Network’ from K-Monitor are intended to (it has no link to a
>>> database of articles); however, data can be structured quite simply to
>>> visualise networks of interaction.*
>>>
>>> Issue: Need to be able to work with very many rows of data (too many to
>>> fit in Excel).
>>>
>>> *Note: As few countries currently release transaction-level data, this
>>> is not yet a frequent problem, but it already is in places such as
>>> Brazil, the US and the UK. As we push for greater disclosure, this will
>>> be needed more and more.*
>>>
>>> Tools: OpenSpending.org, other database software
>>> Level: OpenSpending.org: easy for basic upload, search and
>>> interrogation; some advanced queries may require knowledge of coding.
>>> Databases: intermediate.
>>>
>>> Issue: Repetitive tasks or modelling
>>> Tools: Macros (Excel)
>>> Level: Basic to intermediate.
>>>
>>> Issue: Entity extraction (e.g. from large bodies of documents)
>>> Tools: OpenCalais
>>> Level: Intermediate. This is also not a perfect method.
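Several rows in the cleaning table (messy names, reconciling entities, files too large for Excel, whose worksheets stop at 1,048,576 rows in Excel 2007 and later) meet in one common task: totalling spending per supplier when the same supplier is spelled several ways. The following is a minimal Python sketch of that task under stated assumptions, not a substitute for Nomenklatura or OpenCorporates. It swaps in simple fuzzy string matching from the standard library's difflib in place of a proper reconciliation service, and it reads the CSV one row at a time so the file never has to fit in memory. The column names (supplier, amount), the canonical name list, the 0.85 similarity cutoff and the sample data are all hypothetical.

```python
import csv
import difflib
import io
from collections import defaultdict


def normalise(name):
    """Tidy a name: collapse whitespace, upper-case, drop trailing full stops."""
    return " ".join(name.split()).upper().rstrip(".")


def reconcile(name, canonical, cutoff=0.85):
    """Return the most similar canonical name, or None if none is close enough.

    `canonical` should already be normalised. The 0.85 cutoff is an
    arbitrary choice for this sketch, not a recommended value.
    """
    matches = difflib.get_close_matches(normalise(name), canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def totals_by_supplier(lines, canonical):
    """Total spending per reconciled supplier, reading one CSV row at a time.

    Assumes hypothetical columns named "supplier" and "amount".
    Returns (totals, unmatched) so unmatched names can be checked by hand.
    """
    totals = defaultdict(float)
    unmatched = []
    for row in csv.DictReader(lines):
        supplier = row["supplier"].strip()
        amount = row["amount"].replace(",", "").strip()
        if not supplier or not amount:
            continue  # skip blank cells rather than guessing a value
        match = reconcile(supplier, canonical)
        if match is None:
            unmatched.append(supplier)
        else:
            totals[match] += float(amount)
    return dict(totals), unmatched


# Invented sample data illustrating typos, stray spaces and blanks.
SAMPLE_CSV = """supplier,amount
Acme Ltd,"1,200"
ACME LTD.,300
  acme  ltd ,50
Example Co,350
Exampel Co,100
,75
Unknown Widgets,20
"""
CANONICAL = ["ACME LTD", "EXAMPLE CO"]

if __name__ == "__main__":
    totals, unmatched = totals_by_supplier(io.StringIO(SAMPLE_CSV), CANONICAL)
    print(totals)  # {'ACME LTD': 1550.0, 'EXAMPLE CO': 450.0}
    print(unmatched)  # ['Unknown Widgets']
```

For a real file, open(path, newline="", encoding="utf-8") can be passed in place of the StringIO. Fuzzy matching can wrongly merge genuinely different companies with similar names, so both the matches and the unmatched list would need checking by hand.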
>>> 
>>> 
>>> 
>>> *Presenting Data*
>>>
>>>
>>> Issue: Basic visualisation, time series, bar charts
>>> Tools: Datawrapper, Tableau Public, Many Eyes, Google tools
>>> Level: Basic
>>>
>>> Issue: Mapping
>>> Tools: TileMill, Google Fusion Tables, Kartograph, QGIS
>>> Level: Basic to advanced
>>>
>>> Issue: Creating a citizen’s budget
>>> Tools: OpenSpending.org, the off-the-shelf tools listed above, open
>>> source tools from around the community
>>> Level: OpenSpending.org: basic.
>>> 
>>> 
>>> 
>>> *Publishing Data*
>>>
>>>
>>> Issue: Need a place online to store and manage raw data from Freedom of
>>> Information requests
>>> Tools: DataNest, CKAN, Socrata (various data portal software options)
>>> Level: Basic to use, but can be tricky to set up and get running.
>>> 
>>> 
>>> 
>>> See also: http://openspending.org/resources/handbook/ch014_resources.html
>>> 
>>> 
>>> 
>>> --
>>> *Project Coordinator*
>>> School of Data <http://schoolofdata.org/> and
>>> OpenSpending <http://openspending.org/>
>>> Projects of the Open Knowledge Foundation <http://okfn.org/>
>>> Support our work <http://okfn.org/support/>.
>>> 
>>> 
>>> _______________________________________________
>>> openspending mailing list
>>> openspending at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/openspending
>>> Unsubscribe: http://lists.okfn.org/mailman/options/openspending
>>> 
>> 
> 
> -- 
> @smithsam



