[OpenSpending] Ecosystem of tools: Working with Spending Data

Lucy Chambers lucy.chambers at okfn.org
Thu May 2 11:16:44 UTC 2013


Ah, yes - thanks Kathryn - I keep making that mistake!

@Pedro, as per your request - I've uploaded the rtf version to GDocs - here
it is:

https://docs.google.com/a/okfn.org/document/d/12Wyqif_uqX01NYgY9xZumpq-y_NmU2ipoT3_FkU8AFU/edit

Lucy


On 2 May 2013 05:54, Kathryn.Corrick <kathryn.corrick at googlemail.com> wrote:

> Hi Lucy,
> This is a great list. No additional tools that I can think of that meet
> your criteria but I'll ask the ODI team just in case.
>
> However, you may want to be aware that Google Refine is now OpenRefine.
> The explanatory video for Google Refine, which still applies to OpenRefine
> for the moment, is a good one, so it might be worth adding intro videos or
> how-tos to your list for the tools that have them.
>
> Kathryn
>
>
> On 2 May 2013, at 03:07, Lucy Chambers <lucy.chambers at okfn.org> wrote:
>
> Hi All,
>
> A quick question: I'm trying to draw up an 'ideal' ecosystem of tools for
> working with spending data so that we can work out how to teach them more
> effectively and, hopefully, get some more exciting spending-data-related
> projects out there.
>
> I've attached my current thoughts in a draft and would be grateful for
> input from the group:
>
>
> https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc
>
> Bear in mind that the audience for this is NGOs, who probably cannot
> code, so I would prefer to keep this list relatively short and to the
> point.
>
> (I have also copy-pasted the text from the document below for your
> convenience, although the structure may not make it through the mailing
> list!)
>
> Lucy
>
>
>
>
>
>
>
> *Spending Data: The Tool Ecosystem *
>
>
> There is a set of staple tools that can be used to tackle the issues
> highlighted by the organisations in this report. For each one, we’ve
> outlined the tool, what it’s useful for and what the barrier to entry is.
>
>
> Key:
>
>
> *Basic* = An off-the-shelf tool that can be learned and used independently
> within 1 day. No installation on servers etc. required.
>
> *Intermediate* = Takes between 1 day and 1 week to master basic
> functionality. May require tweaking existing code, but not writing new code.
>
> *Advanced* = Requires writing code.
>
>
> *Extracting and Getting Data *
>
>
>
> Issue: Data not available
> Tools: Freedom of Information portals
> Level: Basic, though some education may be required to tell people that
> they have the right to ask, how to phrase an FOI request, whether it is
> possible to submit requests electronically, etc.
>
> *Case in point: The group of people assembled in Romania said that they
> never submitted requests electronically because they could not prove the
> date the request was submitted (better to have a stamp on paper).*
>
> Issue: Data available online but not downloadable (e.g. in HTML tables on
> webpages)
> Tools: For simple sites (information on an individual webpage): Google
> Spreadsheets and the ImportHTML function, or the Google scraper extension
> (basic).
>
> For more complex websites (information spread across numerous pages), a
> scraper will be required. Scrapers are ways to extract structured
> information from websites using code. There is a useful online tool that
> makes this easier: Scraperwiki <http://scraperwiki.org/> (advanced).
> Level: For the basic level, anyone who can use a spreadsheet and functions
> can use ImportHTML. It is not, however, a well-known function, and
> awareness must be spread about how it can be used. (People are often
> daunted because they presume scraping involves code.)
>
> (See School of Data course:
>
> Scraping using code is advanced, and requires knowledge of at least one
> programming language.
>
> Issue: Data available only in PDFs
> Tools: A variety of tools are available; many require knowledge of code to
> operate. The most promising non-code options are ABBYY FineReader (not
> free) and Tabula (new software, still a bit buggy, and users must be able
> to host it themselves).
>
> For more info, see the School of Data course:
> http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
>
> Note: these tools are still imperfect and it is still vastly preferable to
> advocate for data in the correct formats, rather than teach people how to
> extract it.
> Level: Most require knowledge of coding; some progress is being made on
> non-technical ones.
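
For anyone curious what the "advanced" route looks like for the "data available online but not downloadable" row above, here is a minimal sketch in Python. It pulls the rows out of an HTML table using only the standard library. The names TableExtractor, extract_tables and SAMPLE are made up for illustration, the sample page is invented, and the sketch assumes simple, non-nested tables; a real scraper would also need to download the page first.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> in a page as lists of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._row = None   # cells of the row currently being read, if any
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in an HTML string (nested tables not handled)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Invented sample page standing in for a downloaded spending report.
SAMPLE = """
<table>
  <tr><th>Department</th><th>Amount</th></tr>
  <tr><td>Health</td><td>1,200</td></tr>
  <tr><td>Education</td><td>950</td></tr>
</table>
"""

rows = extract_tables(SAMPLE)[0]
for row in rows:
    print(row)
```

This is roughly what ImportHTML does for you behind the scenes on a single page; the code route only starts to pay off once the data is spread over many pages.
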
>
>
>
> *Cleaning, Working with and Analyzing Data *
>
>
> Note: One obvious omission in this section is statistical software such as
> SPSS. The reason for the omission is that interviewees seemed largely to
> have been trained to use such software. The other software in this list was
> more likely to be outside the comfort zone of the general practitioner.
>
>
>
> Issue: Messy data, typos, blanks (various)
> Tools: Spreadsheets, Google Refine
> Level: Spreadsheets - Basic (but not too powerful, particularly on big
> datasets); Google Refine - Intermediate
>
> Issue: Need to reconcile entities against one another (to answer questions
> such as "what is company X?")
> Tools: Nomenklatura, OpenCorporates, PublicBodies.org
> Level: Advanced
>
> Issue: Need to be able to conceptualize networks and relationships between
> entities
> Tools: Gephi
> Level: Intermediate - Advanced
>
> *Note: This may not perform all of the functions that tools such as ‘the
> Network’ from K-Monitor are intended to (no link to a database of
> articles); however, data can be structured quite simply to visualise
> networks of interaction.*
>
> Issue: Need to be able to work with very many lines of data (too many to
> fit in Excel)
> Tools: OpenSpending.org, other database software
> Level: OpenSpending.org - easy for basic upload, search and interrogation;
> some advanced queries may require knowledge of coding.
> Databases - Intermediate.
>
> *Note: As few countries currently release transaction-level data, this is
> not a frequent problem, but it is already problematic in places such as
> Brazil, the US and the UK. As we push for greater disclosure, this will be
> needed ever more.*
>
> Issue: Repetitive tasks or modelling
> Tools: Macros - Excel
> Level: Basic - Intermediate
>
> Issue: Entity extraction (e.g. from large bodies of documents)
> Tools: Open Calais
> Level: Intermediate. This is also not a perfect method.
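
To give a feel for what "reconciling entities" in the table above involves, here is a rough sketch in Python. It does not use Nomenklatura or OpenCorporates; it substitutes simple fuzzy string matching from the standard library's difflib module. The function names, the suffix list, the 0.85 cutoff and the company names are all assumptions made up for illustration.

```python
import difflib
import re

# Legal-form suffixes to ignore when comparing names (illustrative, not complete).
SUFFIXES = {"ltd", "limited", "plc", "inc", "llc", "gmbh"}


def normalise(name):
    """Lower-case a name, strip punctuation and drop legal-form suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def reconcile(raw_name, canonical_names, cutoff=0.85):
    """Return the closest canonical name, or None if nothing is similar enough."""
    by_key = {normalise(c): c for c in canonical_names}
    matches = difflib.get_close_matches(
        normalise(raw_name), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[matches[0]] if matches else None


# Invented reference list standing in for a register of known companies.
CANONICAL = ["Acme Construction Ltd", "Northwind Traders plc"]

print(reconcile("ACME Construction Limited.", CANONICAL))
print(reconcile("Nortwind Traders", CANONICAL))
print(reconcile("Unrelated Consulting", CANONICAL))
```

The first two spellings should resolve to the entries in the reference list and the third should come back as None. The cutoff is a trade-off: set it lower and more typos are caught, but more false matches slip through, which is why a dedicated reconciliation tool with human review is still preferable for real projects.
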
>
>
>
> *Presenting Data *
>
>
>
> Issue: Basic visualisation, time series, bar charts
> Tools: DataWrapper, Tableau Public, Many Eyes, Google Tools
> Level: Basic
>
> Issue: Mapping issues
> Tools: TileMill, Google Fusion Tables, Kartograph, QGIS
> Level: Basic - Advanced
>
> Issue: Creating a citizen’s budget
> Tools: OpenSpending.org, off-the-shelf tools listed above, open source
> tools from around the community
> Level: OpenSpending.org - Basic
>
>
>
> *Publishing Data *
>
>
>
>
> Issue: Need a place online to store and manage raw data from Freedom of
> Information requests
> Tools: DataNest, CKAN, Socrata (various data portal software options)
> Level: Basic to use; can be tricky to set up and get running.
>
>
>
> See also: http://openspending.org/resources/handbook/ch014_resources.html
>
>
>
> --
> *Project Coordinator*
> School of Data <http://schoolofdata.org/> and
> OpenSpending <http://openspending.org/>
> Projects of the Open Knowledge Foundation <http://okfn.org/>
> Support our work <http://okfn.org/support/>.
>
>
>  _______________________________________________
> openspending mailing list
> openspending at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/openspending
> Unsubscribe: http://lists.okfn.org/mailman/options/openspending
>
>
>


-- 
*Project Coordinator*
School of Data <http://schoolofdata.org/> and
OpenSpending <http://openspending.org/>
Projects of the Open Knowledge Foundation <http://okfn.org/>
Support our work <http://okfn.org/support/>.