[OpenSpending] Ecosystem of tools: Working with Spending Data

Pedro Markun pedro at esfera.mobi
Thu May 2 05:29:24 UTC 2013


Hey Lucy,

the doc opens as gibberish in my LibreOffice for some reason... can you try
converting it to some other format or sharing it on gdocs?

I'm really interested in that :D

[]'s
Pedro Markun

On Wed, May 1, 2013 at 11:07 PM, Lucy Chambers <lucy.chambers at okfn.org> wrote:

> Hi All,
>
> A quick question: I'm trying to draw up an 'ideal' ecosystem of tools for
> working with spending data so that we can work out how to teach them more
> effectively and, hopefully, get some more exciting spending-data-related
> projects out there.
>
> I've attached my current thoughts in a draft and would be grateful for
> input from the group:
>
>
> https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc
>
> It's important that the audience for this is NGOs, who probably cannot
> code, so I would prefer to keep this list relatively short and to the
> point.
>
> (I have also copy-pasted the text from the document below for your
> convenience, although the structure may not make it through the mailing
> list!)
>
> Lucy
>
> *Spending Data: The Tool Ecosystem*
>
>
> There is a set of staple tools that can be used to tackle the issues
> highlighted by the organisations in this report. For each one, we’ve
> outlined the tool, what it’s useful for and what the barrier to entry is.
>
>
> Key:
>
>
> *Basic* = An off-the-shelf tool that can be learned, and first used
> independently, within 1 day. No installation on servers etc. required.
>
> *Intermediate* = Takes between 1 day and 1 week to master basic
> functionality. May require tweaking of code but not writing new code.
>
> *Advanced* = Requires coding.
>
>
> *Extracting and Getting Data*
>
>
> *Issue:* Data not available
>
> *Tools:* Freedom of Information Portals
>
> *Level:* Basic - though some education may be required to inform people
> that they have the right to ask, how to phrase an FOI request, whether it
> is possible to submit these requests electronically, etc.
>
> *Case in point: The group of people assembled in Romania said that they
> never submitted requests electronically because they could not prove the
> date the request was submitted (better to have a stamp on paper).*
>
>
> *Issue:* Data available online but not downloadable (e.g. in HTML tables
> on webpages)
>
> *Tools:* For simple sites (information on an individual webpage): Google
> Spreadsheets and the ImportHTML function, or the Google scraper extension
> (basic).
>
> For more complex webpages (information spread across numerous pages), a
> scraper will be required. Scrapers are ways to extract structured
> information from websites using code. There is a useful online tool that
> makes this easier: Scraperwiki <http://scraperwiki.org/> (advanced).
>
> *Level:* At the basic level, anyone who can use a spreadsheet and
> functions can use ImportHTML. It is not, however, a well-known command,
> and awareness must be spread about how it can be used. (People are often
> daunted because they presume scraping involves code.)
>
> (See School of Data course:
>
> Scraping using code is advanced, and requires knowledge of at least one
> programming language.
>
>
> *Issue:* Data available only in PDFs
>
> *Tools:* A variety of tools are available; many require knowledge of code
> to operate. The most promising non-code options are ABBYY FineReader (not
> free) and Tabula (new software, still a bit buggy, and users must be able
> to host it themselves).
>
> For more info, see the School of Data course:
> http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
>
> Note: these tools are still imperfect, and it is still vastly preferable
> to advocate for data in the correct formats rather than teach people how
> to extract it.
>
> *Level:* Most require knowledge of coding; some progress is being made on
> non-technical ones.
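
Inline note on the "data available online but not downloadable" row: for
anyone curious what the code route looks like, here is a minimal sketch in
Python (standard library only) that pulls the rows out of an HTML table,
roughly what ImportHTML does inside a spreadsheet. The sample HTML, the
TableExtractor class and the extract_rows helper are made up for
illustration; a real page would have to be downloaded first and will usually
be messier than this.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return all table rows found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page content (not real spending data).
SAMPLE_HTML = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1200.50</td></tr>
  <tr><td>Example Corp</td><td>300</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
```

Each row comes back as a list of cell strings, so it can be written straight
out with Python's csv module and opened in a spreadsheet.
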
>
>
>
> *Cleaning, Working with and Analyzing Data*
>
>
> Note: One obvious omission in this section is statistical software such as
> SPSS. The reason for the omission is that interviewees seemed largely to
> have been trained to use such software. The other software in this list was
> more likely to be outside the comfort zone of the general practitioner.
>
>
> *Issue:* Messy data, typos, blanks (various)
>
> *Tools:* Spreadsheets, Google Refine
>
> *Level:* Basic (but not too powerful, particularly on big datasets) and
> Intermediate, respectively
>
>
> *Issue:* Need to reconcile entities against one another (to answer
> questions such as "what is company X?")
>
> *Tools:* Nomenklatura, OpenCorporates, PublicBodies.org
>
> *Level:* Advanced
>
>
> *Issue:* Need to be able to conceptualize networks and relationships
> between entities
>
> *Tools:* Gephi
>
> *Level:* Intermediate - Advanced.
>
> *Note: This may not perform all of the functions that tools such as ‘the
> Network’ from K-Monitor are intended to (no link to a database of
> articles); however, data can be structured quite simply to visualise
> networks of interaction.*
>
>
> *Issue:* Need to be able to work with very many lines of data (too many to
> fit in Excel).
>
> *Note: As few countries currently release transaction-level data, this is
> not a frequent problem, but it is already problematic in places such as
> Brazil, the US and the UK. As we push for greater disclosure, this will be
> needed ever more.*
>
> *Tools:* OpenSpending.org, other database software
>
> *Level:* OpenSpending.org - easy for basic upload, search and
> interrogation; some advanced queries may require knowledge of coding.
>
> Databases - Intermediate.
>
>
> *Issue:* Repetitive tasks or modelling
>
> *Tools:* Macros - Excel
>
> *Level:* Basic - Intermediate.
>
>
> *Issue:* Entity extraction (e.g. from large bodies of documents)
>
> *Tools:* Open Calais
>
> *Level:* Intermediate. This is also not a perfect method.
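
Inline note on the "too big to fit in Excel" row: as one example of "other
database software", SQLite ships with Python, so a minimal sketch of loading
a CSV and totalling it by supplier could look like the following. The column
names (supplier, amount), the table name and the sample rows are assumptions
for illustration, not anything taken from Lucy's document.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a large transactions file.
SAMPLE_CSV = """supplier,amount
Acme Ltd,1200.50
Example Corp,300
Acme Ltd,99.50
"""


def load_csv(conn, csv_file):
    """Stream rows from a CSV file object into a 'spending' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spending (supplier TEXT, amount REAL)"
    )
    reader = csv.DictReader(csv_file)
    # The generator feeds rows one at a time, so the whole file never
    # has to be held in memory at once.
    conn.executemany(
        "INSERT INTO spending (supplier, amount) VALUES (?, ?)",
        ((row["supplier"], float(row["amount"])) for row in reader),
    )
    conn.commit()


def totals_by_supplier(conn):
    """Return (supplier, total amount) pairs, sorted by supplier name."""
    return conn.execute(
        "SELECT supplier, SUM(amount) FROM spending "
        "GROUP BY supplier ORDER BY supplier"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, io.StringIO(SAMPLE_CSV))
totals = totals_by_supplier(conn)
```

For a real file, the StringIO would be replaced by open(path, newline="")
and the ":memory:" database by a filename, so the data persists on disk.
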
>
>
>
> *Presenting Data*
>
>
> *Issue:* Basic visualisation, time series, bar charts
>
> *Tools:* DataWrapper, Tableau Public, Many Eyes, Google Tools
>
> *Level:* Basic
>
>
> *Issue:* Mapping issues
>
> *Tools:* TileMill, Google Fusion Tables, Kartograph, QGis
>
> *Level:* Basic - Advanced
>
>
> *Issue:* Creating a citizen’s budget
>
> *Tools:* OpenSpending.org, off-the-shelf tools listed above, open source
> tools from around the community
>
> *Level:* OpenSpending.org - Basic.
>
>
>
> *Publishing Data*
>
>
> *Issue:* Need a place online to store and manage raw data from Freedom of
> Information requests
>
> *Tools:* DataNest, CKAN, Socrata - various data portal software options
>
> *Level:* Basic to use, but can be tricky to set up and get running.
>
>
>
> See also: http://openspending.org/resources/handbook/ch014_resources.html
>
>
>
> --
> *Project Coordinator*
> School of Data <http://schoolofdata.org/> and
> OpenSpending <http://openspending.org/>
> Projects of the Open Knowledge Foundation <http://okfn.org/>
> Support our work <http://okfn.org/support/>.