[OpenSpending] Ecosystem of tools: Working with Spending Data
Sam Smith
s at msmith.net
Thu May 2 12:29:28 UTC 2013
This looks great.
One thing I especially like is that easy rebuttals to "we can't do that here"-type responses are built into the text, in a way that is probably incredibly helpful for people who don't yet realise they need them.
Sam
On 2 May 2013, at 12:16, Lucy Chambers <lucy.chambers at okfn.org> wrote:
> Ah, yes - thanks Kathryn - I keep making that mistake!
>
> @Pedro, as per your request, I've uploaded the RTF version to GDocs; here
> it is:
>
> https://docs.google.com/a/okfn.org/document/d/12Wyqif_uqX01NYgY9xZumpq-y_NmU2ipoT3_FkU8AFU/edit
>
> Lucy
>
>
> On 2 May 2013 05:54, Kathryn.Corrick <kathryn.corrick at googlemail.com> wrote:
>
>> Hi Lucy,
>> This is a great list. No additional tools that I can think of that meet
>> your criteria but I'll ask the ODI team just in case.
>>
>> However, you may want to be aware that Google Refine is now OpenRefine.
>> The explanatory video for Google Refine, which for the moment also applies
>> to OpenRefine, is a good one, so it might be worth adding to your list if
>> you are including intro videos or how-tos for the tools listed.
>>
>> Kathryn
>>
>>
>> On 2 May 2013, at 03:07, Lucy Chambers <lucy.chambers at okfn.org> wrote:
>>
>> Hi All,
>>
>> A quick question, I'm trying to draw up an 'ideal' ecosystem of tools for
>> working with spending data so that we can work out how to teach them more
>> effectively and hopefully, get some more exciting Spending Data related
>> projects out there.
>>
>> I've attached my current thoughts in a draft and would be grateful for
>> input from the group!
>>
>>
>> https://dl.dropboxusercontent.com/u/7348125/Spending%20Data%20-%20Tool%20Ecosystem.doc
>>
>> It's important to bear in mind that the audience for this is NGOs, who
>> probably cannot code, so I would prefer to keep this list relatively short
>> and to the point.
>>
>> (I have also copy-pasted the text from the document below for your
>> convenience, although the structure may not make it through the mailing
>> list!)
>>
>> Lucy
>>
>>
>> *Spending Data: The Tool Ecosystem*
>>
>>
>> There is a set of staple tools that can be used to tackle the issues
>> highlighted by the organisations in this report. For each one, we've
>> outlined what the tool is useful for and what the barrier to entry is.
>>
>>
>> Key:
>>
>>
>> *Basic* = An off-the-shelf tool that can be learned and first used
>> independently within 1 day. No installation on servers etc. required.
>>
>> *Intermediate* = Between 1 day and 1 week to master basic functionality.
>> May require tweaking existing code, but not writing new code.
>>
>> *Advanced* = Requires writing code.
>>
>>
>> *Extracting and Getting Data*
>>
>>
>>
>> *Issue*
>>
>> *Tools*
>>
>> *Level*
>>
>> Data not available
>>
>> Freedom of Information Portals
>>
>> Basic, though some education may be required to inform people that they
>> have the right to ask, how to phrase an FOI request, whether it is
>> possible to submit these requests electronically, etc.
>>
>>
>> *Case in point: The group of people assembled in Romania said that they
>> never submitted requests electronically because they could not prove the
>> date the request was submitted (it is better to have a stamp on paper).*
>>
>> Data available online but not downloadable. (e.g. in HTML tables on
>> webpages)
>>
>> For simple sites (information on a single webpage): Google Spreadsheets
>> and its ImportHTML function, or the Google scraper extension (basic).
>>
>>
>> For more complex websites (information spread across numerous pages), a
>> scraper will be required. Scrapers are programs that extract structured
>> information from websites. ScraperWiki <http://scraperwiki.org/> is a
>> useful online tool that makes this easier (advanced).
>>
>> At the basic level, anyone who can use a spreadsheet and its functions can
>> use ImportHTML. It is not, however, a well-known function, and awareness of
>> how it can be used needs to be spread (people are often daunted because
>> they presume scraping involves code).
>>
>>
>> (See School of Data course:
>>
>>
>> Scraping using code is advanced, and requires knowledge of at least one
>> programming language.
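To make the difference between the basic and advanced routes concrete, here is a minimal sketch of what a scraper does, written in Python using only the standard library's `html.parser`. It is not ScraperWiki code and not how ImportHTML works internally; the sample HTML and the `scrape_table` helper are hypothetical, and a real scraper would also download each page and loop over every page it needs.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a downloaded webpage.
SAMPLE = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1,200</td></tr>
  <tr><td>Beta SA</td><td>950</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE):
        print(row)
```

Each row comes back as a list of cell strings, ready to be written out as CSV; the hard part of real scraping is usually finding and fetching all the pages, not parsing a single one.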
>>
>>
>> Data available only in PDFs
>>
>> A variety of tools are available, and many require knowledge of code to
>> operate. The most promising non-code options are ABBYY FineReader (not
>> free) and Tabula (new software, still a bit buggy, and people need to be
>> able to host it themselves to use it).
>>
>>
>> For more info - see School of Data course.
>> http://schoolofdata.org/handbook/courses/extracting-data-from-pdf/
>>
>>
>> Note: these tools are still imperfect, and it is vastly preferable to
>> advocate for data to be released in the correct formats rather than to
>> teach people how to extract it.
>>
>> Most require knowledge of coding, though some progress is being made on
>> non-technical ones.
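Many code-based PDF routes start by converting the PDF to plain text (for example with poppler's `pdftotext -layout`, which tries to preserve column positions) and then splitting each line into cells. The Python sketch below assumes the text has already been extracted and that columns are separated by two or more spaces; the sample text and helper names are hypothetical.

```python
import re

# Hypothetical output of `pdftotext -layout`: columns are lined up with
# spaces rather than separated by a delimiter.
EXTRACTED = """\
Department          Supplier            Amount
Health              Acme Ltd            1,200.00
Education           Beta SA               950.50
"""

# Assumption: two or more consecutive whitespace characters mark a column
# boundary, while single spaces occur inside cell values.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_layout_text(text):
    """Split each non-blank line into cells on runs of 2+ whitespace chars."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(COLUMN_GAP.split(line))
    return rows


def to_number(cell):
    """Convert '1,200.00' to 1200.0, assuming ',' is a thousands separator."""
    return float(cell.replace(",", ""))


if __name__ == "__main__":
    header, *body = parse_layout_text(EXTRACTED)
    for row in body:
        print(dict(zip(header, row)))
```

The two-space assumption breaks as soon as a cell itself contains a double space or a column is left empty, which is one reason these tools remain imperfect.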
>>
>>
>>
>> *Cleaning, Working with and Analyzing Data*
>>
>>
>> Note: One obvious omission in this section is statistical software such as
>> SPSS. The reason for the omission is that interviewees seemed largely to
>> have been trained to use such software. The other software in this list was
>> more likely to be outside the comfort zone of the general practitioner.
>>
>>
>>
>> *Issue*
>>
>> *Tools*
>>
>> *Level*
>>
>> Messy data, typos, blanks (various)
>>
>> Spreadsheets, Google Refine
>>
>> Spreadsheets: Basic (but not very powerful, particularly on big
>> datasets). Google Refine: Intermediate.
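As a rough illustration of the kind of clean-up meant here, the Python sketch below collapses stray whitespace, treats case variants of the same supplier as one name, and sets aside rows with blank cells instead of silently dropping them. It only mimics what a spreadsheet's TRIM function or Refine's faceting and clustering help with by hand; the sample rows and function names are hypothetical.

```python
from collections import defaultdict


def normalise(name):
    """Collapse runs of whitespace; return None for a blank cell."""
    name = " ".join((name or "").split())
    return name or None


def total_by_supplier(rows):
    """Sum amounts per supplier, treating case and spacing variants as one.

    rows is an iterable of (supplier, amount_text) pairs. Returns
    (totals, skipped): totals maps a case-folded supplier name to its summed
    amount; skipped lists rows with a blank supplier or amount so they can
    be checked by hand rather than silently lost.
    """
    totals = defaultdict(float)
    skipped = []
    for supplier, amount in rows:
        name = normalise(supplier)
        amount = (amount or "").strip()
        if name is None or not amount:
            skipped.append((supplier, amount))
            continue
        # Assumes ',' is only ever a thousands separator.
        totals[name.casefold()] += float(amount.replace(",", ""))
    return dict(totals), skipped


if __name__ == "__main__":
    sample = [("  ACME   Ltd", "1,200"), ("acme ltd", "300"),
              ("Beta SA", ""), ("", "50")]
    print(total_by_supplier(sample))
```

Case-folding is used only as a grouping key; changing the displayed names (e.g. with `str.title()`) would mangle acronyms such as "SA".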
>>
>> Need to reconcile entities against one another (to answer questions such
>> as "What is company X?")
>>
>> Nomenklatura, OpenCorporates, PublicBodies.org
>>
>> Advanced
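Reconciliation boils down to matching messy names against a canonical list. The sketch below uses fuzzy string matching from Python's `difflib` as a simple stand-in; it is not the Nomenklatura or OpenCorporates API, and the canonical list and the 0.6 similarity cutoff are assumptions you would tune for real data.

```python
import difflib

# Hypothetical canonical register, standing in for a curated list of
# entity names such as one maintained in Nomenklatura.
CANONICAL = ["Acme Limited", "Beta Construction SA", "Gamma Consulting"]


def reconcile(name, candidates=CANONICAL, cutoff=0.6):
    """Return the closest canonical name, or None if none is similar enough.

    Matching is case-insensitive; cutoff is difflib's similarity ratio
    between 0 and 1.
    """
    lookup = {c.casefold(): c for c in candidates}
    matches = difflib.get_close_matches(
        name.casefold(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    for raw in ["ACME Limited", "Beta Constructions S.A.", "Zzyzx Quarry"]:
        print(raw, "->", reconcile(raw))
```

Here "ACME Limited" and "Beta Constructions S.A." resolve to canonical names while an unrelated name returns `None`; anything borderline still deserves a human check.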
>>
>> Need to be able to conceptualize networks and relationships between
>> entities
>>
>> Gephi
>>
>> Intermediate - advanced.
>>
>>
>> *Note: Gephi may not perform all of the functions that tools such as 'the
>> Network' from K-Monitor are intended to (it has no link to a database of
>> articles); however, data can be structured quite simply to visualise
>> networks of interaction.*
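One simple way to structure such data, assuming a list of payments from a payer to a supplier, is a weighted edge list. As far as we understand, Gephi's spreadsheet (CSV) import treats columns named `Source` and `Target` as the two ends of each edge and an optional `Weight` column as its strength; the Python sketch below builds such a file. The sample payments and helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def edge_list(payments):
    """Sum (payer, supplier, amount) rows into {(payer, supplier): total}."""
    weights = Counter()
    for payer, supplier, amount in payments:
        weights[(payer, supplier)] += amount
    return weights


def write_gephi_csv(weights, out):
    """Write one Source,Target,Weight row per edge to a text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])


def to_csv_text(weights):
    """Convenience wrapper returning the CSV as a string."""
    buf = io.StringIO()
    write_gephi_csv(weights, buf)
    return buf.getvalue()


if __name__ == "__main__":
    payments = [
        ("Ministry of Health", "Acme Ltd", 1200.0),
        ("Ministry of Health", "Acme Ltd", 300.0),
        ("Ministry of Education", "Acme Ltd", 950.0),
    ]
    print(to_csv_text(edge_list(payments)))
```

To produce a file for import, pass `open("edges.csv", "w", newline="")` to `write_gephi_csv` instead of a string buffer.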
>>
>> Need to be able to work with very many lines of data (too many to fit in
>> Excel).
>>
>>
>> *Note: As few countries currently release transaction-level data, this is
>> not a frequent problem, but it is already an issue in places such as
>> Brazil, the US and the UK. As we push for greater disclosure, this
>> capability will be needed more and more.*
>>
>> OpenSpending.org, Other database software
>>
>> OpenSpending.org: easy for basic upload, search and interrogation; some
>> advanced queries may require knowledge of coding.
>>
>>
>> Databases - Intermediate.
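Once a file passes the worksheet limit of recent Excel versions (1,048,576 rows), a database lets you load and query it without opening everything at once. The Python sketch below uses the standard library's `sqlite3`; the table layout (department, supplier, amount), the three-column CSV assumption in `load_csv` and the sample rows are hypothetical.

```python
import csv
import sqlite3


def load_transactions(conn, rows):
    """Create the table if needed and insert (department, supplier, amount) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(department TEXT, supplier TEXT, amount REAL)"
    )
    # executemany accepts any iterable, so rows can be streamed from a file.
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


def load_csv(conn, path):
    """Stream a hypothetical department,supplier,amount CSV into the table."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        load_transactions(conn, ((d, s, float(a)) for d, s, a in reader))


def top_suppliers(conn, limit=10):
    """Return (supplier, total) pairs, largest total first."""
    return conn.execute(
        "SELECT supplier, SUM(amount) AS total FROM transactions "
        "GROUP BY supplier ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep the data
    load_transactions(conn, [
        ("Health", "Acme Ltd", 1200.0),
        ("Health", "Beta SA", 950.0),
        ("Education", "Acme Ltd", 300.0),
    ])
    print(top_suppliers(conn))
```

Because the CSV is read row by row, memory use stays small even for files far larger than a spreadsheet can open.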
>>
>> Repetitive tasks or modelling
>>
>> Macros - Excel
>>
>> Basic - Intermediate.
>>
>> Entity Extraction (e.g. from large bodies of documents)
>>
>> Open Calais
>>
>> Intermediate. This is also not a perfect method.
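To show why entity extraction is imperfect, here is a deliberately crude, rule-based Python sketch that picks out company names by looking for capitalised words followed by a legal-form suffix. It is a swapped-in technique, not how Open Calais works; the pattern and the suffix list are assumptions.

```python
import re

# Assumption: a company name is one or more capitalised words followed by a
# legal-form suffix. Real names often break this rule.
COMPANY = re.compile(
    r"\b(?:[A-Z][\w&'-]*\s+)+(?:Ltd|Limited|PLC|Inc|SA|GmbH)\b"
)


def find_companies(text):
    """Return the distinct company-like names found in text, sorted."""
    return sorted({m.group(0) for m in COMPANY.finditer(text)})


if __name__ == "__main__":
    doc = ("The contract went to Acme Holdings Ltd, while Beta SA and the "
           "council's partner Gamma Consulting GmbH shared the rest.")
    print(find_companies(doc))
```

It misses names without a suffix and will happily swallow a capitalised word at the start of a sentence (for example "The Acme Ltd"), which is exactly the sort of error a human still has to catch.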
>>
>>
>>
>> *Presenting Data*
>>
>>
>>
>> *Issue*
>>
>> *Tools*
>>
>> *Level*
>>
>> Basic visualisation, time series, bar charts
>>
>> DataWrapper, Tableau Public, Many Eyes, Google Tools
>>
>> Basic
>>
>> Mapping issues
>>
>> TileMill, Google Fusion Tables, Kartograph, QGIS
>>
>> Basic - Advanced
>>
>> Creating a citizen’s budget
>>
>> OpenSpending.org, the off-the-shelf tools listed above, open source tools
>> from around the community
>>
>> OpenSpending.org - basic.
>>
>>
>>
>> *Publishing Data*
>>
>>
>>
>>
>> *Issue*
>>
>> *Tools*
>>
>> *Level*
>>
>> Need a place online to store and manage raw data from Freedom of
>> Information requests
>>
>> DataNest, CKAN, Socrata - various Data Portal Software options
>>
>> Basic to use, but can be tricky to set up and get running.
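Part of what makes portal software such as CKAN useful once it is running is that published datasets can also be searched programmatically. CKAN exposes an Action API; the Python sketch below builds a `package_search` request and reads the dataset titles out of the JSON reply. The portal address `https://ckan.example.org` is a placeholder, and `search` needs network access to a real portal.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL for a portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body):
    """Extract dataset titles (falling back to names) from a CKAN reply."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an error: %r" % payload.get("error"))
    return [d.get("title") or d["name"] for d in payload["result"]["results"]]


def search(base_url, query, rows=10):
    """Query a live CKAN portal (needs network access)."""
    with urlopen(package_search_url(base_url, query, rows)) as resp:
        return dataset_titles(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder address; point this at a real CKAN portal to try it.
    print(package_search_url("https://ckan.example.org", "spending", 5))
```

Keeping URL building and response parsing separate from the network call makes each piece easy to check on its own.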
>>
>>
>>
>> See also: http://openspending.org/resources/handbook/ch014_resources.html
>>
>>
>>
>> --
>> *Project Coordinator*
>> School of Data <http://schoolofdata.org/> and
>> OpenSpending <http://openspending.org/>
>> Projects of the Open Knowledge Foundation <http://okfn.org/>
>> Support our work <http://okfn.org/support/>.
>>
>>
>> _______________________________________________
>> openspending mailing list
>> openspending at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/openspending
>> Unsubscribe: http://lists.okfn.org/mailman/options/openspending
>>
>>
>>
>>
>
>
--
@smithsam