[ddj] Replacement for Needlebase?

Dan Nguyen Dan.Nguyen at propublica.org
Thu Mar 1 20:03:50 UTC 2012


Michelle mentioned this in her tipsheet on scraping without programming,
but I think the "Developer Tools" part deserves more emphasis. Among those
"dev tools" is the "Web inspector", which is built into all major modern
browsers (even IE8+). It's a point-and-click interface for inspecting a
website, and it's useful to both HTML newbies and experts.

Its relevance to those who wish to scrape is that:
1) Some sites (Flash-based ones in particular) read their data from
external files. You can skip the scraping part by accessing the external
file directly and reading the raw XML/JSON/TXT.
2) If you want to use something like the Scraper Chrome tool, you need to
know how to identify tags and elements. Using the Web inspector makes this
much easier.
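
For anyone who does end up writing a little code, here is a rough sketch of
what those two points look like in practice. It uses Python's standard
library only. Everything in it is made up for illustration: the JSON payload,
the HTML fragment, the field names ("payments", "doctor", "amount"), the
class names, and the helper names rows_from_json and cells_with_class are all
hypothetical stand-ins for whatever the inspector shows you on a real site.

```python
import json
from html.parser import HTMLParser

# Point 1: a hypothetical payload, standing in for the body of an external
# data file that the inspector showed the page loading.
RAW_JSON = (
    '{"payments": ['
    '{"doctor": "A. Example", "amount": 1200}, '
    '{"doctor": "B. Sample", "amount": 450}]}'
)


def rows_from_json(text):
    """Return (doctor, amount) tuples from the hypothetical payload."""
    data = json.loads(text)
    return [(p["doctor"], p["amount"]) for p in data["payments"]]


# Point 2: a hypothetical page fragment. The tag ("td") and the class names
# are the kind of detail you read off the inspector before scraping.
SAMPLE_HTML = """
<table id="results">
  <tr><td class="name">A. Example</td><td class="amount">$1,200</td></tr>
  <tr><td class="name">B. Sample</td><td class="amount">$450</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class attribute matches."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.inside = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "td" and dict(attrs).get("class") == self.css_class:
            self.inside = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.cells[-1] += data


def cells_with_class(html, css_class):
    """Return the text of all <td class=css_class> cells in html."""
    parser = CellCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.cells


print(rows_from_json(RAW_JSON))
print(cells_with_class(SAMPLE_HTML, "name"))
```

The first function needs no HTML parsing at all, which is the whole appeal
of finding the external file. The second only works once you know which tag
and class to ask for, and that is exactly what the inspector tells you.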

I cannot emphasize enough the importance of the Web inspector, both for
its power and for its relative ease of use. And again, it's already built
into your browser.


Here is the overview page for the Web inspector session I taught at NICAR:
http://bit.ly/catscrape

It's not too useful in itself (I may add to it later), but there are a few
links that provide walkthroughs of the tools.

Here is how I used it for our Dollars for Docs investigation:
http://www.propublica.org/nerds/item/reading-flash-data


Here's a multi-part guide I wrote about the Web inspector and scraping in
general (it's part of a book about Ruby, but the Web inspector part doesn't
involve programming):
http://ruby.bastardsbook.com/chapters/web-inspecting-html/





Daniel Nguyen

dan.nguyen at propublica.org
917-512-0224
http://twitter.com/dancow








On 3/1/12 2:40 PM, "M Edward Borasky" <znmeb at znmeb.net> wrote:

>ScraperWiki is definitely "programming", but if you have access to a
>Ruby or Python programmer I'd recommend it highly.
>
>On Thu, Mar 1, 2012 at 6:26 AM,  <SMachlis at computerworld.com> wrote:
>> There was a session about "Web scraping without programming" at last
>>week's NICAR (National Institute for Computer Assisted Reporting in the
>>U.S.) conference. Michelle Minkoff of the Associated Press posted links
>>to the presentation materials here:
>>
>> 
>>http://michelleminkoff.com/2012/02/27/teaching-materials-from-nicar-2012/
>>
>> Sharon
>> --
>> Sharon Machlis
>> Online Managing Editor
>> Computerworld
>> smachlis at computerworld.com
>> Twitter: sharon000
>>
>>
>> ________________________________________
>> From: data-driven-journalism-bounces at lists.okfn.org
>>[data-driven-journalism-bounces at lists.okfn.org] On Behalf Of Antti Jogi
>>Poikola [antti.poikola at gmail.com]
>> Sent: Thursday, March 01, 2012 2:32 AM
>> To: List about Data Driven Journalism and Open Data in Journalism."
>> Subject: [ddj] Replacement for Needlebase?
>>
>> Hi,
>>
>> I was wondering if there are any suitable
>>"non-programming-screenscraping" -tools to replace Needlebase which will
>>be closed on 1st of June?
>>
>> BR, Jogi
>> _______________________________________________
>> data-driven-journalism mailing list
>> data-driven-journalism at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>
>
>
>-- 
>Twitter: http://twitter.com/znmeb Data Journalism Developer Studio
>2012LX http://j.mp/DJDS2012LX
>
>"A mathematician is a device for turning coffee into theorems." -- Paul
>Erdős




