[okfn-help] Trick for transcribing data from Treasury PDFs

Rufus Pollock rufus.pollock at okfn.org
Mon Dec 14 12:44:14 GMT 2009


Dear Will,

Thanks for this excellent suggestion, which I hope you won't mind my
cc'ing to okfn-help. (I suggest you send things like this there in
future, if you're happy to.)

Rufus

2009/12/12 William Waites <wwaites at googlemail.com>:
> A little trick from http://ouseful.wordpress.com/ for getting Google Spreadsheets
> to import a table.
>
> Use http://www.adobe.com/products/acrobat/access_onlinetools.html to convert
> the PDF to HTML. It doesn't work with all PDFs, but some of them are fine.
>
> Then do something like,
>
> =importHtml("http://access.adobe.com/access/getStatus.do?jobid=885811aa-a3b8-42c8-97ba-7efbb938cb45&srcPdfUrl=http://www.hm-treasury.gov.uk/d/SBI_part1_584.pdf&convertTo=html&visuallyImpaired=noreader&preferHTMLReason=&platform=other&comments=&starttime=1260618027921",
> "table", 4)
>
> in a spreadsheet cell. The URL is where Adobe puts the HTML; I have no idea
> how long it sticks around. That particular call grabs the "Resource Budget
> DEL and AME" table from the SBI 2003-4 Part 1 PDF.
>
> Cheers,
> -w
>
> --
> William Waites
> Email: wwaites at gmail.com
> UK tel: +44 131 516 3563
> UK mob: +44 789 798 9965
>
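
For anyone who wants the same thing outside a spreadsheet, here is a rough
Python sketch of what importHtml(url, "table", 4) appears to do: parse the
HTML, collect the tables in document order, and return the one at the given
1-based index. The names TableExtractor and import_html_table are made up
for illustration, the sample HTML and its figures are invented (not taken
from the Treasury PDF), and tables nested inside cells are not handled
carefully.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # all tables, ordered by their opening tag
        self._stack = []   # tables that are currently open
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table = []
            self.tables.append(table)
            self._stack.append(table)
        elif tag == "tr" and self._stack:
            self._stack[-1].append([])  # start a new row
        elif tag in ("td", "th") and self._stack and self._stack[-1]:
            self._cell = []  # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            self._stack[-1][-1].append(text)
            self._cell = None
        elif tag == "table" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index):
    """Return the index-th table (1-based, as in importHtml) as rows of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Invented sample page with two tables; the numbers are placeholders.
sample = """
<table><tr><th>Year</th><th>Total</th></tr></table>
<table><tr><td>Resource DEL</td><td>1,234</td></tr>
       <tr><td>Resource AME</td><td>5,678</td></tr></table>
"""

print(import_html_table(sample, 2))
```

To run it against a live page, the html argument could come from
urllib.request.urlopen(url).read().decode("utf-8", "replace"), assuming the
converted HTML is still available at that URL.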



-- 
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/


