[Open-Legislation] docfnord

Stefan Sels stefan at sels.com
Mon Jan 10 23:38:08 UTC 2011


Hi,

just my two cents. I was suggesting that we need some kind of

*2utf8+markup converter, where * would be PDF, DOC(x), TXT, you name it.

That is a lot of work and I hope we can reuse some other code for that.

PDF2utf8+markup would be good in any case. All those crappy PDFs are
set in columns, and the footnotes make them hard to reference.

But maybe we can generate some XSLT or similar to perform the conversion.
Something like a PDF2HTML->Magic->XML/UTF8 pipeline would probably be great.
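
To make the "Magic" step a bit more concrete, here is a rough sketch in
Python, not a finished tool. It assumes some existing PDF2HTML converter
has already produced HTML with <p> elements. The names ParagraphCollector
and html_to_xml and the <document>/<para> element names are made up for
illustration. It only uses the standard library and ignores columns and
footnotes completely.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element, ignoring all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None  # None means we are currently outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a previous <p> that was never closed
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace and line breaks into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._buffer = None

    def close(self):
        super().close()
        self._flush()


def html_to_xml(html_text):
    """Return UTF-8 encoded XML bytes with one <para> per HTML paragraph.

    The <document>/<para> layout is a hypothetical target format.
    """
    collector = ParagraphCollector()
    collector.feed(html_text)
    collector.close()
    root = ET.Element("document")
    for number, text in enumerate(collector.paragraphs, start=1):
        para = ET.SubElement(root, "para", n=str(number))
        para.text = text
    return ET.tostring(root, encoding="utf-8")
```

Numbering the paragraphs is only there to give us something stable to
reference. A real converter would still have to deal with columns and
footnotes before this step.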

I hope we don't have to OCR anything :)

greetings from Cologne,

   Stefan


