[Open-Legislation] docfnord
Stefan Sels
stefan at sels.com
Mon Jan 10 23:38:08 UTC 2011
Hi,
Just my two cents. I was suggesting that we need some kind of
*2utf8+markup converter, where * would be PDF, DOC(x), TXT, you name it.
That is a lot of work, and I hope we can reuse existing code for it.
A PDF2utf8+markup converter would be good in any case. All those crappy
PDFs are laid out in columns, and their footnotes make them hard to reference.
But maybe we can generate some XSLT or similar to perform the conversion.
A pipeline like PDF2HTML -> Magic -> XML/UTF8 would probably be great.
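As a rough illustration of the HTML -> XML/UTF8 end of such a pipeline, here is a minimal Python sketch. It assumes some PDF2HTML tool has already produced HTML with one explicitly closed <p> per paragraph. It collects the paragraph text and writes it out as a simple UTF-8 XML document. The function html_to_xml and the element names document and paragraph are hypothetical, and the sketch does nothing about columns or footnotes.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements (assumes every <p> is explicitly closed)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &uuml; etc. into characters
        self.paragraphs = []
        self._buf = None  # None means we are currently outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            # Collapse runs of whitespace left over from the PDF layout.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def html_to_xml(html: str) -> bytes:
    """Turn HTML paragraphs into a minimal <document> XML, encoded as UTF-8."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    root = ET.Element("document")  # hypothetical element names
    for text in collector.paragraphs:
        ET.SubElement(root, "paragraph").text = text
    return ET.tostring(root, encoding="utf-8")
```

For example, html_to_xml("<p>Gr&uuml;&szlig;e aus K&ouml;ln</p>") yields a document element with a single paragraph element containing the decoded umlauts, stored as UTF-8 bytes.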
I hope we don't have to OCR anything :)
Greetings from Cologne,
Stefan