[open-science] Removing watermarks from pdfs (pdfparanoia)

Tue Feb 5 21:47:34 UTC 2013

On Tue, Feb 5, 2013 at 9:15 PM, Bryan Bishop <kanzure at gmail.com> wrote:

> On Tue, Feb 5, 2013 at 3:09 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
>> PDF2SVG should be able to do this (http://bitbucket.org/petermr/pdf2svg).
>> It should also remove the side annotations about which library the PDF was
>> downloaded from. Send me one and I'll see.
>>
>
> Is there an svg2pdf? The problem with using pdfquery is that it can only
> generate an XML format. At first it looks like pdfxml, except that Adobe
> came up with a "standard" called pdfxml that looks completely different. So
> getting things back into PDF seems to be difficult.
>
>
I use Apache FOP.  We should be able to:
* read the PDF into SVG
* remove the rubbish
* write the primitives back into PDF. We might get font problems, so you may
have to make do with the 14 standard PDF fonts. That might occasionally
screw up some of the microkerning. If you want to reformat running text and
lose the publisher's layout (e.g. 2-col => 1-col), then we will use SVGPlus.
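
A minimal sketch of the "remove the rubbish" step, assuming the page has
already been converted to SVG and that the watermark survives as ordinary
<text> elements. The patterns, function names and sample SVG below are
hypothetical and are not taken from PDF2SVG or pdfparanoia; Python's
standard xml.etree.ElementTree is used purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical watermark patterns; real publisher stamps will differ.
WATERMARK_PATTERNS = [
    re.compile(r"downloaded from", re.IGNORECASE),
    re.compile(r"for personal use only", re.IGNORECASE),
]


def is_watermark(text):
    """True if the text matches any of the assumed watermark patterns."""
    return any(p.search(text) for p in WATERMARK_PATTERNS)


def strip_watermarks(svg_source):
    """Return (cleaned SVG string, number of removed <text> elements)."""
    # Keep SVG as the default namespace when serialising again.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_source)
    removed = 0
    # ElementTree has no parent pointers, so visit each element as a
    # potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{SVG_NS}}}text":
                # Join nested <tspan> content before matching.
                content = "".join(child.itertext())
                if is_watermark(content):
                    parent.remove(child)
                    removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    # Made-up sample page with one body line and one library stamp.
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg"><g>'
        '<text x="10" y="20">Results</text>'
        '<text x="0" y="5">Downloaded from <tspan>Some Library</tspan></text>'
        '</g></svg>'
    )
    cleaned, count = strip_watermarks(sample)
    print(count, "watermark element(s) removed")
    print(cleaned)
```

This only covers watermarks that come through as text. Stamps drawn as
paths or images would need a different test, and writing the cleaned SVG
back to PDF is a separate step.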

Some of this is alpha, not production.

> - Bryan
> http://heybryan.org/
> 1 512 203 0507
>

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069