[annotator-dev] Is there a way to annotate PDFs using annotator?

Randall Leeds randall at bleeds.info
Mon May 16 07:31:04 UTC 2016

Cross-posting my response to the Hypothesis mailing list, "h-dev". This
will be a little long :).

On Sun, May 15, 2016 at 10:21 PM Abdullah Bakhach <abdullah at gtl.io> wrote:

> I've been asked by a client to provide a solution that annotates both HTML
> pages as well as PDFs. I found Annotator to be the best for html
> annotating, but I was wondering if it annotates PDFs as well?

Annotator doesn't ship with any support for PDF directly. However, I'm
aware of a few other projects that want this feature, and it is possible.
I'm motivated to help, but limited on available time.

>  There is the PDF.js project that does browser PDF rendering and
> annotation.. I was just wondering if I can do it so that I can annotate
> both html and PDF using a single method rather than using two different
> API's from two different libraries.

An implementation of this exists in the Hypothesis project [1]. It depends
on a pretty heavily modified Annotator [2], overriding core methods of
Annotator v1.2.x. This Annotator maintains a set of objects that associate
an annotation with one of its target references and, where possible, the
corresponding document range and highlight(s). These objects are known as
anchors. A corresponding method named "anchor" [3] creates them and
"detach" destroys them.

The anchor method ensures that the anchor set contains exactly one entry
for each target reference in a given annotation. It then attempts to
resolve unresolved targets to a document range, creating corresponding
highlights. This process relies upon several external modules authored for
this purpose that are already independent of Annotator and Hypothesis.
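
To make the anchor set concrete, here is a minimal sketch of the idea in
TypeScript. It is not the Hypothesis code: every name in it (AnchorSet,
Resolve, the string stand-ins for ranges and highlights) is hypothetical,
and the resolver is something the caller would have to supply.

```typescript
// Hypothetical types; none of these names come from Annotator or Hypothesis.
interface Target { selector: string }
interface Annotation { id: string; targets: Target[] }
interface Anchor {
  annotation: Annotation;
  target: Target;
  range?: string;        // stand-in for a resolved document range
  highlights?: string[]; // stand-in for the highlight elements
}

// Supplied by the caller (assumption): returns a range, or undefined when
// the target's content is not currently present in the document.
type Resolve = (target: Target) => string | undefined;

class AnchorSet {
  private anchors: Anchor[] = [];
  private resolve: Resolve;

  constructor(resolve: Resolve) {
    this.resolve = resolve;
  }

  // Safe to call repeatedly: keeps exactly one anchor per target of the
  // annotation, then tries to resolve anchors that have no range yet.
  anchor(annotation: Annotation): Anchor[] {
    // Drop anchors whose target is no longer part of this annotation.
    this.anchors = this.anchors.filter(
      (a) => a.annotation !== annotation ||
             annotation.targets.indexOf(a.target) !== -1
    );
    // Add an anchor for every target that does not have one yet.
    for (const target of annotation.targets) {
      const known = this.anchors.some(
        (a) => a.annotation === annotation && a.target === target
      );
      if (!known) this.anchors.push({ annotation, target });
    }
    // Resolve what can be resolved; leave the rest for a later call.
    const mine = this.anchors.filter((a) => a.annotation === annotation);
    for (const a of mine) {
      if (a.range !== undefined) continue;
      const range = this.resolve(a.target);
      if (range !== undefined) {
        a.range = range;
        a.highlights = ["highlight(" + range + ")"];
      }
    }
    return mine;
  }

  // Destroys every anchor that belongs to the annotation.
  detach(annotation: Annotation): void {
    this.anchors = this.anchors.filter((a) => a.annotation !== annotation);
  }

  size(): number {
    return this.anchors.length;
  }
}
```

Calling anchor() twice for the same annotation leaves the set unchanged,
and a target that could not be resolved the first time gets another chance
on the next call.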

These changes to Annotator make anchoring idempotent, allowing the
application to handle the dynamic rendering and destroying of PDF.js pages.
A separate plugin integrates the PDF.js aware anchoring utilities with the
PDF.js rendering events [4]. The approach should work for other dynamic
content, but the only integration I know about is this one, with PDF.js.
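
A rough sketch of why idempotent anchoring helps with pages that come and
go. FakeViewer and its "rendered" and "destroyed" events are hypothetical
stand-ins invented for this example; they are not the PDF.js API or the
Hypothesis plugin.

```typescript
// All names here are hypothetical stand-ins, not PDF.js or Hypothesis APIs.
type PageEvent = "rendered" | "destroyed";
type Listener = (page: number) => void;

class FakeViewer {
  readonly renderedPages = new Set<number>();
  private listeners: Record<PageEvent, Listener[]> = {
    rendered: [],
    destroyed: [],
  };

  on(event: PageEvent, fn: Listener): void {
    this.listeners[event].push(fn);
  }

  render(page: number): void {
    this.renderedPages.add(page);
    this.emit("rendered", page);
  }

  destroy(page: number): void {
    this.renderedPages.delete(page);
    this.emit("destroyed", page);
  }

  private emit(event: PageEvent, page: number): void {
    for (const fn of this.listeners[event]) fn(page);
  }
}

interface PageAnchor { id: string; page: number; highlighted: boolean }

// Idempotent: recomputes the state of every anchor from what is currently
// rendered, so it can run after any event, any number of times.
function anchorAll(anchors: PageAnchor[], viewer: FakeViewer): void {
  for (const a of anchors) {
    a.highlighted = viewer.renderedPages.has(a.page);
  }
}

// The integration only has to re-run anchoring when pages change.
function integrate(anchors: PageAnchor[], viewer: FakeViewer): void {
  viewer.on("rendered", () => anchorAll(anchors, viewer));
  viewer.on("destroyed", () => anchorAll(anchors, viewer));
}
```

Because anchorAll() can be repeated safely, the integration never needs to
track which pages were rendered or destroyed in which order.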

Annotator does not provide this because it requires breaking changes. The
master branch of Annotator is intended to become a version 2.0 for this and
other reasons.

The Annotator master branch could be made to work with PDF.js by adopting
some of the code from Hypothesis, but it would help if some of this
Hypothesis code were separately packaged. There are a few goals we might
all work toward:

- Ship a minimal Annotator core
- Provide advanced anchoring features to Annotator
- Migrate Hypothesis, which would remove Annotator bloat
- Allow (not require) Annotator compatibility with old data

The core of Annotator is now tiny. It's little more than a Promise-based
hook system and some auxiliary utilities. Blessing a set of core hooks, like
"detach" and "anchor", might turn it into a minimum viable annotation
core. Meanwhile, the UI package can continue to provide an application
around life-cycle hooks that mimic Annotator v1.2.x events.
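
For readers who haven't looked at master, a Promise-based hook system can
be sketched roughly like this. It illustrates the general pattern only;
HookRegistry, include() and the module shapes are names made up for this
example, not a copy of Annotator's code.

```typescript
// Hypothetical sketch of a Promise-based hook system.
type Hook = (...args: unknown[]) => unknown;
type Module = Record<string, Hook>;

class HookRegistry {
  private modules: Module[] = [];

  // Register a module: a plain object whose keys are hook names.
  include(module: Module): void {
    this.modules.push(module);
  }

  // Call the named hook on every module that defines it. Results may be
  // plain values or Promises; the returned Promise settles once all of
  // them have. A hook that throws turns into a rejection.
  runHook(name: string, ...args: unknown[]): Promise<unknown[]> {
    const results = this.modules
      .filter((m) => typeof m[name] === "function")
      .map((m) => new Promise<unknown>((resolve) => resolve(m[name](...args))));
    return Promise.all(results);
  }
}
```

With something like this, blessing "anchor" and "detach" as core hooks
amounts to agreeing on their names and arguments, so that a UI module and
an anchoring module can both respond to runHook("anchor", annotation).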

Separately packaging a core, an extensible user interface, and the
anchoring system(s) would unlock all the goals.

Hypothesis could stop requiring aggressive overrides, Annotator could
inherit the work on anchoring, and Annotator's UI could persist as long as
the community wishes to maintain it. Over time, hooks from the UI might
become part of the core as the community's understanding of common needs
develops.

This is the best understanding I have now, but I will be taking a look at
Annotator this week with fresh eyes and thinking about what the next steps
are.

[2] https://github.com/hypothesis/h/tree/master/h/static/scripts/annotator