[okfn-help] IRC meeting
Rufus Pollock
rufus.pollock at okfn.org
Thu Apr 17 17:52:56 UTC 2008
On 16/04/08 18:40, Iain Emsley wrote:
> Thanks. Will start posting soon once I've done some more research and
> had time to clear some projects, like the first pass of Milton.
>
> A quick q on the removing of the headers and footers. I've got 98% of
> the Milton headers removed but there are a couple of texts with
> sentences/paragraphs which start "produced by " but if I add that phrase
> into the get_header_start, it removes the entire text as well which
> largely defeats the object of the exercise. Any ideas on how to remove
This is probably because it is matching at the end as well as at the
beginning ...
> these? I'm wondering if I might try a copy of the get_footer_start with
> the use of min (on the assumption that self._find_max is a greedy match)
> a little later this evening.
Yes, you could change from greedy matching (taking the highest line
number at which it matches in the header and the lowest line number at
which it matches in the footer) to something else.
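
One possible "something else", as a rough sketch only: keep taking the
last matching line for the header, but only look at the first few lines
of the text, so that a "produced by" near the end can never count as
header. The names header_end, strip_header and window below are made up
for illustration and are not the existing get_header_start / _find_max
code.

```python
import re


def find_matches(lines, patterns):
    """Return the indices of all lines matching any pattern (case-insensitive)."""
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [i for i, line in enumerate(lines)
            if any(r.search(line) for r in regexes)]


def header_end(lines, patterns, window=None):
    """Index of the last header-marker line, or -1 if none is found.

    `window` (hypothetical parameter) limits the search to the first
    `window` lines; None searches the whole text, which reproduces the
    "removes the entire text" problem when a marker also occurs at the end.
    """
    limit = len(lines) if window is None else min(window, len(lines))
    hits = find_matches(lines[:limit], patterns)
    return max(hits) if hits else -1


def strip_header(lines, patterns, window=None):
    """Drop everything up to and including the last header-marker line."""
    return lines[header_end(lines, patterns, window) + 1:]


# Invented sample text, not a real Gutenberg file.
sample = [
    "Project Gutenberg etext of Paradise Lost",
    "Produced by A. Volunteer",
    "",
    "Of Man's first disobedience",
    "and the fruit",
    "",
    "Produced by A. Volunteer and proofreaders",
]

whole_text_search = strip_header(sample, ["produced by"])
windowed_search = strip_header(sample, ["produced by"], window=3)
```

Here whole_text_search comes back empty because the marker on the last
line wins, while windowed_search keeps the poem. The right window size
would have to be picked by looking at the actual Milton texts.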
> Also, would you like me to look at the lineno generation in the next few
> weeks and try and make some headway towards XML creation?
I'm wondering how crucial the line numbering is in the sense that it is
trivial to generate line numbers once one has the text in a proper
'source' form (just count how many lines have passed). It would only
matter if there were some canonical line numbering that, for example,
included whitespace. In that case one might want to explicitly preserve
that line numbering. I added line numbering into the shakespeare just
to help readers and to allow the concordance to work, and neither of
these is a great reason to continue with it :) (after all, we should
move to xapian or the like for searching rather than having a
concordance).
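
To illustrate the "just count how many lines have passed" point, here is
a minimal sketch. The function name number_lines and the count_blank
flag are invented for this example; count_blank stands in for the case
of a canonical numbering that includes whitespace lines.

```python
def number_lines(text, count_blank=False):
    """Pair each line of `text` with a line number generated by counting.

    Blank lines get None unless count_blank is True, in which case they
    are numbered like any other line.
    """
    numbered = []
    n = 0
    for line in text.splitlines():
        if line.strip() or count_blank:
            n += 1
            numbered.append((n, line))
        else:
            numbered.append((None, line))
    return numbered


# Invented two-line sample with a blank line in between.
poem = "Of Man's first disobedience\n\nand the fruit"
skipping_blanks = number_lines(poem)
counting_blanks = number_lines(poem, count_blank=True)
```

With blanks skipped the second verse line is number 2; with blanks
counted it is number 3, which is the only situation where the numbering
would need to be preserved explicitly.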
~rufus