[okfn-help] IRC meeting

Rufus Pollock rufus.pollock at okfn.org
Thu Apr 17 18:52:56 BST 2008


On 16/04/08 18:40, Iain Emsley wrote:
> Thanks. Will start posting soon once I've done some more research and 
> had time to clear some projects, like the first pass of Milton.
> 
> A quick q on the removing of the headers and footers. I've got 98% of
> the Milton headers removed but there are a couple of texts with
> sentences/paragraphs which start "produced by " but if I add that phrase
> into the get_header_start, it removes the entire text as well which
> largely defeats the object of the exercise. Any ideas on how to remove

This is probably because the phrase is matching near the end of the text
as well as at the beginning ...

> these? I'm wondering if I might try a copy of the get_footer_start with 
> the use of min (on the assumption that self._find_max is a greedy match) 
> a little later this evening.

Yes, you could change from greedy matching (taking the highest line at
which it matches in the header and the lowest line at which it matches
in the footer) to something else.
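A minimal Python sketch of the difference, assuming the text has already been split into a list of lines. This is not the project's actual get_header_start/_find_max code: header_end, footer_start, the greedy and window parameters, and the sample lines are all hypothetical, for illustration only.

```python
import re
from typing import List, Optional, Sequence


def _matching_lines(lines: Sequence[str], phrases: Sequence[str],
                    start: int, stop: int) -> List[int]:
    """Indices in [start, stop) whose line contains any phrase (case-insensitive)."""
    regex = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    return [i for i in range(start, stop) if regex.search(lines[i])]


def header_end(lines: Sequence[str], phrases: Sequence[str],
               greedy: bool = True, window: Optional[int] = None) -> Optional[int]:
    """Index of the last header line, or None if no phrase matches.

    greedy=True takes the highest matching line (hypothetical analogue of a
    max-style search); greedy=False takes the lowest. If window is given,
    only the first `window` lines are searched.
    """
    stop = len(lines) if window is None else min(window, len(lines))
    hits = _matching_lines(lines, phrases, 0, stop)
    if not hits:
        return None
    return max(hits) if greedy else min(hits)


def footer_start(lines: Sequence[str], phrases: Sequence[str],
                 greedy: bool = True, window: Optional[int] = None) -> Optional[int]:
    """Index of the first footer line, or None if no phrase matches.

    greedy=True takes the lowest matching line; greedy=False the highest.
    If window is given, only the last `window` lines are searched.
    """
    start = 0 if window is None else max(0, len(lines) - window)
    hits = _matching_lines(lines, phrases, start, len(lines))
    if not hits:
        return None
    return min(hits) if greedy else max(hits)


# Illustrative sample (hypothetical, not a real e-text):
sample = [
    "The Project Gutenberg EBook of Paradise Lost",       # 0
    "Produced by A. Volunteer",                           # 1 header credit
    "",                                                   # 2
    "PARADISE LOST",                                      # 3
    "BOOK I",                                             # 4
    "Of Man's first disobedience, and the fruit",         # 5
    "(a later note: this edition was produced by hand)",  # 6 body text that also matches
    "End of the Project Gutenberg EBook",                 # 7
]
```

With this sample the greedy header search matches line 6 as well as line 1, so `sample[header_end(...) + 1:footer_start(...)]` is empty, the same "removes the entire text" symptom. Either `greedy=False` or `window=5` stops the header at line 1. The trade-off: taking the lowest match can cut the header short when it contains several marker lines, so limiting the search window may be the safer of the two.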

> Also, would you like me to look at the lineno generation in the next few 
> weeks and try and make some headway towards XML creation?

I'm wondering how crucial the line numbering is, given that it is
trivial to generate line numbers once one has the text in a proper
'source' form (just count how many lines have passed). It would only
matter if there were some canonical line numbering that, for example,
included whitespace; in that case one might want to explicitly preserve
that numbering. I added line numbering to the Shakespeare texts just as
an aid to readers and to allow the concordance to work, and neither of
these is a great reason to continue with it :) (after all, we should
move to Xapian or the like for searching rather than having a
concordance).
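A minimal sketch of the "just count how many lines have passed" point, assuming the text is held as a list of lines. The names number_lines and count_blank are hypothetical; the flag shows the one case where the choice matters, namely whether blank lines consume a number.

```python
from typing import Iterable, Iterator, Optional, Tuple


def number_lines(lines: Iterable[str], count_blank: bool = True,
                 start: int = 1) -> Iterator[Tuple[Optional[int], str]]:
    """Yield (line_number, line) pairs by counting lines as they pass.

    If count_blank is False, blank lines are yielded with None and do not
    advance the counter, so only non-blank lines are numbered.
    """
    n = start
    for line in lines:
        if not count_blank and not line.strip():
            yield None, line
            continue
        yield n, line
        n += 1
```

For `["a", "", "b"]` the default numbering is 1, 2, 3, while `count_blank=False` gives 1, None, 2. If no canonical scheme has to be matched, either can be regenerated on demand rather than stored in the source.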

~rufus
