No subject
Sun Dec 12 18:29:16 UTC 2010
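Jonathan, Nick, Laura, Open Humanists,

Really interesting points.

Good search capability on the Internet Archive was something I sorely
missed during my dissertation, given the unreliability of its OCR scans.

As Nick says, many of the very useful searches we want to do will be
trivial to implement. The fact is, I've never seen any platform that
allows you to search for a given word across the whole body of an
author's work.

From a UI perspective it would be good to nail down the kind of searches
people will be wanting to do with TEXTUS so that we can give them an
interface that matches that - something for the next user requirements
workshop!

@Jonathan - could you add the kind of search that you were talking about to
the user stories on the Wiki please?

Sam
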
On Fri, Feb 24, 2012 at 11:37 AM, Nick Stenning <nick at whiteink.com> wrote:
> Hi Jonathan,
>
> Of course, you're absolutely right. Being able to do the searches you
> describe would be an incredibly powerful tool for a scholar -- right
> with you on this one. I just wanted to add a few comments on the
> technology:
>
> 1) Search is easy. We now have tools (Lucene/ElasticSearch/Solr) that
> basically solve all the "hard" problems of search: tokenization,
> indexing, adjust-at-runtime scoring, etc.
>
> 2) Search is really, really hard. Of course, the "hard" problems I've
> just described really aren't all that hard. Most importantly, they're
> concrete, which means that at least once you've designed an algorithm,
> the answer to the question "does it work?" is usually either "yes" or
> "no," rather than "well, sort of, but about 1/3 of the time it does
> something a bit funny." The hard stuff is the fluffy and intangible
> search heuristic.
>
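To make the "solved" part of point 1 concrete, here is a minimal sketch of tokenization, indexing and scoring in plain Python. It is only an illustration of the idea, not how Lucene, ElasticSearch or Solr actually work internally; the function names and the sample documents are made up for the example, and the scoring is a bare term count.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on runs of word characters.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    # Inverted index: token -> Counter mapping doc_id to occurrence count.
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token][doc_id] += 1
    return index

def search(index, query):
    # Score each document by how often the query tokens occur in it.
    scores = Counter()
    for token in tokenize(query):
        for doc_id, count in index.get(token, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]

# Hypothetical sample corpus, invented for this sketch.
docs = {
    "a": "Novalis wrote hymns. Novalis died young.",
    "b": "Herder on Novalis",
    "c": "Frederick the Great",
}
index = build_index(docs)
results = search(index, "Novalis")
```

Everything here has a yes-or-no answer to "does it work?", which is exactly why it is the easy part.
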
> So, to pick up on that, I wanted to emphasise that in order to
> effectively solve the problems you describe in your email, the single
> most important thing for TEXTUS/A. N. Other Tool to understand is
> *context*.
>
> "being able to ... see all the times Nietzsche mentions Novalis"
>
> Here, the hidden heuristic is "search only documents written by
> Nietzsche" -- this would be trivial to implement manually, right? Just
> require the user to type in "author:Nietzsche". But a) for people with
> less unusual names, this doesn't uniquely identify them, potentially
> generating many spurious results, and b) this could be a simple "same
> author" checkbox. A simple heuristic that says "users frequently want
> to search works of the author they are currently reading" helps out a
> lot.
>
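A rough sketch of the "same author" default, assuming each document carries a unique author identifier (which sidesteps the problem of ambiguous names). The Document class, the author_id field and the sample corpus are all hypothetical, and the substring match stands in for a real search backend.

```python
from dataclasses import dataclass

@dataclass
class Document:
    author_id: str  # assumed unique per author, unlike a display name
    title: str
    text: str

def search(docs, term, reading=None, same_author=True):
    # Naive case-insensitive substring match on the full text.
    hits = [d for d in docs if term.lower() in d.text.lower()]
    # Default heuristic: restrict to the author currently being read.
    # The caller can override it by passing same_author=False.
    if same_author and reading is not None:
        hits = [d for d in hits if d.author_id == reading.author_id]
    return hits

# Hypothetical sample corpus, invented for this sketch.
corpus = [
    Document("nietzsche", "Notebook", "A remark on Novalis and the Romantics."),
    Document("nietzsche", "Letter", "Nothing relevant here."),
    Document("herder", "Essay", "Novalis is mentioned in passing."),
]
current = corpus[1]
scoped = [d.title for d in search(corpus, "Novalis", reading=current)]
everything = [d.title for d in search(corpus, "Novalis", reading=current, same_author=False)]
```

The heuristic is on by default and one argument (or one checkbox) turns it off.
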
> You can go much further with this, and I'd suggest you do, by building
> a system that implements simple (but overridable) heuristics that
> reflect what users *usually* do. In addition, context is important in
> reverse. Don't just give people links to documents that match, give
> them (as Google frequently does) the matching extract itself, in
> context.
>
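Returning the matching extract in context can be sketched as a keyword-in-context function. This is a minimal version under simple assumptions: the match is a literal, case-insensitive string, and the context is a fixed number of characters either side. The function name and the width parameter are made up for the example.

```python
import re

def snippets(text, term, width=30):
    # Collect each occurrence of term with `width` characters of context
    # on either side, clipped to the bounds of the text.
    out = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = max(0, match.start() - width)
        right = min(len(text), match.end() + width)
        out.append(text[left:right])
    return out

# Hypothetical sample text, invented for this sketch.
sample = "I read Novalis today. Later, novalis again."
found = snippets(sample, "Novalis", width=5)
```

A real system would probably cut on word or sentence boundaries rather than raw character offsets.
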
> So, that's just a few thoughts about what I think is usually missing
> from the kinds of search system you describe. I'd say that designing
> your system falls into two stages: first, identifying exactly what
> kinds of searches people really do most frequently, and second,
> attempting to design a search that embraces those heuristics, while
> remaining general and flexible.
>
> No mean feat, I might add.
>
> -N
>
> On Thu, Feb 23, 2012 at 22:40, Jonathan Gray <j.gray at cantab.net> wrote:
> > I've just been doing various bits of academic reading and writing, and
> > it has just struck me with a force bigger and mightier than ever
> > before: the importance of search. Such an important thing for TEXTUS
> > to get right.
> >
> > For example, being able to do things like see all the times Nietzsche
> > mentions Novalis. Or to find bits where Herder talks about the French
> > revolution. Or to see who actually read or cited works by Frederick
> > the Great. Especially if we can enable people to do (ever more)
> > comprehensive searches across a given thinker's corpus. Having more
> > and more letters and manuscripts in the system would mean this could
> > be fantastically useful.
> >
> > It might be a trivial thing which we know how to flawlessly implement,
> > or it might be a really difficult, totally non-trivial thing that
> > loads of people have struggled with, but thought it was worth putting
> > down my book and writing an email about due to the level of importance
> > I now think getting this right has. ;-)
> >
> > One possibly non-obvious thing I thought of was the idea that if you
> > search for 'Nietzsche' or another philosopher that we have data for in
> > a given text or collection, the system could cunningly give you the
> > option for searching for works by Nietzsche as well (or - two steps
> > ahead - ambiently give you the results of such a search). I'm sure
> > this would entail nightmarish semanticisation or technical acrobatics
> > beyond the scope of this project, but 'just sayin' how cool it would
> > be.
> >
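The simplest form of this suggestion may need no heavy semantics at all: check the query words against the authors the system already has data for, and offer an author-scoped search for each hit. A minimal sketch follows; the authors table and the function name are hypothetical, and the "author:" syntax is borrowed from Nick's example above.

```python
def author_suggestions(query, authors):
    # authors maps an assumed unique author id to a display name.
    # Offer an author-scoped search for every known name in the query.
    words = set(query.lower().split())
    return [
        f"author:{author_id}"
        for author_id, name in authors.items()
        if name.lower() in words
    ]

# Hypothetical list of authors the system holds works by.
known_authors = {"nietzsche": "Nietzsche", "herder": "Herder"}
suggested = author_suggestions("Nietzsche Novalis", known_authors)
```

This only handles single-word names that match exactly; multi-word names, spelling variants and ambiguous names are where it stops being trivial.
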
> > J.
> >
> > --
> > Jonathan Gray
> > http://jonathangray.org
> >
> > _______________________________________________
> > open-humanities mailing list
> > open-humanities at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/open-humanities
>
--
Sam Leon
Community Coordinator
Open Knowledge Foundation
http://okfn.org/
Skype: samedleon