[humanities-dev] [open-humanities] The importance of search

Laura James laura.james at okfn.org
Fri Feb 24 13:01:59 UTC 2012


Another tricksy element of search for scholarly collaboration
platforms is permissions. We have open texts, with a variety of open
and restricted annotation, and potentially, comments on annotations.
Consider the case of an instructor who asks her class to annotate a
text. The instructor then wants to comment on the annotations; perhaps
some comments are shared with the class ("novel interpretation, what do
others think?") and some are shared with just one student ("this is
below your usual standard"). Perhaps we also have
additional social media type data - things I've read, shared,
bookmarked, where each such action might be private, globally
readable, or restricted to group access of some sort.

Now, when we come to search, we probably want to search
annotations/comments/bookmarks...  Does the search result contain
everything with the search term, even if the user cannot *see* the
contents? ("Bob annotated this document but you don't have permission
to see what he wrote; click here to ask Bob for access") Or does the
results page only contain items the user can see? Either way, we get
into a lot of auth checking over a sparse content set.
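
A minimal sketch of the two options, just to make the trade-off
concrete. Everything here is hypothetical: the Item class, the
readers set, and the can_see and search functions are names made up
for illustration, not anything from TEXTUS or Sakai OAE. It assumes
each item either is globally readable or carries an explicit set of
permitted readers, and it does a naive substring match rather than a
real index lookup.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical annotation/comment/bookmark record."""
    kind: str
    author: str
    text: str
    # None means globally readable; otherwise only the author and
    # the listed users may read the text.
    readers: Optional[FrozenSet[str]] = None


def can_see(user: str, item: Item) -> bool:
    """Per-item auth check (assumed model: public, or explicit reader set)."""
    if item.readers is None:
        return True
    return user == item.author or user in item.readers


def search(items: List[Item], term: str, user: str,
           show_hidden: bool = False) -> List[Tuple[str, str, Optional[str]]]:
    """Return (author, kind, text) for every item matching term.

    With show_hidden=False, items the user cannot read are dropped.
    With show_hidden=True, they are kept but the text is replaced by
    None, so the UI can offer "ask Bob for access" instead.
    """
    results = []
    for item in items:
        if term.lower() not in item.text.lower():
            continue
        if can_see(user, item):          # one auth check per matching item
            results.append((item.author, item.kind, item.text))
        elif show_hidden:
            results.append((item.author, item.kind, None))
    return results


ITEMS = [
    Item("annotation", "bob", "Novalis echoes Herder here"),
    Item("comment", "instructor",
         "Novel interpretation of Novalis, what do others think?",
         readers=frozenset({"alice", "bob"})),
    Item("comment", "instructor",
         "This Novalis note is below your usual standard",
         readers=frozenset({"bob"})),
]

for row in search(ITEMS, "novalis", "alice", show_hidden=True):
    print(row)
```

Note that the show_hidden variant tells the user that a match exists
in something they cannot read, which may itself be more than the
author wanted to reveal. Either way there is one permission check per
matching item, which is where the cost comes from.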

This is all very familiar from my days working on Sakai OAE
http://sakaiproject.org/node/2239 and it's a big nasty problem.  There
are some great (deeply technical) blog posts from my old colleague
Ian Boston http://blog.tfd.co.uk/?s=search around this issue...

Best regards,

Laura

--

Dr Laura James
Foundation Coordinator, Open Knowledge Foundation

http://okfn.org

On 24 February 2012 11:58, Jonathan Gray <j.gray at cantab.net> wrote:
> Extremely useful comments - thank you very much Nick! :-)
>
> On Fri, Feb 24, 2012 at 11:37 AM, Nick Stenning <nick at whiteink.com> wrote:
>> Hi Jonathan,
>>
>> Of course, you're absolutely right. Being able to do the searches you
>> describe would be an incredibly powerful tool for a scholar -- right
>> with you on this one. I just wanted to add a few comments on the
>> technology:
>>
>> 1) Search is easy. We now have tools (Lucene/ElasticSearch/Solr) that
>> basically solve all the "hard" problems of search: tokenization,
>> indexing, adjust-at-runtime scoring, etc.
>>
>> 2) Search is really, really hard. Of course, the "hard" problems I've
>> just described really aren't all that hard. Most importantly, they're
>> concrete, which means that at least once you've designed an algorithm,
>> the answer to the question "does it work?" is usually either "yes" or
>> "no," rather than "well, sort of, but about 1/3 of the time it does
>> something a bit funny." The hard stuff is the fluffy and intangible
>> search heuristic.
>>
>> So, to pick up on that, I wanted to emphasise that in order to
>> effectively solve the problems you describe in your email, the single
>> most important thing for TEXTUS/A. N. Other Tool to understand is
>> *context*.
>>
>> "being able to ... see all the times Nietzsche mentions Novalis"
>>
>> Here, the hidden heuristic is "search only documents written by
>> Nietzsche" -- this would be trivial to implement manually, right? Just
>> require the user to type in "author:Nietzsche". But a) for people with
>> less unusual names, this doesn't uniquely identify them, potentially
>> generating many spurious results, and b) this could be a simple "same
>> author" checkbox. A simple heuristic that says "users frequently want
>> to search works of the author they are currently reading" helps out a
>> lot.
>>
>> You can go much further with this, and I'd suggest you do, by building
>> a system that implements simple (but overridable) heuristics that
>> reflect what users *usually* do. In addition, context is important in
>> reverse. Don't just give people links to documents that match, give
>> them (as Google frequently does) the matching extract itself, in
>> context.
>>
>> So, that's just a few thoughts about what I think is usually missing
>> from the kinds of search system you describe. I'd say that designing
>> your system falls into two stages: first, identifying exactly what
>> kinds of searches people really do most frequently, and second,
>> attempting to design a search that embraces those heuristics, while
>> remaining general and flexible.
>>
>> No mean feat, I might add.
>>
>> -N
>>
>> On Thu, Feb 23, 2012 at 22:40, Jonathan Gray <j.gray at cantab.net> wrote:
>>> I've just been doing various bits of academic reading and writing, and
>>> it has just struck me with a force bigger and mightier than ever
>>> before: the importance of search. Such an important thing for TEXTUS
>>> to get right.
>>>
>>> For example, being able to do things like see all the times Nietzsche
>>> mentions Novalis. Or to find bits where Herder talks about the French
>>> revolution. Or to see who actually read or cited works by Frederick
>>> the Great. Especially if we can enable people to do (ever more)
>>> comprehensive searches across a given thinker's corpus. Having more
>>> and more letters and manuscripts in the system would mean this could
>>> be fantastically useful.
>>>
>>> It might be a trivial thing which we know how to flawlessly implement,
>>> or it might be a really difficult, totally non-trivial thing that
>>> loads of people have struggled with, but thought it was worth putting
>>> down my book and writing an email about due to the level of importance
>>> I now think getting this right has. ;-)
>>>
>>> One possibly non-obvious thing I thought of was the idea that if you
>>> search for 'Nietzsche' or another philosopher that we have data for in
>>> a given text or collection, the system could cunningly give you the
>>> option for searching for works by Nietzsche as well (or - two steps
>>> ahead - ambiently give you the results of such a search). I'm sure
>>> this would entail nightmarish semanticisation or technical acrobatics
>>> beyond the scope of this project, but 'just sayin' how cool it would
>>> be.
>>>
>>> J.
>>>
>>> --
>>> Jonathan Gray
>>> http://jonathangray.org
>>>
>>> _______________________________________________
>>> open-humanities mailing list
>>> open-humanities at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-humanities
>
>
>
> --
> Jonathan Gray
> http://jonathangray.org



