[annotator-dev] default analyzer for tags field

Randall Leeds tilgovi at hypothes.is
Thu Aug 1 00:34:20 UTC 2013


I believe he's suggesting that we use an analyzer without stop words.
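
Roughly, that could look like the sketch below. The analyzer name
`tag_analyzer` is made up, and I'm assuming ES accepts `_none_` as the
`stopwords` value for the standard analyzer, which should be checked against
the version we run. The `analyze` function does not call ES; it only imitates
the stop word filtering Gergely describes, using the list from his message.

```python
# Hypothetical index settings: a standard analyzer with stop words disabled.
# '_none_' is assumed to be accepted by the ES version in use.
NO_STOPWORDS_SETTINGS = {
    'analysis': {
        'analyzer': {
            'tag_analyzer': {'type': 'standard', 'stopwords': '_none_'}
        }
    }
}

# Hypothetical mapping entry pointing the tags field at that analyzer.
TAGS_MAPPING = {
    'tags': {'type': 'string', 'index_name': 'tag', 'analyzer': 'tag_analyzer'}
}

# The stop word list quoted in the original message.
ENGLISH_STOPWORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with",
])


def analyze(text, stopwords=ENGLISH_STOPWORDS):
    """Toy imitation of an analyzer: lowercase, split on whitespace,
    then drop any token found in `stopwords`."""
    return [token for token in text.lower().split() if token not in stopwords]
```

With the default list, `analyze("the")` comes back empty, so a tag like "the"
can never match. Passing an empty set keeps every token.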

An easy solution would be to set it to not_analyzed, which keeps each tag
searchable as a single opaque token. It means that searches for "foo" would
not turn up things tagged "foo bar", but that might be the right idea.
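
As a sketch, the mapping change would be one extra key on the existing
definition. `matches_exact` is a made-up helper that only models the matching
behaviour described above; it is not how ES evaluates queries internally.

```python
# Hypothetical revision of the 'tags' entry from annotation.py:
# 'index': 'not_analyzed' asks ES to index each tag as one opaque term.
TAGS_MAPPING = {
    'tags': {'type': 'string', 'index_name': 'tag', 'index': 'not_analyzed'}
}


def matches_exact(query, tags):
    """Toy model of term matching on a not_analyzed field:
    a query matches only if it equals a whole tag value."""
    return query in tags
```

Under this model a stop word such as "the" is an ordinary tag and matches
itself, while "foo" does not match something tagged "foo bar".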

Alternatively, we could set up an analyzer with a special-purpose tokenizer
so that, for example, "foo_bar" and "foo bar" would be treated the same, and
both would be indexed as the two tokens "foo" and "bar".
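
One way to sketch that is with a pattern tokenizer that splits on whitespace
and underscores. The names `tag_tokenizer` and `tag_analyzer` and the exact
regex are assumptions for illustration, and `tokenize_tag` just mimics the
split in plain Python so the behaviour is easy to check.

```python
import re

# Assumed separator pattern: runs of whitespace and/or underscores.
TAG_SPLIT_PATTERN = r'[\s_]+'

# Hypothetical index settings: a custom analyzer built on a pattern tokenizer.
# No stop word filter is attached, so common words are kept.
ANALYSIS_SETTINGS = {
    'analysis': {
        'tokenizer': {
            'tag_tokenizer': {'type': 'pattern', 'pattern': TAG_SPLIT_PATTERN}
        },
        'analyzer': {
            'tag_analyzer': {'type': 'custom', 'tokenizer': 'tag_tokenizer'}
        },
    }
}


def tokenize_tag(tag):
    """Mimic the tokenizer: split on the separator pattern and
    discard empty strings left by leading or trailing separators."""
    return [token for token in re.split(TAG_SPLIT_PATTERN, tag) if token]
```

With this split, `tokenize_tag("foo_bar")` and `tokenize_tag("foo bar")` give
the same token list, so a search for "foo" would find both.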


On Wed, Jul 31, 2013 at 5:06 PM, Andrew Magliozzi <andrew at finalsclub.org> wrote:

> Hey Gergely,
>
> Roughly what might you suggest we do to improve upon our current schema?
>
> Warmly,
> Andrew
>
>
>
>
> On Jul 31, 2013, at 6:49 PM, Randall Leeds <tilgovi at hypothes.is> wrote:
>
> My guess would be that there was no intention in particular here.
>
>
> On Tue, Jul 30, 2013 at 4:33 PM, Gergely, Ujvari <ujvari at hypothes.is> wrote:
>
>>  Hello!
>>
>> I have a theoretical question about how the tag index should work.
>>
>> The tags field is defined like this in annotation.py:
>>
>> 'tags': {'type': 'string', 'index_name': 'tag'}
>>
>> But no analyzer was set up for the search, so ES uses its own analyzer,
>> which by default ignores common stopwords in searches, for example:
>>
>> "a", "an", "and", "are", "as", "at", "be", "but", "by",
>>   "for", "if", "in", "into", "is", "it",
>>   "no", "not", "of", "on", "or", "such",
>>   "that", "the", "their", "then", "there", "these",
>>   "they", "this", "to", "was", "will", "with"
>>
>> This means that searching for these stopwords does not return any
>> results.
>>
>> My question: is this an intentional decision to avoid trivial tags?
>> If yes, wouldn't it make sense to prevent creating these tags, given that
>> they can't really be searched?
>>
>> Thanks
>> Gergely
>>
>>
>> _______________________________________________
>> annotator-dev mailing list
>> annotator-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/annotator-dev
>> Unsubscribe: http://lists.okfn.org/mailman/options/annotator-dev
>>
>>