[ECODP-dev] SOLR questions

PASTOR CAMARASA José Juan (OP) Jose.PASTOR-CAMARASA at publications.europa.eu
Wed Oct 9 13:10:45 UTC 2013


Hi John and Bert,
Thanks for all this information.
Bert, could you add all this information to the Operations Manual?
What I don't understand is that https://github.com/okfn/ckanext-ecportal/blob/master/ckanext/ecportal/solr/schema.xml appears to be a generic CKAN schema: if I search it for the "keyword" or "geographical_coverage" attributes, I find nothing. Is there no ODP-specific definition in Solr?

Best regards
José Pastor


From: John Glover [mailto:john.glover at okfn.org]
Sent: Wednesday, October 09, 2013 11:27 AM
To: Project list for EC ODP CKAN project
Cc: ZAJAC Agnieszka (OP); PASTOR CAMARASA José Juan (OP); SABETE Vafa (OP)
Subject: Re: [ECODP-dev] SOLR questions

Hi Bert,

Replies inline below.

> In which document can we find information about how Solr has been implemented in the ODP? I can't find anything in the Operations Manual.

I'm not really sure what information is required here. Our Solr schema is in the ckanext-ecportal extension [1]; it contains the list of all fields that are currently indexed (and how we have configured Solr to index them). There is also some information about the multilingual fields and the query parser in the Operations Manual (p. 34). It seems like you would be the best person to comment on the actual deployment aspects.

> Where can we find the configuration used in the ODP: wildcards, Boolean operators, fuzzy search, range search, search by field, ...?

We don't have any special Solr query parsing in CKAN; we basically pass your query straight through to Solr, so this information is best obtained from the Solr documentation [2][3]. More information about our search API parameters is given in the docs [4].
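
To make the pass-through concrete, here is a minimal Python sketch of a package_search call. The base URL is hypothetical, and the /api/3/action/ path and the JSON POST body are assumptions that should be checked against the API docs [4]. Only the q and rows parameters are used, and the request is built but not sent.

```python
import json
from urllib.request import Request

# Hypothetical base URL: replace with the real portal address.
BASE_URL = "https://example.org"


def build_package_search_request(q, rows=10, base_url=BASE_URL):
    """Build (but do not send) a request to CKAN's package_search action.

    The value of `q` is what ends up in the Solr query, so Solr query
    syntax such as "title:poverty" can be used here.
    """
    payload = json.dumps({"q": q, "rows": rows}).encode("utf-8")
    return Request(
        f"{base_url}/api/3/action/package_search",  # assumed path, see [4]
        data=payload,
        headers={"Content-Type": "application/json"},
    )


request = build_package_search_request("poverty", rows=5)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; the JSON response should then contain the matching datasets.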

> In which fields can we search?

These are listed in our Solr schema [1].

> Where can we find the list of stop words? Are they only in English?

We are not really using any stop words at the moment (the default 'protwords.txt' is used for English, but this is practically empty, containing just two test examples).

> How do we search with special characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \)?

Special characters will generally be stripped by our current Solr analyzers at both index and query time, so currently you cannot search for these characters.

[1]: https://github.com/okfn/ckanext-ecportal/blob/master/ckanext/ecportal/solr/schema.xml
[2]: http://wiki.apache.org/solr/SolrQuerySyntax
[3]: http://wiki.apache.org/solr/DisMaxQParserPlugin
[4]: http://docs.ckan.org/en/ckan-1.8.2/apiv3.html#ckan.logic.action.get.package_search


Regards,
John

On 8 October 2013 13:34, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:
Hi Darwin and John,

Here are some Solr questions from José. Can you answer them?
- In which document can we find information about how Solr has been implemented in the ODP? I can't find anything in the Operations Manual.
- Where can we find the configuration used in the ODP: wildcards, Boolean operators, fuzzy search, range search, search by field, ...?
- In which fields can we search?
- Where can we find the list of stop words? Are they only in English?
- How do we search with special characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \)?

kind regards,

Bert
[JP] ----
From: John Glover [mailto:john.glover at okfn.org]
Sent: Wednesday, October 09, 2013 11:57 AM
To: Project list for EC ODP CKAN project
Cc: PASTOR CAMARASA José Juan (OP); ZAJAC Agnieszka (OP); HOHN Norbert (OP); SABETE Vafa (OP)
Subject: Re: [ECODP-dev] CKAN wild character for search

Hi Bert,

Yes, the poverty vs poverties example will be covered by the stemming in Solr.

In general, wildcard searches are not supported as we use the dismax query parser [1].

However, if you search a specific field by entering something like "title: pov*" (without the quotes), the wildcard will actually work. This is because if we detect a ":" character in the query, we fall back to using the default query parser, which does support wildcards. But yes, it does not support wildcards at the start of terms.
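
The fallback described above can be sketched roughly as follows. This is an illustration of the behaviour, not the actual CKAN code: the function name is hypothetical, and defType is the standard Solr request parameter for selecting a query parser.

```python
def solr_params_for(query):
    """Return the Solr request parameters for a user query.

    Queries without a ":" are sent to the dismax parser, which does not
    support wildcards. Queries containing ":" (for example "title: pov*")
    leave defType unset, so Solr uses its default query parser, which does
    support trailing wildcards.
    """
    params = {"q": query}
    if ":" not in query:
        params["defType"] = "dismax"
    return params


for example in ("poverty", "pov*", "title: pov*"):
    print(example, "->", solr_params_for(example))
```

Under this sketch, a bare "pov*" still goes to dismax, so the wildcard only takes effect once a field name and ":" are included.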

[1]: http://wiki.apache.org/solr/DisMaxQParserPlugin

Regards,
John

On 3 October 2013 16:38, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:
Hi John,
It seems I forgot to include you in this conversation.
Can you have a look at it?

kind regards,
Bert

2013/10/3 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
Hi José,
I assume this should be handled by the stemming in the Solr component.
So poverty should return poverties, unless you search for "poverty" (the exact string).
@John, can you confirm this?
kind regards,
Bert




2013/10/3 PASTOR CAMARASA José Juan (OP) <Jose.PASTOR-CAMARASA at publications.europa.eu>

Hi Bert,
What is the wildcard character for searching in CKAN?
For example, if I want to find "poverty" or "poverties", which wildcard should I use? I've tried the classic wildcards (pover%, pover?, pover*) and I get no results.
And I presume we don't have a leading (left-hand) wildcard?



Best regards
José Pastor

