[ckan-dev] full text search and sql

William Waites ww at styx.org
Sun Feb 6 23:54:25 UTC 2011


I was working a bit with James on the harvesting, and he said
to me, "even when the harvesting works I don't see any packages".
Very strange. They show up in the version history... They
show up in the API... They do not show up in the web interface...

Now why is that?

There were two problems. The home page and package list
were changed to use the full-text search a few days ago.
That code path has only really been tested with Solr, where
it enables nice facet-based stuff.

Problem #1: the FTS index wasn't getting built properly.
When a package is added by the harvester command-line
tool, the index tries to build (INSERT) before the package
is created (or committed). Maybe it's happening in another
thread, maybe it's really happening out of order. I'm not
entirely sure, since the trigger for the indexing is a bit
magic (I have followed it through a few times before and
forgotten how it works each time). So the immediate
kludge for the meetings and dog-and-pony show tomorrow is
to just rebuild the search index after the harvesting run.
Might this have something to do with a change in the
semantics of that magic between SQLAlchemy 0.4 and 0.6?
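
To make the ordering problem concrete, here is a minimal sketch
of one way to avoid indexing before the commit: queue package
ids while the transaction is open and only index them once the
commit has happened. This is not CKAN's actual indexing code.
The DeferredIndexer class, its method names and the index_fn
callable are all hypothetical, and the caller is assumed to
invoke after_commit() or after_rollback() at the right moment.

```python
class DeferredIndexer:
    """Hypothetical helper: collect package ids, index only after commit."""

    def __init__(self, index_fn):
        self._index_fn = index_fn  # assumed callable that indexes one package id
        self._pending = []

    def notify(self, package_id):
        # Called when a package is added or changed; nothing is indexed yet.
        if package_id not in self._pending:
            self._pending.append(package_id)

    def after_commit(self):
        # Called once the transaction is committed, so the rows are visible
        # to whatever builds the index entry.
        pending, self._pending = self._pending, []
        for package_id in pending:
            self._index_fn(package_id)
        return len(pending)

    def after_rollback(self):
        # Nothing was persisted, so nothing should be indexed.
        self._pending = []


# Usage: a plain list stands in for the real search index.
indexed = []
indexer = DeferredIndexer(indexed.append)
indexer.notify("pkg-1")
indexer.notify("pkg-2")
indexer.notify("pkg-1")          # duplicate notification is ignored
count = indexer.after_commit()   # indexing happens only here
```

Rebuilding the whole index after the harvesting run achieves
the same ordering by brute force. The sketch just does it per
transaction.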

Problem #2: the SQL FTS index doesn't seem to properly
handle a query of the form '*:*'. It returns no results.
But this is exactly the query that the home page and the
package list controllers now issue. As an interim measure
I made the home controller count packages in the traditional
way using count(), so at least it doesn't lie frighteningly
and tell you that there are 0 packages available. For the
rest, I put in some canned queries, 'Photography' and 'Plan',
that will pull out some known datasets, since specific
queries do still work with the SQL FTS index. I don't know
yet (hopefully will find out tomorrow) whether this might
partially be an artefact of my development environment
running PostgreSQL 9.0 instead of the 8.x series, or whether
it will happen everywhere.
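
One possible shape for a fix on the SQL side, as a sketch only:
treat the Solr-style match-all query '*:*' (and an empty query)
as "no filter" before anything reaches the PostgreSQL full-text
machinery. The function name build_where and the column name
search_vector are assumptions made for illustration, not taken
from the CKAN source. plainto_tsquery and the @@ operator are
standard PostgreSQL text search.

```python
MATCH_ALL = "*:*"  # Solr syntax for "every document"


def build_where(query):
    """Return a (sql_fragment, params) pair for a package search.

    Hypothetical helper: '*:*' and blank queries match everything,
    anything else becomes a PostgreSQL full-text match.
    """
    text = (query or "").strip()
    if text == "" or text == MATCH_ALL:
        # No full-text condition at all, so every package is returned.
        return ("TRUE", [])
    # 'search_vector' is an assumed tsvector column name.
    return ("search_vector @@ plainto_tsquery(%s)", [text])


# Usage: the fragment would be appended to a SELECT ... WHERE clause.
all_sql, all_params = build_where("*:*")
one_sql, one_params = build_where("Photography")
```

With something like this in place the home page and package
list could keep sending '*:*' to either back-end.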

Where to from here? Do we figure out how to fix the SQL
search back-end? Do we abandon it in favour of Solr?

That's all for now.

-w
-- 
William Waites                <mailto:ww at styx.org>
http://river.styx.org/ww/        <sip:ww at styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45



