[ckan-dev] Deleted packages in search results

Friedrich Lindenberg friedrich at pudo.org
Wed Jan 12 21:48:53 UTC 2011


Hi all,

I'm currently trying to debug an issue that has been brought up by the
HRI folks: in their CKAN instance, they've deleted a large number of
packages and they're using solr indexing. The problem with this is
that both deleted and active packages are indexed, since we want
admins to still search for them (do we?). Filtering for deleted
packages is then done on the result set, while result counts remain
wrong.

My initial approach to fixing this was to do the filtering within solr by
passing the list of all packages for which the querying user is an admin
to solr in a query such as this:

 +(state:active OR name:my_pkg1 OR name:my_pkg2)

Of course, this doesn't scale, especially for sysadmins, who are
admins of all packages. The solr query parser quits at about 1k package
names. I'm now a bit unsure, since the only solution I can spot is to
include the list of admins in the index, thus replicating a part of
the authz layer in solr.
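
To make the two options concrete, here is a rough sketch in Python. The
function names and the multi-valued "admins" index field are hypothetical
and not part of CKAN or its solr schema. It also assumes that package and
user names contain no characters that need escaping in solr query syntax.
The cutoff at about 1k names looks consistent with solr's maxBooleanClauses
setting (1024 by default), though I have not confirmed that this is the
limit being hit here.

```python
def name_list_filter(admin_package_names):
    """Build the filter from the message: active packages, plus every
    package the querying user administers, listed by name.

    The query gains one OR clause per package name.
    """
    clauses = ["state:active"]
    clauses.extend("name:%s" % name for name in admin_package_names)
    return "+(" + " OR ".join(clauses) + ")"


def admin_field_filter(username):
    """Build the alternative filter: match on a hypothetical multi-valued
    `admins` field that would have to be written into the index for each
    package. The query stays at two clauses for any user.
    """
    return '+(state:active OR admins:"%s")' % username


def clause_count(query):
    """Count the OR-ed clauses in a filter built by the functions above."""
    return len(query.split(" OR "))
```

With names = ["pkg%d" % i for i in range(2000)], name_list_filter(names)
yields 2001 clauses, whereas admin_field_filter("some_user") always yields
two. The price is that the index has to be updated whenever a package's
admin list changes.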

Is there a better/smarter/easier way to work around this?

 - Friedrich



