[ckan-dev] Semantic Search - Extension

Sven R. Kunze sven.kunze at s2007.tu-chemnitz.de
Fri Nov 30 10:44:48 UTC 2012


Hi guys,

I am currently thinking about using SOLR to index the following data:
- list of URIs indicating the vocabularies, predicates, classes and  
entities (for topic search)
- latitude, longitude, radius (for geo search)
- min time, max time (for time search)
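
For illustration, the three groups of values above could be flattened into
one SOLR document per dataset roughly as follows. This is only a sketch:
the helper `build_semantic_doc` and all field names (`topic_uris`,
`geo_lat`, `time_min`, ...) are assumptions and would have to be declared
in the SOLR schema; they are not part of CKAN or of the extension.

```python
import json
from datetime import datetime, timezone


def build_semantic_doc(dataset_id, uris, lat, lon, radius_km, min_time, max_time):
    """Return a dict shaped like a SOLR document for one dataset.

    All field names are hypothetical and must match the SOLR schema.
    """
    def iso(dt):
        # SOLR date fields expect UTC timestamps like 2012-11-30T10:44:48Z.
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "id": dataset_id,
        # Vocabularies, predicates, classes and entities (topic search).
        "topic_uris": sorted(set(uris)),
        # Centre point and radius (geo search).
        "geo_lat": float(lat),
        "geo_lon": float(lon),
        "geo_radius_km": float(radius_km),
        # Covered time span (time search).
        "time_min": iso(min_time),
        "time_max": iso(max_time),
    }


doc = build_semantic_doc(
    "example-dataset",
    ["http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/foaf/0.1/"],
    50.83, 12.92, 25,
    datetime(2000, 1, 1, tzinfo=timezone.utc),
    datetime(2012, 11, 30, 10, 44, 48, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```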

However, I would face the following issue:
- how can I ensure that e.g. subclasses of a class are also found,
although they are not mentioned explicitly in the RDF dataset?
=> materialization could do that, but that is the business of a triplestore
- so I could let a triplestore create all additional triples and get them
indexed
- however, forward chaining is not the best way, as it causes severe issues
when updating (rebuilding the complete closure, indexes etc.)
=> therefore, I'd like to handle the filtering myself via SPARQL queries
(where backward chaining can be done)
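
As a rough sketch of such a query-time filter: where the triplestore does
not reason by itself, a SPARQL 1.1 property path (`rdfs:subClassOf*`) can
walk the subclass hierarchy at query time, so nothing has to be
materialized. The helper name `build_topic_query`, the `ex:` namespace and
the `ex:usesClass` predicate are assumptions made up for this example.

```python
def build_topic_query(class_uri):
    """Build a SPARQL query for datasets using class_uri or any subclass.

    rdfs:subClassOf* is a SPARQL 1.1 property path meaning "zero or more
    subClassOf steps", so class_uri itself matches as well.
    """
    # Very small safety check so the URI cannot break out of <...>.
    if any(c in class_uri for c in '<>"{}|^`\\ '):
        raise ValueError("not a safe IRI: %r" % class_uri)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX ex: <http://example.org/schema#>\n"  # hypothetical namespace
        "SELECT DISTINCT ?dataset WHERE {\n"
        "  ?cls rdfs:subClassOf* <%s> .\n"
        "  ?dataset ex:usesClass ?cls .\n"  # hypothetical predicate
        "}" % class_uri
    )


print(build_topic_query("http://xmlns.com/foaf/0.1/Agent"))
```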

Implications:
The issue with post-filtering (which is how the extension works now) is
that the facets aren't updated correctly, since SOLR computes them on the
unfiltered result set.
So pre-filtering would be more appropriate.
Is there a way to pass a list of relevant items to SOLR?

The idea is that a triplestore could (optionally, as configured by the
admin) select the datasets that match the specified search criteria (topic,
geo, time), and SOLR could then run its regular search on that
pre-filtered list of datasets.
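
One possible way to hand such a list to SOLR is a filter query (`fq`) on
the ID field; facet counts are then computed only over documents matching
both `q` and `fq`. The sketch below just builds the request parameters.
The helper `build_prefilter_params`, the ID field name `id` and the facet
field `tags` are assumptions, and very long ID lists may run into SOLR's
`maxBooleanClauses` limit.

```python
from urllib.parse import urlencode


def build_prefilter_params(query, dataset_ids, id_field="id"):
    """Return SOLR request parameters restricted to dataset_ids.

    Returns None for an empty list, so the caller can return an empty
    result without querying SOLR at all.
    """
    if not dataset_ids:
        return None

    def quote(value):
        # Escape backslashes and quotes inside a quoted SOLR term.
        return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')

    id_clause = " OR ".join(quote(i) for i in dataset_ids)
    return [
        ("q", query),
        ("fq", "%s:(%s)" % (id_field, id_clause)),
        ("facet", "true"),
        ("facet.field", "tags"),  # assumed facet field
    ]


params = build_prefilter_params("water quality", ["a1", "b2"])
print(urlencode(params))
```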

One more thing: if the described approach worked, would it also be
possible to pre-sort the list for SOLR (i.e. to keep the order of the
given list instead of sorting by an indexed value)?
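
Failing a SOLR-side solution, a simple fallback would be to re-order the
returned documents on the client according to the pre-sorted list. This is
only a sketch with a made-up helper name (`reorder_by_presorted`), and it
only re-orders the documents actually fetched, so it does not combine well
with SOLR-side paging.

```python
def reorder_by_presorted(results, presorted_ids, key="id"):
    """Order SOLR result dicts by their position in presorted_ids.

    Documents whose ID is not in the list are placed at the end; since
    sorted() is stable, they keep their original SOLR order there.
    """
    rank = {ds_id: pos for pos, ds_id in enumerate(presorted_ids)}
    return sorted(results, key=lambda doc: rank.get(doc[key], len(rank)))


solr_docs = [{"id": "b"}, {"id": "c"}, {"id": "a"}]
ordered = reorder_by_presorted(solr_docs, ["a", "b"])
print([doc["id"] for doc in ordered])
```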

Cheers,
Sven



