[ckan-dev] Is CKAN suitable for textual search in a 10Gb dataset?

Dominik Moritz dominik.moritz at okfn.org
Wed Apr 9 06:58:54 UTC 2014


On Apr 8, 2014, at 6:25, Andrés Martano <andres at inventati.org> wrote:

> Thanks for the answers.
> 
> 
> On 07-04-2014 17:47, Dominik Moritz wrote:
>> Since the datastore uses Postgres, you might be able to use its full text search. See http://www.postgresql.org/docs/9.3/static/textsearch.html
> Does the datastore always use Postgres for the queries?
> I thought it would be better to use Solr, since it's already installed...
> Is there a way to use Solr to do the queries?

The datastore always uses Postgres, which is usually the storage database in CKAN anyway. Solr is used for searching the metadata. I am not sure how well it fits searching large amounts of text data. In any case, you will need to write a significant amount of custom code.
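
To make the Postgres route more concrete, here is a minimal sketch of a full-text query through the datastore API, assuming the `datastore_search` action and its `q` parameter. The host name and resource id are made up, and the code only builds the request without sending it, so treat it as an illustration rather than something tested against a real instance.

```python
import json
from urllib.request import Request


def build_search_request(ckan_url, resource_id, text, limit=10):
    """Build (but do not send) a POST request for the datastore_search action."""
    payload = {
        "resource_id": resource_id,  # hypothetical id of the uploaded CSV resource
        "q": text,                   # full-text query, evaluated by Postgres
        "limit": limit,
    }
    return Request(
        ckan_url.rstrip("/") + "/api/3/action/datastore_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and resource id, for illustration only.
req = build_search_request("http://ckan.example.org", "my-resource-id", "budget 2014")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually run the search; I left that out so the sketch has no network dependency.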

> 
>> I'm not sure how well this will work for queries across tables, though. 
> I plan on just pushing a huge CSV to the datastore.
> How do I know if the datastore will create cross tables?

I'm not sure what you mean by cross table. What I meant (and I probably didn't say that very well) is that the SQL search is easier to write if it only spans one table. If you want to query multiple tables, you have to join or union the results.
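
To illustrate the difference, here is a rough sketch that builds the SQL for a Postgres full-text search over one table or over several. The table names and the `body` column are hypothetical, the `%s` placeholders assume a psycopg2-style driver, and the code only assembles the query string. `to_tsvector`, `plainto_tsquery` and `@@` are the standard Postgres text search pieces.

```python
def quote_ident(name):
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_fts_query(tables, column="body"):
    """Return (sql, n_params) for a full-text search of `column` in each table.

    With one table the result is a single SELECT; with several tables the
    per-table SELECTs are combined with UNION ALL. The search term has to be
    passed once per placeholder when the query is executed.
    """
    if not tables:
        raise ValueError("need at least one table")
    col = quote_ident(column)
    parts = [
        "SELECT {c} FROM {t} "
        "WHERE to_tsvector('english', {c}) @@ plainto_tsquery('english', %s)".format(
            c=col, t=quote_ident(table)
        )
        for table in tables
    ]
    return " UNION ALL ".join(parts), len(parts)


# Hypothetical table names, for illustration only.
sql_one, n_one = build_fts_query(["resource_a"])
sql_many, n_many = build_fts_query(["resource_a", "resource_b"])
print(sql_one)
print(sql_many)
```

With a single huge CSV pushed to the datastore you end up in the one-table case, so only the simple SELECT is needed.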

> 
> 
> On 07-04-2014 17:59, Vitor Baptista wrote:
>> Yes. There's an issue/pull request to improve our previews at https://github.com/ckan/ckan/pull/1251, allowing a single resource to have multiple different previews (called now resource views). It's still work in progress, but we've already created a few extensions using it, like github.com/ckan/ckanext-basiccharts and github.com/ckan/ckanext-mapviews.
> Great! I will study these extensions and see if I can mimic them.
> 
>> Yes, it's possible using the IRoutes plugin interface. Check the docs at http://docs.ckan.org/en/latest/extensions/plugin-interfaces.html#ckan.plugins.interfaces.IRoutes and a usage example by the DataStore at https://github.com/ckan/ckan/blob/master/ckanext/datastore/plugin.py#L226-L230 
> Nice. Is there a reference where I can read more about "Routes map object"?
