[ckan-dev] Is CKAN suitable for textual search in a 10Gb dataset?

Dominik Moritz dominik.moritz at okfn.org
Wed Apr 9 18:11:24 UTC 2014


On Apr 9, 2014, at 7:08, Andrés Martano <andres at inventati.org> wrote:

> On IRC, rossjones found me the line that enforces the limit:
> https://github.com/okfn/messytables/blob/3489e43bbae0b8eb35d7e4203a4b3aaf66bd88cb/messytables/commas.py#L159
> 
> Editing it, I got no error in the push interface. But it only pushed 5851 lines of about 160K. Any ideas why?
> I replaced all " in the text files with "" before inserting them in the CSV.

You can always just use the insert API (datastore_upsert) directly: http://docs.ckan.org/en/latest/maintaining/datastore.html?highlight=datastore#ckanext.datastore.logic.action.datastore_upsert
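
As a rough sketch of what that could look like: the snippet below reads a CSV and posts the rows to datastore_upsert in batches with method "insert". The helper names (batched, build_upsert_request, push_csv), the batch size, and the ckan_url / api_key / resource_id values are all placeholders I made up for illustration. It also assumes the DataStore table for the resource already exists with matching fields.

```python
import csv
import json
import urllib.request


def batched(rows, size):
    """Yield lists of at most `size` rows (hypothetical helper)."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_upsert_request(ckan_url, api_key, resource_id, records):
    """Build (but do not send) a POST request for datastore_upsert."""
    payload = {
        "resource_id": resource_id,
        "records": records,
        # "insert" appends rows and does not need a unique key.
        "method": "insert",
    }
    return urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/datastore_upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def push_csv(csv_file, ckan_url, api_key, resource_id, batch_size=1000):
    """Send every row of an open CSV file; returns the number of rows sent."""
    sent = 0
    # The csv module undoes the "" escaping of quotes inside quoted fields.
    for records in batched(csv.DictReader(csv_file), batch_size):
        req = build_upsert_request(ckan_url, api_key, resource_id, records)
        with urllib.request.urlopen(req) as resp:
            json.load(resp)  # urlopen raises on HTTP errors
        sent += len(records)
    return sent
```

Comparing the return value of push_csv with the line count of the file should show quickly whether rows are being dropped on the way in.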

> 
> I tried the search in the web "Grid" preview, but it fails to find many words that I can see in the preview.
> Same with the datastore_search API.
> I had the same problem with a 3Mb CSV.

The search does not do a full text search. You have to write custom SQL queries using datastore_search_sql (http://docs.ckan.org/en/latest/maintaining/datastore.html?highlight=datastore#ckanext.datastore.logic.action.datastore_search_sql) together with PostgreSQL's full text search (http://www.postgresql.org/docs/8.3/static/textsearch.html).
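
A minimal sketch of such a query, built in Python: it combines to_tsvector and plainto_tsquery from the PostgreSQL docs and wraps the result in a datastore_search_sql URL. The function names, the 'english' configuration, and the column name are assumptions for illustration. Since the sql parameter is a plain string, the sketch does its own quoting, which assumes PostgreSQL's standard_conforming_strings is on. Whether your CKAN version permits these functions in datastore_search_sql is worth checking.

```python
import urllib.parse


def quote_ident(name):
    """Double-quote an SQL identifier (resource ids are table names)."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote an SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_fulltext_sql(resource_id, column, terms, limit=20):
    """Full text match of `terms` against one text column of a resource."""
    return (
        "SELECT * FROM {table} "
        "WHERE to_tsvector('english', {col}) "
        "@@ plainto_tsquery('english', {q}) "
        "LIMIT {limit}"
    ).format(
        table=quote_ident(resource_id),
        col=quote_ident(column),
        q=quote_literal(terms),
        limit=int(limit),
    )


def build_search_url(ckan_url, sql):
    """URL for a GET request to the datastore_search_sql action."""
    return (
        ckan_url.rstrip("/")
        + "/api/3/action/datastore_search_sql?"
        + urllib.parse.urlencode({"sql": sql})
    )
```

Without an index on the tsvector expression, PostgreSQL has to recompute to_tsvector for every row on each query, so on a dataset of this size a GIN index on that expression is probably needed to keep searches fast.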

