[ckan-dev] Search inside data files

David Read david.read at hackneyworkshop.com
Mon Mar 7 14:40:05 UTC 2016


Vangelis kindly responded by mentioning the main technologies that
dataopen.eu is based on:

Apache Tika, MariaDB, Sphinx Search & Redis

The back-end glue is written in Scala and is sadly closed-source. But it
really shows what is possible, and we can all evaluate it. We'll certainly
be watching it, and people's interest in it, as a way to search more
deeply into CKAN.

In the meantime, I'd love to know if people have ideas on how this
could be done for CKAN.
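
To make the question concrete, here is a rough sketch of the general idea:
extract text from a resource file, then add it to an inverted index keyed
by resource id. This is purely illustrative and not how dataopen.eu does
it (that code is closed-source). All the names here (ContentIndex,
extract_text_from_csv, the "res-1" style ids) are made up for the example.
It only handles CSV and keeps the index in memory. A real setup would
presumably hand extraction to something like Apache Tika and the index to
a proper search engine such as Sphinx or Solr.

```python
import csv
import io
import re
from collections import defaultdict

# Lowercase alphanumeric runs count as tokens (an assumption for this sketch).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_text_from_csv(raw):
    """Flatten every cell of a CSV document into one space-separated string."""
    rows = csv.reader(io.StringIO(raw))
    return " ".join(cell for row in rows for cell in row)


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class ContentIndex:
    """Toy in-memory inverted index: token -> set of resource ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, resource_id, text):
        """Index the extracted text of one resource."""
        for token in tokenize(text):
            self._postings[token].add(resource_id)

    def search(self, query):
        """Return ids of resources containing every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


index = ContentIndex()
index.add("res-1", extract_text_from_csv("name,city\nAlice,Hackney\n"))
index.add("res-2", extract_text_from_csv("name,city\nBob,Leeds\n"))
matches = index.search("hackney")
```

Even this toy version shows where the size worry comes from: the index
grows with the number of distinct tokens across all file contents, not
just with the number of datasets.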

David

On 7 March 2016 at 11:38, David Read <david.read at hackneyworkshop.com> wrote:
> I noticed the implementation of search *inside* of the PDF/XLS/CSV
> files listed in CKANs:
>
> http://www.epsiplatform.eu/content/searching-open-data-never-0
> http://dataopen.eu
>
> Since it's associated with Open Knowledge it would be great if the
> rest of the CKAN community can take advantage of the code - does
> anyone know? I've just sent an email to Vangelis Banos to ask more.
>
> I wonder if anyone else has attempted this sort of thing? I imagine
> it's a challenge to create such a big index. Our CKAN SOLR index is
> big enough already with just the metadata!
>
> David


