[ckan-dev] Search inside data files

David Read david.read at hackneyworkshop.com
Fri Mar 11 11:40:43 UTC 2016


It sounds like Matt's suggestion is the closest:
https://github.com/transparenzportalhamburg/ckanext-fulltext
I have reservations about storing the extracted text in postgres - I
don't see that scaling well for large sites. But we might well give it
a try.
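
For anyone following along, here is a rough sketch of the general idea
discussed in this thread: extract the text from a resource, then hand it
to the search index alongside the metadata. It is illustrative only. The
function names and the `res_fulltext` field are made up for this example
and are not ckanext-fulltext's actual API, and it handles only CSV, using
the Python standard library.

```python
import csv
import io


def extract_csv_text(csv_content, max_chars=100000):
    """Flatten the cells of a CSV string into one space-separated text blob."""
    reader = csv.reader(io.StringIO(csv_content))
    cells = []
    for row in reader:
        for cell in row:
            cell = cell.strip()
            if cell:
                cells.append(cell)
    # Cap the size so a single huge file cannot bloat the index.
    return " ".join(cells)[:max_chars]


def add_fulltext(dataset_dict, resource_texts):
    """Return a copy of the dataset dict with a hypothetical 'res_fulltext' field."""
    indexed = dict(dataset_dict)
    indexed["res_fulltext"] = " ".join(resource_texts)
    return indexed


if __name__ == "__main__":
    sample = "city,population\nHamburg,1787408\nLondon,8673713\n"
    doc = add_fulltext({"name": "cities"}, [extract_csv_text(sample)])
    print(doc["res_fulltext"])
```

In a real extension the second step would presumably run from a plugin
hook just before the dataset is sent to Solr (I believe `before_index` on
`IPackageController`, but check that against your CKAN version). The
`max_chars` cap is only a crude guard against index growth and does not
settle the scaling question.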

Dave

On 11 March 2016 at 00:47, John Jediny - XAAB <john.jediny at gsa.gov> wrote:
> Stab in the dark, but it sounds like you'd have to use the CKAN archiver to
> cache a copy, use reclinedb or another multi-format parser, and have that
> spit out to Elasticsearch (or Solr) to expose the contents as search terms.
> Has anyone done something like this? Open source, of course.
>
> Proprietary plug-in vendors, please focus on a different community.
>
> On Mar 10, 2016 5:10 PM, "Natalia Queiroz" <queiroz.nati at gmail.com> wrote:
>>
>> Anyone?
>>
>> On Wed, Mar 9, 2016 at 6:27 PM, Natalia Queiroz <queiroz.nati at gmail.com>
>> wrote:
>>>
>>> So, is CKAN only able to search metadata values?
>>>
>>> How is it possible to search for values inside a CSV file, for example?
>>>
>>> My search doesn't return this kind of data, just the metadata. I'm
>>> using DataStore and DataPusher.
>>>
>>> Any ideas?
>>>
>>> On Tue, Mar 8, 2016 at 5:04 AM, Matthew Fullerton
>>> <matt.fullerton at gmail.com> wrote:
>>>>
>>>> There is also the somewhat simpler approach from Hamburg:
>>>>
>>>> https://github.com/transparenzportalhamburg/ckanext-fulltext
>>>>
>>>> -Matt
>>>>
>>>> On 7 Mar 2016 3:40 p.m., "David Read" <david.read at hackneyworkshop.com>
>>>> wrote:
>>>>>
>>>>> Vangelis kindly responded by mentioning the main technologies that
>>>>> dataopen.eu is based on:
>>>>>
>>>>> Apache Tika, MariaDB, Sphinx Search & Redis
>>>>>
>>>>> The back-end glue is Scala, and is sadly closed-source. But it really
>>>>> shows the possibilities for us all to evaluate. We'll certainly be
>>>>> watching it and people's interest in it as a way to search CKAN in a
>>>>> deeper way.
>>>>>
>>>>> In the meantime, I'd love to know if people have ideas on how this
>>>>> could be done for CKAN.
>>>>>
>>>>> David
>>>>>
>>>>> On 7 March 2016 at 11:38, David Read <david.read at hackneyworkshop.com>
>>>>> wrote:
>>>>> > I noticed the implementation of search *inside* of the PDF/XLS/CSV
>>>>> > files listed in CKANs:
>>>>> >
>>>>> > http://www.epsiplatform.eu/content/searching-open-data-never-0
>>>>> > http://dataopen.eu
>>>>> >
>>>>> > Since it's associated with Open Knowledge it would be great if the
>>>>> > rest of the CKAN community can take advantage of the code - does
>>>>> > anyone know? I've just sent an email to Vangelis Banos to ask more.
>>>>> >
>>>>> > I wonder if anyone else has attempted this sort of thing? I imagine
>>>>> > it's a challenge to create such a big index. Our CKAN SOLR index is
>>>>> > big enough already with just the metadata!
>>>>> >
>>>>> > David
>>>>> _______________________________________________
>>>>> ckan-dev mailing list
>>>>> ckan-dev at lists.okfn.org
>>>>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> Natália Oliveira
>>
>>
>>
>>
>> --
>>
>>
>> Natália Oliveira


