[ckan-dev] Search functions in CKAN; custom schema.

Wed May 30 13:47:38 UTC 2012

On Wed, May 30, 2012 at 3:26 PM, David Raznick <kindly at gmail.com> wrote:
> On Wed, May 30, 2012 at 2:16 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> On 30 May 2012 11:52, David Raznick <kindly at gmail.com> wrote:
>>> On Wed, May 30, 2012 at 9:54 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>>> On 29 May 2012 12:48, David Raznick <kindly at gmail.com> wrote:
>>>>> On Tue, May 29, 2012 at 11:25 AM, Friedrich Lindenberg
>>>>> <friedrich.lindenberg at okfn.org> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I just wanted to query a few points regarding CKAN that I may not be
>>>>>> up to date on, since we're discussing using CKAN with a partner
>>>>>> organization and I want to be clear on the features.
>>>>>>
>>>>>> * Is global data search planned; when is it likely to land? Has anyone
>>>>>> played with indexing PDF/Word/... docs, e.g. via Tika?
>>>>>
>>>>> No, not planned, but interesting, and should probably be stored in the
>>>>> datastore.  This is because you can easily do full text search across
>>>>
>>>> I should correct this in the sense that I've been thinking about this
>>>> for a while and even thought of putting it directly into the original
>>>> datastore implementation (this is really easy to do with ES).
>>>>
>>>> The main issue is performance -- but could be fixed by judicious
>>>> timeouts? Also one could have a simple switch for this so that people
>>>> who want this in their install can just enable it.
>>>
>>> I do not think performance will be an issue unless people are
>>> adding 100s of datasets.  We should have the parsed documents on a
>>> different index than the tabular data though. (You can still search
>>> across indexes.)
>>
>> This point was about *querying* across multiple datasets/resources not
>> about loading.
>
> My point about having it on a separate index covered that.  There
> should be *far* less data in all the documents than in, say, the whole
> of UK spending data (which I would hope to have in the datastore one
> day!)

Not sure about that, actually. This is 150 * 8 * rand(100-3000) pages
of content - both to be indexed and stored as is (some of it is
probably scanned).
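
To put rough bounds on that estimate, here is a minimal sketch. It
assumes rand(100-3000) means a per-item page count somewhere between
100 and 3000; the constant names are made up for illustration, and the
midpoint is only meaningful if the counts are roughly uniform.

```python
# Rough bounds for the estimate 150 * 8 * rand(100-3000) pages.
# The factor names below are hypothetical labels, not terms from the thread.
FACTOR_A = 150                    # first factor in the estimate
FACTOR_B = 8                      # second factor in the estimate
PAGES_MIN, PAGES_MAX = 100, 3000  # assumed per-item page range


def page_bounds():
    """Return (lowest, highest) total page count for the estimate."""
    items = FACTOR_A * FACTOR_B
    return items * PAGES_MIN, items * PAGES_MAX


def page_midpoint():
    """Return the total if every item sat at the middle of the range."""
    return FACTOR_A * FACTOR_B * (PAGES_MIN + PAGES_MAX) // 2


low, high = page_bounds()
print(f"between {low:,} and {high:,} pages, midpoint {page_midpoint():,}")
```

Even the low end is a six-figure page count, before counting whatever
OCR on the scanned material adds.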

>>
>> [...]
>>
>>>> Where are the docs for this BTW? IIRC there was some good work done
>>>> here earlier in the year.
>>>
>>> http://readthedocs.org/docs/ckan/en/latest/forms.html
>>
>> thanks. (BTW why not use http://docs.ckan.org/en/latest/forms.html ?)
>>
>> One suggestion, having read these a bit: links from specific
>> sections to the relevant portions of the ckanext-example would be
>> really useful, but could break quite a bit (perhaps link just to the
>> relevant files). But I understand the "so many things, so little
>> time" aspect of things.

Sorry to be bitching here, but this is pretty hardcore: I need to boot
up CKAN, load vocab fixtures, write a plugin, install that, re-write
some of the templates which change with every release and install a
second version of them - just in order to get an enum into a form? Is
there any movement towards just having some model file (JSON, XML,
even ... RDF) in which I could describe my metadata schema and you'd
make a form for me?
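
To make the ask concrete, here is a minimal sketch of what I mean. This
is not CKAN's API: the JSON layout, the field names and the
render_form() helper are all hypothetical, and it only shows a schema
file with an enum being turned into form markup.

```python
import json
from html import escape

# Hypothetical schema file contents; the layout is invented for this sketch.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "title", "label": "Title", "type": "text"},
    {"name": "update_frequency", "label": "Update frequency",
     "type": "enum", "choices": ["daily", "weekly", "monthly"]}
  ]
}
"""


def render_field(field):
    """Render one schema field as an HTML label plus form control."""
    name = escape(field["name"], quote=True)
    label = escape(field["label"])
    if field["type"] == "enum":
        # An enum becomes a <select> with one <option> per allowed value.
        options = "".join(
            '<option value="{0}">{0}</option>'.format(escape(c, quote=True))
            for c in field["choices"]
        )
        control = '<select name="{}">{}</select>'.format(name, options)
    else:
        # Anything else falls back to a plain text input.
        control = '<input type="text" name="{}">'.format(name)
    return "<label>{} {}</label>".format(label, control)


def render_form(schema):
    """Render the whole schema as a bare HTML form."""
    rows = "\n".join(render_field(f) for f in schema["fields"])
    return '<form method="post">\n{}\n</form>'.format(rows)


print(render_form(json.loads(SCHEMA_JSON)))
```

Validation and storage would need the same schema file as input, but
the point is that adding an enum would be a one-line change to the JSON
rather than a plugin plus template overrides.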

Cheers, Fr.