[ckan-dev] Search functions in CKAN; custom schema.

Wed May 30 14:07:33 UTC 2012

On Wed, May 30, 2012 at 2:47 PM, Friedrich Lindenberg
<friedrich.lindenberg at okfn.org> wrote:
> On Wed, May 30, 2012 at 3:26 PM, David Raznick <kindly at gmail.com> wrote:
>> On Wed, May 30, 2012 at 2:16 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>> On 30 May 2012 11:52, David Raznick <kindly at gmail.com> wrote:
>>>> On Wed, May 30, 2012 at 9:54 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>>>> On 29 May 2012 12:48, David Raznick <kindly at gmail.com> wrote:
>>>>>> On Tue, May 29, 2012 at 11:25 AM, Friedrich Lindenberg
>>>>>> <friedrich.lindenberg at okfn.org> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I just wanted to query a few points regarding CKAN that I may not be
>>>>>>> up to date on, since we're discussing using CKAN with a partner
>>>>>>> organization and I want to be clear on the features.
>>>>>>>
>>>>>>> * Is global data search planned; when is it likely to land? Has anyone
>>>>>>> played with indexing PDF/Word/... docs, e.g. via Tika?
>>>>>>
>>>>>> No, not planned, but interesting, and should probably be stored in the
>>>>>> datastore.  This is because you can easily do full text search across
>>>>>
>>>>> I should correct this in the sense that I've been thinking about this
>>>>> for a while and even thought of putting it directly into the original
>>>>> datastore implementation (this is really easy to do with ES).
>>>>>
>>>>> The main issue is performance -- but could be fixed by judicious
>>>>> timeouts? Also one could have a simple switch for this so that people
>>>>> who want this in their install can just enable.
>>>>
>>>> I do not think performance will be an issue unless people are
>>>> adding 100s of datasets.  We should have the parsed documents on a
>>>> different index than the tabular data though. (You can still search
>>>> across indexes.)
>>>
>>> This point was about *querying* across multiple datasets/resources not
>>> about loading.
>>
>> My point about having it on a separate index covered that.  There
>> should be *far* less data in all the documents than in, say, the whole
>> of UK spending data (which I would hope to have in the datastore one
>> day!)
>
> Not sure about that, actually. This is 150 * 8 * rand(100-3000) pages
> of content - both to be indexed and stored as is (some of it is
> probably scanned).

I was thinking of just the textual data stored within the docs,
excluding Excel files, which should be in the datastore anyway.

But I admit that *far* less was a little overzealous :)

Nonetheless, having them on separate indexes is fine.  We should
probably start some kind of sharding based on resource id to spread
all the data across indexes if this is a big issue.
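
To make the sharding idea concrete, here is a rough sketch of how a
resource id could be mapped to one of a fixed number of indexes.  It is
only an illustration, not anything that exists in CKAN: the function
names, the "ckan-docs" prefix and the shard count of 8 are all made up
for the example.

```python
import hashlib


def shard_index_name(resource_id, num_shards=8, prefix="ckan-docs"):
    """Pick an index name for a resource id (hypothetical naming scheme).

    A stable hash is used instead of Python's built-in hash() so that the
    same resource id maps to the same index across processes and restarts.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.md5(resource_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return "{}-{}".format(prefix, shard)


def all_shards_pattern(prefix="ckan-docs"):
    """Wildcard index pattern that covers every shard created above."""
    return "{}-*".format(prefix)
```

Writes for a resource would go to shard_index_name(resource_id), and a
search over everything would target the wildcard pattern, since
Elasticsearch accepts wildcard and comma-separated index names in the
search URL.  Note that changing num_shards later moves resources between
indexes, so it would need a reindex.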

>
>>>
>>> [...]
>>>
>>>>> Where are the docs for this BTW? IIRC there was some good work done
>>>>> here earlier in the year.
>>>>
>>>> http://readthedocs.org/docs/ckan/en/latest/forms.html
>>>
>>> thanks. (BTW why not use http://docs.ckan.org/en/latest/forms.html ?)
>>>
>>> One suggestion, having read these a bit, is that links in specific
>>> sections to the relevant portions of the ckanext-example would be really
>>> useful but could break quite a bit (perhaps just to relevant files).
>>> But I understand the "so many things, so little time" aspect of things.
>
> Sorry to be bitching here, but this is pretty hardcore: I need to boot
> up CKAN, load vocab fixtures, write a plugin, install that, re-write
> some of the templates which change with every release and install a
> second version of them - just in order to get an enum into a form? Is
> there any movement towards just having some model file (JSON, XML,
> even ... RDF) in which I could describe my metadata schema and you'd
> make a form for me?

We have thought about this a lot before and we will slowly get there,
but it has not been our main use case.  Most customizations are just
too unique and standard generated forms do not cut it.

Our focus has been on making sure things do not break when upgrading,
which is a more important concern. Then maybe we can do some
nice-to-haves like this.  This would most likely be added to the UI as
well, though, not just as a JSON schema.
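
For illustration only, the kind of schema-driven form being asked about
could look roughly like the sketch below.  This is not CKAN code and
does not use the CKAN forms API; the schema layout ("fields", "label",
"enum") and the function names are invented for the example.

```python
import json
from html import escape


def render_field(name, spec):
    """Render one field; an "enum" entry in the spec becomes a <select>."""
    label = escape(spec.get("label", name))
    safe_name = escape(name, quote=True)
    if "enum" in spec:
        options = "".join(
            '<option value="{}">{}</option>'.format(
                escape(str(value), quote=True), escape(str(value))
            )
            for value in spec["enum"]
        )
        control = '<select name="{}">{}</select>'.format(safe_name, options)
    else:
        control = '<input type="text" name="{}">'.format(safe_name)
    return "<label>{} {}</label>".format(label, control)


def render_form(schema_json):
    """Build an HTML form from a JSON schema string (hypothetical format)."""
    schema = json.loads(schema_json)
    fields = "\n".join(
        render_field(name, spec) for name, spec in schema["fields"].items()
    )
    return '<form method="post">\n{}\n</form>'.format(fields)
```

Calling render_form on a schema such as
{"fields": {"license": {"enum": ["cc-by", "odc-pddl"]}}} would give a
form with a single drop-down.  Something this simple is exactly where
generated forms stop cutting it for most real customizations (layout,
validation, dependent fields), which is why it has not been the main
use case.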

>
> Cheers, Fr.