[ckan-dev] Search functions in CKAN; custom schema.

Wed May 30 18:16:13 UTC 2012

I'd like to add some gentle support to Friedrich here. I was
interested in the possibilities of configuring metadata, but gave up
the idea when I discovered what it entailed.

I entirely see that it's good for an admin who has the time and
technical know-how to be able to configure the forms to suit their
metadata, but it shouldn't be necessary to do so. A simple default
would be to have an extra screen when editing a dataset, looking
exactly like the 'Extras' screen that already exists, but for custom
fields defined in a local schema file somehow. Even better, merged
with the 'Extras' screen: locally-defined extras at the top, then
blank ones at the bottom as at present for arbitrary field/value
pairs.
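
To make the idea a bit more concrete, here is a rough sketch in Python
of how such a merged screen might be assembled. It is not CKAN code:
the schema layout, the field names and the build_extras_rows function
are all made up for illustration, and I am assuming the local schema
is a small JSON file listing the custom fields.

```python
import json

# Hypothetical local schema file contents; the layout and the field
# names are assumptions for illustration, not an existing CKAN format.
SCHEMA_JSON = """
{
  "fields": [
    {"key": "update_frequency", "label": "Update frequency",
     "choices": ["daily", "weekly", "monthly"]},
    {"key": "spatial_coverage", "label": "Spatial coverage"}
  ]
}
"""


def build_extras_rows(schema, extras, blank_rows=3):
    """Return form rows: schema-defined fields first, then any other
    extras already on the dataset, then blank free-form rows."""
    rows = []
    defined = set()
    for field in schema["fields"]:
        key = field["key"]
        defined.add(key)
        rows.append({
            "key": key,
            "label": field.get("label", key),
            "value": extras.get(key, ""),
            "choices": field.get("choices"),  # None means free text
            "fixed_key": True,                # key comes from the schema
        })
    # Arbitrary field/value pairs that are not in the local schema.
    for key in sorted(extras):
        if key not in defined:
            rows.append({"key": key, "label": key, "value": extras[key],
                         "choices": None, "fixed_key": False})
    # Blank rows at the bottom, as on the present 'Extras' screen.
    for _ in range(blank_rows):
        rows.append({"key": "", "label": "", "value": "",
                     "choices": None, "fixed_key": False})
    return rows


schema = json.loads(SCHEMA_JSON)
rows = build_extras_rows(schema, {"update_frequency": "weekly",
                                  "licence_note": "see website"})
for row in rows:
    print(row["key"] or "(blank)", "=", repr(row["value"]))
```

The point of the sketch is only that an enum such as the "choices"
list above could come from a file rather than from a plugin plus
template changes; how the rows are rendered is left open.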

I don't think specifying the schema through the UI is so important,
nice though it would undoubtedly be.

Mark

On 30 May 2012 15:07, David Raznick <kindly at gmail.com> wrote:
> On Wed, May 30, 2012 at 2:47 PM, Friedrich Lindenberg
> <friedrich.lindenberg at okfn.org> wrote:
>> On Wed, May 30, 2012 at 3:26 PM, David Raznick <kindly at gmail.com> wrote:
>>> On Wed, May 30, 2012 at 2:16 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>>> On 30 May 2012 11:52, David Raznick <kindly at gmail.com> wrote:
>>>>> On Wed, May 30, 2012 at 9:54 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>>>>> On 29 May 2012 12:48, David Raznick <kindly at gmail.com> wrote:
>>>>>>> On Tue, May 29, 2012 at 11:25 AM, Friedrich Lindenberg
>>>>>>> <friedrich.lindenberg at okfn.org> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I just wanted to query a few points regarding CKAN that I may not be
>>>>>>>> up to date on, since we're discussing using CKAN with a partner
>>>>>>>> organization and I want to be clear on the features.
>>>>>>>>
>>>>>>>> * Is global data search planned; when is it likely to land? Has anyone
>>>>>>>> played with indexing PDF/Word/... docs, e.g. via Tika?
>>>>>>>
>>>>>>> No not planned but interesting, and should probably be stored in the
>>>>>>> datastore.  This is because you can easily do full text search across
>>>>>>
>>>>>> I should correct this in the sense that I've been thinking about this
>>>>>> for a while and even thought of putting it directly into the original
>>>>>> datastore implementation (this is really easy to do with ES).
>>>>>>
>>>>>> The main issue is performance -- but could be fixed by judicious
>>>>>> timeouts? Also one could have a simple switch for this so that people
>>>>>> who want this in their install can just enable.
>>>>>
>>>>> I do not think performance will be an issue unless people are
>>>>> adding 100s of datasets.  We should have the parsed documents on a
>>>>> different index than the tabular data though. (You can still search
>>>>> across indexes.)
>>>>
>>>> This point was about *querying* across multiple datasets/resources not
>>>> about loading.
>>>
>>> My point about having it on a separate index covered that.  There
>>> should be *far* less data in all the documents than in, say, the whole
>>> of UK spending data (which I would hope to have in the datastore one
>>> day!)
>>
>> Not sure about that, actually. This is 150 * 8 * rand(100-3000) pages
>> of content - both to be indexed and stored as is (some of it is
>> probably scanned).
>
> I was thinking of just the textual data, stored within the docs and
> excluding excel files which should be in the datastore anyway.
>
> But I admit that *far* less was a little overzealous :)
>
> Nonetheless, having them on separate indexes is fine.  We should
> probably start some kind of sharding based on resource id to spread
> all the data across indexes if this is a big issue.
>
>
>>
>>>>
>>>> [...]
>>>>
>>>>>> Where are the docs for this BTW? IIRC there was some good work done
>>>>>> here earlier in the year.
>>>>>
>>>>> http://readthedocs.org/docs/ckan/en/latest/forms.html
>>>>
>>>> thanks. (BTW why not use http://docs.ckan.org/en/latest/forms.html ?)
>>>>
>>>> One suggestion, having read these a bit: links from specific
>>>> sections to the relevant portions of ckanext-example would be really
>>>> useful, though they could break quite often (perhaps link just to the
>>>> relevant files). But I understand the "so many things, so little
>>>> time" aspect of things.
>>
>> Sorry to be bitching here, but this is pretty hardcore: I need to boot
>> up CKAN, load vocab fixtures, write a plugin, install that, re-write
>> some of the templates which change with every release and install a
>> second version of them - just in order to get an enum into a form? Is
>> there any movement towards just having some model file (JSON, XML,
>> even ... RDF) in which I could describe my metadata schema and you'd
>> make a form for me?
>
> We have thought about this a lot before and we will slowly get there,
> but it has not been our main use case.  Most customizations are just
> too unique and standard generated forms do not cut it.
>
> Our focus has been on making sure things do not break when upgrading,
> which is a more important concern. Then maybe we can do some
> nice-to-haves like this.  This would most likely be added to the UI as
> well, though, not just as a JSON schema.
>
>
>>
>> Cheers, Fr.
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev

-- 
Mark Wainwright, CKAN Community Co-ordinator
Open Knowledge Foundation http://okfn.org/
CKAN on Twitter: @CKANproject