[ckan-dev] Search functions in CKAN; custom schema.

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Wed May 30 08:26:48 UTC 2012


Hey David,

thanks for the quick reply!

On Tue, May 29, 2012 at 1:48 PM, David Raznick <kindly at gmail.com> wrote:
>> * Is global data search planned; when is it likely to land? Has anyone
>> played with indexing PDF/Word/... docs, e.g. via Tika?
>
> No, not planned, but interesting, and it should probably be stored in
> the datastore.  This is because you can easily do full text search
> across all the tabular data already.  The datastorer could be pretty
> trivially extended to do this as long as the document parsers do not
> require too much work.  It would be very useful to have a core language
> metadata field, possibly per resource, if we were going to index
> textual documents.

Ok, but there is already a global index through ES by default, so it
would just be a matter of exposing that and building a custom
interface. The datastorer could probably send docs through Solr (which
has Tika built in) before sending the extracted text content to ES.
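
To make the idea concrete, here is a rough sketch of that two-step flow: extract text first, then hand the result to the index. Everything in it is an assumption for illustration. The function names are made up, `extract_text` merely stands in for whatever Solr/Tika call would do the extraction, and `send_to_index` stands in for the ES client. None of it is existing datastorer code.

```python
from typing import Callable, Dict


def build_index_doc(resource_id: str, raw: bytes,
                    extract_text: Callable[[bytes], str]) -> Dict[str, str]:
    """Turn one raw document into a flat, indexable record.

    `extract_text` is a hypothetical stand-in for the extraction step
    (e.g. a call out to Solr/Tika); it is injected so the sketch runs
    without any external service.
    """
    text = extract_text(raw)
    # Collapse runs of whitespace so the stored content is compact.
    return {"resource_id": resource_id, "content": " ".join(text.split())}


def index_resources(resources: Dict[str, bytes],
                    extract_text: Callable[[bytes], str],
                    send_to_index: Callable[[Dict[str, str]], None]) -> int:
    """Extract and index every resource; return how many were indexed.

    `send_to_index` is a hypothetical stand-in for the search index client.
    Documents with no extractable text are skipped.
    """
    indexed = 0
    for resource_id, raw in resources.items():
        doc = build_index_doc(resource_id, raw, extract_text)
        if doc["content"]:
            send_to_index(doc)
            indexed += 1
    return indexed
```

Keeping the extractor and the index client as plain callables means either side could be swapped out without touching the loop.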

>> * Can I have per-group or per-dataset schemata with custom vocabs and
>> have these enforced as validation when saving metadata, as well as
>> used to generate a custom form? e.g. I send people to
>> datahubio/datasets/new?schema=mymeta - this will ask for a couple of
>> extras and enforce they are part of an enumeration, then save that
>> association and use the form each time the dataset is edited.
>
> Yes, it is not particularly well tested, and it has a reasonably large
> barrier to entry, but it is definitely there.  It is actually not done
> as a param but as a top level entity, i.e.
> datahubio/my_dataset/new
> A custom extension can define what it can be called, its schema and its form.

Cool. Is there a version of this online somewhere that I could check
out and play with? How much work would productizing it be? Would it
also be possible to define a set of default roles for resources (e.g.
this is a planning document, this is a contract, this details
execution) and allow only a limited set of MIME types (e.g. only doc,
pdf and excel - yuk)?
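
Roughly what I have in mind for the roles and MIME types, as a plain Python sketch. It deliberately uses no CKAN API. The `role` and `mimetype` keys, the role identifiers and the function name are my assumptions, not an existing schema.

```python
from typing import Dict, List

# Hypothetical role identifiers for the three examples above.
ALLOWED_ROLES = {"planning", "contract", "execution"}

# MIME types for Word (.doc), PDF and Excel (.xls) files.
ALLOWED_MIME_TYPES = {
    "application/msword",
    "application/pdf",
    "application/vnd.ms-excel",
}


def validate_resource(resource: Dict[str, str]) -> List[str]:
    """Return a list of error messages; an empty list means valid.

    Assumes each resource is a dict with `role` and `mimetype` keys.
    """
    errors = []
    role = resource.get("role")
    if role not in ALLOWED_ROLES:
        errors.append(f"role {role!r} is not one of {sorted(ALLOWED_ROLES)}")
    mimetype = resource.get("mimetype")
    if mimetype not in ALLOWED_MIME_TYPES:
        errors.append(f"mimetype {mimetype!r} is not allowed")
    return errors
```

The same two sets could presumably also drive the dropdowns in a generated form, so the form and the validation cannot drift apart.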

Where do I find docs on editorial workflows atm?

Cheers,

 - Friedrich



