[ckan-dev] Date string errors with schema-2.0.xml

Sean Hammond sean.hammond at okfn.org
Tue Dec 11 17:24:51 UTC 2012


Hey all,

This is a little urgent as it's a blocker for deploying the new
publicdata.eu, which I need to do next week.

With publicdata.eu's datasets I'm getting this error from Solr when
trying to rebuild the Solr index after moving to the new schema-2.0.xml:

ERROR [ckan.lib.search] Error while indexing dataset ...: HTTP code=400, reason=Invalid Date String:'2010-07-27T09:23:00'
ERROR [ckan.lib.search] Traceback (most recent call last):
  File "/usr/lib/ckan/pdeu_staging/src/ckan/ckan/lib/search/__init__.py", line 185, in rebuild
    defer_commit
  File "/usr/lib/ckan/pdeu_staging/src/ckan/ckan/lib/search/index.py", line 97, in update_dict
    self.index_package(pkg_dict, defer_commit)
  File "/usr/lib/ckan/pdeu_staging/src/ckan/ckan/lib/search/index.py", line 232, in index_package
    raise SearchIndexError(e)
SearchIndexError: HTTP code=400, reason=Invalid Date String:'2010-07-27T09:23:00'

The invalid date strings seem to come from a couple of custom extra
fields deposit_date and update_date that publicdata.eu uses. Here's an
example of an affected dataset:

http://pdeu.staging.ckanhosted.com/dataset/5a14ec71168ce0b15c0e9cece3865e308e28e32b

This prevents the search index from being rebuilt unless you pass the -i
(ignore exceptions) argument to the search-index rebuild command. With -i
the rebuild completes, but the affected datasets don't get indexed and
don't show up in search results. This affects a lot of datasets on
publicdata.eu.

I _could_ do a hack on the publicdata.eu database and convert all these
deposit_date and update_date fields to the date format that the new solr
schema expects. But then I think when the harvesters are run again,
they'll add more of these date fields to datasets and the solr error
will reappear. So I'd have to fix the harvesters as well. And there may
be other people trying to upgrade other CKAN sites who run into the same
problem with other custom fields.
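
For what it's worth, here's a minimal sketch of the kind of conversion
that hack would involve. It assumes the new schema's date type wants the
usual Solr UTC form with a trailing Z (e.g. 2010-07-27T09:23:00Z), and
that the existing values are already in UTC. to_solr_date is a made-up
helper for illustration, not something that exists in CKAN:

```python
from datetime import datetime

# Assumed target format: Solr-style UTC timestamp with a trailing "Z".
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Format of the values Solr is rejecting, e.g. '2010-07-27T09:23:00'.
SOURCE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def to_solr_date(value):
    """Return value in the assumed Solr date format.

    Hypothetical helper: treats timestamps without a zone as UTC.
    Raises ValueError for strings in neither format.
    """
    try:
        # Already in the target form: leave it untouched.
        datetime.strptime(value, SOLR_FORMAT)
        return value
    except ValueError:
        pass
    parsed = datetime.strptime(value, SOURCE_FORMAT)
    return parsed.strftime(SOLR_FORMAT)


print(to_solr_date("2010-07-27T09:23:00"))
```

Something like this would have to run over the existing extras and also
inside the harvesters, which is why I'd rather not go down this road.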

So I think the better option may be to fix the solr schema.

It looks like this line from schema-2.0.xml is the culprit:

    <dynamicField name="*_date" type="date" indexed="true" stored="true"
                  multiValued="false"/>

But I'm not sure what this line is for or what the correct fix is.
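
My reading, which may well be wrong, is that the pattern catches any
field whose name ends in _date and types it as a date, which would
explain why deposit_date and update_date are hit. A rough sketch of that
reading, using Python's fnmatch as a stand-in for Solr's own matching
(an approximation only; caught_by_date_field and the sample extras are
made up for illustration):

```python
from fnmatch import fnmatchcase

# Pattern copied from the dynamicField line in schema-2.0.xml.
DATE_PATTERN = "*_date"


def caught_by_date_field(extras):
    """Return the extra field names the pattern would match, sorted.

    Hypothetical helper: fnmatch only approximates Solr's matching.
    """
    return sorted(name for name in extras if fnmatchcase(name, DATE_PATTERN))


# Made-up extras for illustration; only the field names matter here.
example_extras = {
    "deposit_date": "2010-07-27T09:23:00",
    "update_date": "2010-07-27T09:23:00",
    "license": "example",
}
print(caught_by_date_field(example_extras))
```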



