[ckan-dev] Date string errors with schema-2.0.xml

David Raznick david.raznick at okfn.org
Wed Dec 12 14:29:39 UTC 2012


Hello,

Maybe we need to remove that line from the Solr config, as it clearly breaks
things for you. We were trying to find a way to add a date field without
needing to modify the schema specifically.

The other option is to cast anything ending in _date to a datetime at the
end of the core indexing code. Would this fix be good enough? We would
ignore dates that we could not parse, so only that field would fail to get
indexed, rather than the whole dataset.

Thanks

David




On Tue, Dec 11, 2012 at 5:24 PM, Sean Hammond <sean.hammond at okfn.org> wrote:

> Hey all,
>
> This is a little urgent as it's a blocker for deploying the new
> publicdata.eu, which I need to do next week.
>
> With publicdata.eu's datasets I'm getting this error from solr, when
> trying to rebuild the solr index after moving to the new schema-2.0.xml:
>
> ERROR [ckan.lib.search] Error while indexing dataset ...: HTTP code=400, reason=Invalid Date String:'2010-07-27T09:23:00'
> ERROR [ckan.lib.search] Traceback (most recent call last):
>   File "/usr/lib/ckan/pdeu_staging/src/ckan/ckan/lib/search/__init__.py", line 185, in rebuild
>     defer_commit
>   File "/usr/lib/ckan/pdeu_staging/src/ckan/ckan/lib/search/index.py", line 97, in update_dict
>     self.index_package(pkg_dict, defer_commit)
>   File "/usr/lib/ckan/pdeu_staging/src/ckan/ckan/lib/search/index.py", line 232, in index_package
>     raise SearchIndexError(e)
> SearchIndexError: HTTP code=400, reason=Invalid Date String:'2010-07-27T09:23:00'
>
> The invalid date strings seem to come from two custom extra fields,
> deposit_date and update_date, that publicdata.eu uses. Here's an
> example of an affected dataset:
>
>
> http://pdeu.staging.ckanhosted.com/dataset/5a14ec71168ce0b15c0e9cece3865e308e28e32b
>
> This prevents the search index from being rebuilt unless you pass the -i
> (ignore exceptions) argument to the search-index rebuild command. With -i
> the index does get rebuilt, but the affected datasets are not indexed and
> don't show up in search results. This affects a lot of datasets on
> publicdata.eu.
>
> I _could_ do a hack on the publicdata.eu database and convert all these
> deposit_date and update_date fields to the date format that the new solr
> schema expects. But then I think when the harvesters are run again,
> they'll add more of these date fields to datasets and the solr error
> will reappear. So I'd have to fix the harvesters as well. And there may
> be other people trying to upgrade other CKAN sites who run into the same
> problem with other custom fields.
>
> So I think the better option may be to fix the solr schema.
>
> It looks like this line from schema-2.0.xml is the culprit:
>
>     <dynamicField name="*_date" type="date" indexed="true" stored="true"
>                   multiValued="false"/>
>
> But I'm not sure what this line is for or what the correct fix is.
>
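To show what the quoted dynamicField line does, here is a rough check. The line routes every field whose name ends in `_date` to Solr's `date` type. Per the Solr DateField documentation of this era, that type expects the canonical UTC form with a mandatory trailing `Z`, such as `1995-12-31T23:59:59Z`. The value `'2010-07-27T09:23:00'` from the error above has no `Z`. The helper `find_bad_date_fields` is hypothetical, and its regex only approximates Solr's rules. For example, it ignores date-math expressions such as `NOW`.

```python
import re
from fnmatch import fnmatchcase

# Pattern copied from the dynamicField line in schema-2.0.xml.
DATE_FIELD_PATTERN = "*_date"

# Approximation of Solr's canonical date form, e.g. 1995-12-31T23:59:59Z.
# Date-math expressions (NOW, NOW/DAY, ...) are not handled here.
SOLR_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def find_bad_date_fields(fields):
    """Return {name: value} for *_date fields Solr would likely reject."""
    bad = {}
    for name, value in fields.items():
        if not fnmatchcase(name, DATE_FIELD_PATTERN):
            continue  # not routed to the date type by the dynamicField
        if not SOLR_DATE_RE.match(str(value)):
            bad[name] = value
    return bad
```

With illustrative values (only the first comes from the error message), `find_bad_date_fields({"deposit_date": "2010-07-27T09:23:00", "update_date": "2010-07-27T09:23:00Z", "title": "x"})` returns `{'deposit_date': '2010-07-27T09:23:00'}`. `title` is not matched by `*_date` at all, and the `Z`-suffixed value passes.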

