[ckan-discuss] Wrong type mapping

David Raznick kindly at gmail.com
Sun Jun 3 22:34:36 BST 2012


Hello,

I had a quick look at your file.

You were correct that "," in the values throws off the type inference.
You seem to have missed removing a few, though, around line 370 of
your file.

e.g. http://thedatahub.org/api/data/3b961dcf-f2fc-4425-8c07-159a58557bc9/369

I removed the last few commas from the file and tried uploading it
again, and it seems to have worked.

http://thedatahub.org/dataset/copa-2014/resource/ad90de8a-17c7-4576-a6a5-cc8c68b61f89
http://thedatahub.org/api/data/ad90de8a-17c7-4576-a6a5-cc8c68b61f89/_mapping
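
If more stray commas turn up, something like the rough Python sketch
below could locate them before uploading. It is only an illustration:
the function name, the regular expression and the sample data are my
own assumptions, and the delimiter has to match your actual file.

```python
import csv
import io
import re

# Matches values such as "84,55668315": optional sign, digits, one comma, digits.
COMMA_DECIMAL = re.compile(r"^-?\d+,\d+$")


def find_comma_decimals(csv_text, delimiter=","):
    """Return (line_number, column_name, value) for each comma-decimal cell.

    line_number counts the header as line 1 and assumes that no quoted
    field spans several physical lines.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    hits = []
    for line_number, row in enumerate(reader, start=2):
        for name, value in zip(header, row):
            if COMMA_DECIMAL.match(value.strip()):
                hits.append((line_number, name, value))
    return hits


# Hypothetical two-row sample; the second row still uses a decimal comma.
sample = 'Etapa,Investimento\nA,84.55668315\nB,"84,55668315"\n'
print(find_comma_decimals(sample))
```

Every cell it reports is one that would keep the column from being
treated as numeric.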

We have to be very strict about our inference or Elasticsearch will
blow up on us.  We fall back to text whenever there is any uncertainty
about a column's type.
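
To illustrate the idea (this is not the actual CKAN code, just a
minimal sketch with made-up names), a strict rule of this kind calls a
column numeric only if every non-empty value parses as a number, and
falls back to string otherwise:

```python
def infer_column_type(values):
    """Return "float" only if every non-empty value parses as a float.

    Illustrative only: a single unparseable value, or a column with no
    values at all, makes the whole column "string".
    """
    seen_number = False
    for value in values:
        text = value.strip()
        if not text:
            continue  # empty cells carry no evidence either way
        try:
            float(text)
        except ValueError:
            return "string"  # any doubt: fall back to text
        seen_number = True
    return "float" if seen_number else "string"


print(infer_column_type(["84.55668315", "12", ""]))
print(infer_column_type(["84.55668315", "84,55668315"]))
```

Note that Python's float() also accepts strings such as "nan" or
"inf", so a real implementation would probably want to be stricter
still.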

Thanks

David


On Sun, Jun 3, 2012 at 9:39 PM, Alexandre Gomes <alegomes at gmail.com> wrote:
> How can I force CKAN to map a CSV data column to numeric (float or double)
> instead of string? I think this is the cause of the error below when trying
> to use a statistical facet on Elasticsearch [0].
>
> {
>
>   "error" : "SearchPhaseExecutionException[Failed to execute phase [query],
> total failure; shardFailures
> {[xyJEmo2iRhuhB75ZsN6kWQ][ckan-www.ckan.net][3]:
> RemoteTransportException[[du Paris,
> Bennet][inet[/193.34.146.144:9300]][search/phase/query]]; nested:
> QueryPhaseExecutionException[[ckan-www.ckan.net][3]:
> query[filtered(ConstantScore(*:*))->FilterCacheFilterWrapper(_type:3b961dcf-f2fc-4425-8c07-159a58557bc9)],from[0],size[10]:
> Query Failed [Failed to execute main query]]; nested:
> ClassCastException[org.elasticsearch.index.field.data.strings.SingleValueStringFieldData
> cannot be cast to org.elasticsearch.index.field.data.NumericFieldData];
> }{[dHahbQPkR2SgYA8Mp5JbBQ][ckan-www.ckan.net][4]:
> QueryPhaseExecutionException[[ckan-www.ckan.net][4]:
> query[filtered(ConstantScore(*:*))->FilterCacheFilterWrapper(_type:3b961dcf-f2fc-4425-8c07-159a58557bc9)],from[0],size[10]:
> Query Failed [Failed to execute main query]]; nested:
> ClassCastException[org.elasticsearch.index.field.data.strings.SingleValueStringFieldData
> cannot be cast to org.elasticsearch.index.field.data.NumericFieldData]; }{[
>
>
> First, I uploaded a new CSV resource [1] to the World Cup 2014 dataset [2]
> and the currency fields were mapped as strings [3]:
>
>       "Investimento-Previsto-para-a-Etapa" : {
>         "type" : "string"
>       },
>       (...)
>       "Investimento-Contratado-para-a-Etapa" : {
>         "type" : "string"
>       },
>       "Investimento-Executado-para-a-Etapa" : {
>         "type" : "string"
>       },
>
>
> Then, suspecting that the comma used as decimal separator (e.g. 84,55668315)
> could be misleading CKAN's type inference, I re-submitted the CSV file
> as a new resource [4] with the number format fixed (84,55668315
> to 84.55668315), but the wrong type mapping persisted.
>
>       "Investimento-Previsto-para-a-Etapa" : {
>         "type" : "string"
>       },
>       (...)
>       "Investimento-Contratado-para-a-Etapa" : {
>         "type" : "string"
>       },
>       "Investimento-Executado-para-a-Etapa" : {
>         "type" : "string"
>       },
>
>
> So, I tried to use the "Transform" option available at the data table column
> action button [4], using the script below:
>
> function(doc) {
>   doc['Investimento-Previsto-para-a-Etapa'] = parseFloat(doc['Investimento-Previsto-para-a-Etapa']);
>   return doc;
> }
>
> but, after a while waiting for the message "Updating all visible docs. This
> could take a while..." to disappear, an alert message showed up saying
> something like "We have only updated the docs in this view. Update of all
> docs not yet implemented".
>
> Any ideas on how to get those three fields mapped as numbers?
>
> thanks
>
> [0] http://thedatahub.org/api/data/3b961dcf-f2fc-4425-8c07-159a58557bc9/_search?pretty=true&source={%22query%22:{%22match_all%22:{}},%22facets%22:{%22totais%22:{%22statistical%22:{%22field%22:%22Investimento-Previsto-para-a-Etapa%22}}}}
> [1] http://thedatahub.org/dataset/copa-2014/resource/075de5b0-19ba-45fb-bfaa-603a78c47d45
> [2] http://thedatahub.org/dataset/copa-2014
> [3] http://thedatahub.org/api/data/075de5b0-19ba-45fb-bfaa-603a78c47d45/_mapping?pretty=true
> [4] http://thedatahub.org/dataset/copa-2014/resource/3b961dcf-f2fc-4425-8c07-159a58557bc9
> [5] http://thedatahub.org/api/data/3b961dcf-f2fc-4425-8c07-159a58557bc9/_mapping?pretty=true
>
> http://www.elasticsearch.org/guide/reference/mapping/core-types.html
>
>
> []s!
>
> _______________________________________________
> ckan-discuss mailing list
> ckan-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>


