[ckan-discuss] Wrong type mapping

Alexandre Gomes alegomes at gmail.com
Sun Jun 3 21:39:08 BST 2012


How can I force CKAN to map a CSV data column to a numeric type (float or
double) instead of string? I think the string mapping is the cause of the
error below, which I get when trying to use a statistical facet on
ElasticSearch [0].

{

  "error" : "SearchPhaseExecutionException[Failed to execute phase
[query], total failure; shardFailures
{[xyJEmo2iRhuhB75ZsN6kWQ][ckan-www.ckan.net][3]:
RemoteTransportException[[du Paris,
Bennet][inet[/193.34.146.144:9300]][search/phase/query]]; nested:
QueryPhaseExecutionException[[ckan-www.ckan.net][3]:
query[filtered(ConstantScore(*:*))->FilterCacheFilterWrapper(_type:3b961dcf-f2fc-4425-8c07-159a58557bc9)],from[0],size[10]:
Query Failed [Failed to execute main query]]; nested:
ClassCastException[org.elasticsearch.index.field.data.strings.SingleValueStringFieldData
cannot be cast to
org.elasticsearch.index.field.data.NumericFieldData];
}{[dHahbQPkR2SgYA8Mp5JbBQ][ckan-www.ckan.net][4]:
QueryPhaseExecutionException[[ckan-www.ckan.net][4]:
query[filtered(ConstantScore(*:*))->FilterCacheFilterWrapper(_type:3b961dcf-f2fc-4425-8c07-159a58557bc9)],from[0],size[10]:
Query Failed [Failed to execute main query]]; nested:
ClassCastException[org.elasticsearch.index.field.data.strings.SingleValueStringFieldData
cannot be cast to
org.elasticsearch.index.field.data.NumericFieldData]; }{[
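
For reference, the request in [0] decodes to the query sketched below: a
match_all query with a single statistical facet named "totais" on the
first currency field. The helper name buildSearchUrl is only illustrative,
and encodeURIComponent escapes more characters than the hand-written URL
in [0] does, so the resulting string is equivalent but not identical.

```javascript
// Query decoded from the URL in [0]: match everything, one statistical facet.
const resourceId = "3b961dcf-f2fc-4425-8c07-159a58557bc9";
const query = {
  query: { match_all: {} },
  facets: {
    totais: {
      statistical: { field: "Investimento-Previsto-para-a-Etapa" }
    }
  }
};

// Illustrative helper (not part of CKAN): builds the same kind of GET URL,
// passing the JSON query in the `source` parameter as [0] does.
function buildSearchUrl(id, body) {
  return "http://thedatahub.org/api/data/" + id +
    "/_search?pretty=true&source=" + encodeURIComponent(JSON.stringify(body));
}

console.log(buildSearchUrl(resourceId, query));
```

The statistical facet needs numeric field data, which matches the
ClassCastException above: the field is stored as a string.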


First, I uploaded a new CSV resource [1] to the World Cup 2014 dataset [2],
and the currency fields were mapped as strings [3]:

      "Investimento-Previsto-para-a-Etapa" : {
        "type" : "string"
      },
      (...)
      "Investimento-Contratado-para-a-Etapa" : {
        "type" : "string"
      },
      "Investimento-Executado-para-a-Etapa" : {
        "type" : "string"
      },


Then, suspecting that the comma used as decimal separator (e.g. 84,55668315)
could be misleading CKAN's type inference, I re-submitted the CSV file as a
new resource [4] with the number format fixed (84,55668315 changed to
84.55668315), but the wrong type mapping persisted [5]:

      "Investimento-Previsto-para-a-Etapa" : {
        "type" : "string"
      },
      (...)
      "Investimento-Contratado-para-a-Etapa" : {
        "type" : "string"
      },
      "Investimento-Executado-para-a-Etapa" : {
        "type" : "string"
      },
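
The separator fix described above was applied to the CSV file itself. As a
minimal sketch of the same conversion in code, assuming the comma only ever
appears as a decimal separator (no thousands separators), a hypothetical
helper could look like this:

```javascript
// Hypothetical helper: turn "84,55668315" into the number 84.55668315.
// Assumes at most one comma, used as the decimal separator.
function parseDecimalComma(text) {
  if (typeof text !== "string") return NaN;
  const normalized = text.trim().replace(",", ".");
  // Number("") is 0, so treat an empty cell as not-a-number explicitly.
  if (normalized === "") return NaN;
  // Number() rejects trailing garbage such as "12abc", unlike parseFloat().
  return Number(normalized);
}

console.log(parseDecimalComma("84,55668315")); // 84.55668315
console.log(parseDecimalComma("84.55668315")); // 84.55668315
```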


So I tried the "Transform" option available from the data table column
action button [4], using the script below

function(doc) {
  doc['Investimento-Previsto-para-a-Etapa'] =
      parseFloat(doc['Investimento-Previsto-para-a-Etapa']);
  return doc;
}

but, after a while waiting for the message "Updating all visible docs. This
could take a while..." to disappear, an alert message showed up saying
something like "We have only updated the docs in this view. Update of all
docs not yet implemented".
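
The script above converts only one of the three currency fields. A sketch
of a variant covering all three follows; it assumes the cells are strings
using either a dot or a comma as decimal separator. The names
CURRENCY_FIELDS and transform are illustrative (the Transform box takes an
anonymous function(doc)), and whether converting the documents this way
changes the ElasticSearch mapping is not something I have verified.

```javascript
// The three currency columns that were mapped as strings in [3] and [5].
const CURRENCY_FIELDS = [
  "Investimento-Previsto-para-a-Etapa",
  "Investimento-Contratado-para-a-Etapa",
  "Investimento-Executado-para-a-Etapa"
];

// Same shape as the Transform script: takes a doc and returns it modified.
function transform(doc) {
  CURRENCY_FIELDS.forEach(function (field) {
    const raw = doc[field];
    if (typeof raw !== "string") return; // already numeric, or missing
    const value = parseFloat(raw.replace(",", "."));
    if (!isNaN(value)) doc[field] = value; // leave unparseable cells as-is
  });
  return doc;
}

// Example with made-up values (only 84,55668315 comes from the real data).
console.log(transform({
  "Investimento-Previsto-para-a-Etapa": "84,55668315",
  "Investimento-Contratado-para-a-Etapa": "10.5",
  "Investimento-Executado-para-a-Etapa": "n/a"
}));
```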

Any ideas on how to get those three fields mapped as numbers?

thanks

[0]
http://thedatahub.org/api/data/3b961dcf-f2fc-4425-8c07-159a58557bc9/_search?pretty=true&source={%22query%22:{%22match_all%22:{}},%22facets%22:{%22totais%22:{%22statistical%22:{%22field%22:%22Investimento-Previsto-para-a-Etapa%22}}}}
[1]
http://thedatahub.org/dataset/copa-2014/resource/075de5b0-19ba-45fb-bfaa-603a78c47d45
[2] http://thedatahub.org/dataset/copa-2014
[3]
http://thedatahub.org/api/data/075de5b0-19ba-45fb-bfaa-603a78c47d45/_mapping?pretty=true
[4]
http://thedatahub.org/dataset/copa-2014/resource/3b961dcf-f2fc-4425-8c07-159a58557bc9
[5]
http://thedatahub.org/api/data/3b961dcf-f2fc-4425-8c07-159a58557bc9/_mapping?pretty=true

http://www.elasticsearch.org/guide/reference/mapping/core-types.html


[]s!