[ECODP-dev] Malformed rdf

Bert Van Nuffelen bert.van.nuffelen at tenforce.com
Thu Nov 7 22:51:35 UTC 2013


Dear all,

We have this problem again. It is related to issue ODP-294.

In the Apache logs:

[Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
[ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103

In the CKAN logs:

2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
expecting ';', line 39, column 102

Because this only appears silently in the logs instead of being returned as
a _proper_ validation error like:

2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error:
'{\'keyword_string\': [u\'Tag "inter
national trade,trade statistics" must be alphanumeric characters or
symbols: -_.\'], \'__type\': \'Validati
on Error\'}'

rdf2ckan can only conclude that it has inserted the data correctly, while in
fact it has not.
The outcome of rdf2ckan is therefore untrustworthy. For that reason this
issue remained unseen by TF development.

Secondly, I am concerned about whether the configuration fix described by
David (quoted below) actually works.
It seems that setting the header to "application/json" does not
result in correct handling by CKAN.
Below is an extract of the Java code showing that we send the
content-type as requested.

private static final String CONTENT_JSON = "application/json";

public Map<String, Object> post(String json, String method) {
        if (json.isEmpty() || method.isEmpty())
            return null;
        DefaultHttpClient httpClient = new DefaultHttpClient();

        HttpPost httpPost = requestCredentials(method);
        json = json.replaceAll(";", "");
        httpPost.setEntity(streamify(json, CONTENT_JSON));
       ...

public InputStreamEntity streamify(String json, String contentType) {
        json = json.replaceAll(";", "");
        InputStreamEntity inputStreamEntity = new InputStreamEntity(
                new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
                json.getBytes().length);
        inputStreamEntity.setContentType(contentType);
        return inputStreamEntity;
    }


Note that the RDF submitted in the request body is valid RDF and well-formed
XML: it passes all XML and RDF validation (e.g. rapper).
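
One thing that may be worth ruling out is the json.replaceAll(";", "") call
in the extract above. If the RDF/XML travels inside that JSON string,
removing every semicolon would turn an entity such as &amp; into &amp, which
an XML parser rejects with a message much like the EntityRef one in the logs.
Whether this is what actually happens in rdf2ckan is an assumption, not a
confirmed diagnosis. The sketch below only demonstrates the effect on a
made-up document; the class name and the sample RDF are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class RdfWellFormedCheck {

    // Hypothetical sample: a tiny RDF/XML document with one escaped ampersand.
    static final String SAMPLE =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:dct=\"http://purl.org/dc/terms/\">"
            + "<rdf:Description rdf:about=\"http://example.org/dataset/1\">"
            + "<dct:title>Trade &amp; statistics</dct:title>"
            + "</rdf:Description></rdf:RDF>";

    /** Returns true if the string parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // A parse error (or any other failure) counts as "not well-formed" here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Same transformation as in the extract: drop every semicolon.
        String stripped = SAMPLE.replaceAll(";", "");
        System.out.println("as read from file:        " + isWellFormed(SAMPLE));
        System.out.println("after stripping all ';':  " + isWellFormed(stripped));
    }
}
```

If the first check passes and the second fails for a real payload, the
document would be valid on disk (as rapper reports) and malformed by the time
it reaches CKAN.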

In my tests it seems that when the error occurs the submitted RDF is not
saved, so the RDF served by CKAN is the one produced by CKAN's own RDF
generation process. Is that true? If so, the log level (ERROR) and the
result returned by the API call are out of sync. On an ERROR I expect the
API call to return an error code. If it were a warning, I could understand
the API call returning success.
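
To make the expectation concrete, here is a rough sketch of the only check a
client such as rdf2ckan can perform on its side. It assumes the response
body is a JSON envelope with a "success" field, as the CKAN action API
returns; whether the endpoint used here behaves exactly like that is an
assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class CkanResponseCheck {

    // Crude match for "success": true in the response envelope. A real client
    // would use a JSON parser rather than a regular expression.
    private static final Pattern SUCCESS_TRUE =
            Pattern.compile("\"success\"\\s*:\\s*true");

    /** Treats a call as successful only if both the HTTP status and the body say so. */
    static boolean callSucceeded(int httpStatus, String responseBody) {
        if (httpStatus < 200 || httpStatus >= 300) {
            return false;
        }
        return responseBody != null && SUCCESS_TRUE.matcher(responseBody).find();
    }

    public static void main(String[] args) {
        System.out.println(callSucceeded(200, "{\"success\": true, \"result\": {}}"));
        System.out.println(callSucceeded(409,
                "{\"success\": false, \"error\": {\"__type\": \"Validation Error\"}}"));
    }
}
```

A check like this cannot detect the RDF problem as long as CKAN logs the
ERROR but still answers with a success response.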

So the following actions should be taken:

1) Make the above error message a proper validation result.
    Note that this also solves the problem of the meaningless error message.
There is no reference at all in the message to the call or object it
concerns, so the problem cannot be resolved from this message alone. And
putting CKAN into debug-mode logging for that is not very sensible.
2) Ensure that RDF (in RDF/XML format) inside the JSON is handled properly
and not interpreted as an HTML encoding.
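
For point 2, a minimal sketch of what "handled properly" means on the
sending side: when RDF/XML is embedded in a JSON string, only the characters
that JSON itself requires (quote, backslash, control characters) need
escaping, and XML characters such as '&' and ';' should arrive untouched.
This is an illustration under that assumption, not the rdf2ckan code; the
class name and the "rdf" field name are hypothetical.

```java
final class JsonStringEscaper {

    /** Escapes a Java string for use as a JSON string value (without the surrounding quotes). */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        // '&', ';', '<' and '>' pass through unchanged.
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String rdf = "<dct:title xml:lang=\"en\">Trade &amp; statistics</dct:title>";
        // "rdf" is a hypothetical field name used only for this example.
        System.out.println("{\"rdf\": \"" + escape(rdf) + "\"}");
    }
}
```

On the receiving side, decoding the JSON string should then give back the
RDF/XML byte for byte, with no further decoding step applied to it.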

best regards,

Bert




2013/9/18 David Raznick <david.raznick at okfn.org>

> Hello
>
> It looks like the error occurs because the wrong content type is sent when
> posting to CKAN in rdf2ckan. Content-Type: application/json should be sent;
> otherwise CKAN will assume the body is urlencoded and will therefore encode
> the final document wrongly.
>
> This must have changed at some point, as the live db does not have this
> issue, and it is likely what caused the unicode issues too.
>
> Thanks
>
> David
>
>
> On 18 September 2013 16:16, Dimitrios Mexis <dimitrios.mexis at tenforce.com> wrote:
>
>>  Hello David,
>>
>> We don't do any postprocessing on the data we send.
>>
>> It's pure RDF content as read from the file, so we depend on how CKAN
>> digests it.
>> If we need to do some postprocessing, can you clarify what is needed?
>>
>> Also, is this related to the unicode exceptions we got from CKAN?
>>
>> Kind regards
>> Dimitrios
>>
>>
>> On 18/09/2013 17:12, David Raznick wrote:
>>
>>  Hello Bert
>>
>>
>>  There appear to be a lot of malformed RDF documents in the test system,
>> e.g.
>>
>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>
>>  After some investigation this seems to be due to the data sent by
>> rdf2ckan.  It appears that XML "&" characters are not being escaped
>> correctly in some places.
>>
>>  These errors do not appear on the live db as far as I can see. Could
>> you please look into this?
>>
>>  Thanks
>>
>> David
>>
>>
>> _______________________________________________
>> Ecodp-dev mailing list
>> Ecodp-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ecodp-dev


-- 
Bert Van Nuffelen

Semantic Technologies Software Architect at TenForce
www.tenforce.be

Bert.Van.Nuffelen at tenforce.com
Office: +32 (0)16 31 48 60
Mobile:+32 479 06 24 26
skype: bert.van.nuffelen