[ECODP-dev] Malformed rdf

Bert Van Nuffelen bert.van.nuffelen at tenforce.com
Tue Nov 12 08:07:38 UTC 2013


Hi John,


2013/11/11 John Glover <john.glover at okfn.org>

> Hi Bert,
>
> The last time this issue came up, the only way that we could recreate the
> problem was when making a request without setting the content type to
> 'application/json'. As you are now doing this, could you please send us an
> example of the JSON that you are sending to CKAN in a failing request so
> that we can investigate further.
>
>


> > When I run my tests, it seems that when the error occurs the submitted
> > RDF is not saved, and hence the RDF in CKAN is the one produced by CKAN's
> > own RDF generation process. Is that true?
>
> Yes this is correct.
>
> > In that case the level of the message (ERROR) and the result returned by
> > the API call are not in sync. On an ERROR I expect the API call to return
> > an error code. If it were a warning, I could understand the API call
> > returning success.
>
> We could of course return a validation error here (although this has not
> been requested before). I believe the thinking was that it was best to save
> the data and ignore this error as CKAN can generate RDF, but this logic
> could be changed.
>

I understand the reasoning, but it leads to an incorrect end result. In
this case it even hid a communication error between RDF2CKAN and CKAN.
Since the API already provides a proper feedback mechanism, I would like to
be able to rely on it.
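
To illustrate what relying on the API's feedback could look like on the
client side, here is a minimal sketch in Java. It assumes that a failed call
is signalled either by a non-2xx HTTP status or by a JSON body containing
"success": false; whether the API in use here does so is exactly the point
under discussion. The names ApiResultCheck and isFailure are hypothetical and
not taken from rdf2ckan, and the pattern match stands in for a real JSON
parser.

```java
import java.util.regex.Pattern;

public class ApiResultCheck {
    // Looks for a "success": false flag in the body.
    // A real client should parse the JSON instead of pattern matching.
    private static final Pattern SUCCESS_FALSE =
            Pattern.compile("\"success\"\\s*:\\s*false");

    /** Returns true when the response should be treated as a failed upload. */
    public static boolean isFailure(int statusCode, String body) {
        if (statusCode < 200 || statusCode >= 300) {
            return true; // HTTP-level error reported by the server
        }
        // Status looks fine, so fall back to the flag in the body (assumed format).
        return body != null && SUCCESS_FALSE.matcher(body).find();
    }

    public static void main(String[] args) {
        System.out.println(isFailure(200, "{\"success\": true}"));
        System.out.println(isFailure(200, "{\"success\": false}"));
    }
}
```

With a check like this, an upload whose RDF was rejected would only be
counted as inserted if the server really reported success.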






>
> Regards,
> John
>
>
> On 8 November 2013 11:47, Bert Van Nuffelen <
> bert.van.nuffelen at tenforce.com> wrote:
>
>> Hi,
>>
>> one addition: overnight I ran an upload with the same RDF2CKAN and the
>> same ESTAT package against the release 09.00.0x version, and there the
>> above message is not present in the CKAN logs.
>>
>> But when I then synchronise with Virtuoso, I get exceptions while parsing
>> the RDF files as converted by CKAN.
>>
>> So it seems that
>>    * in release 09.00.0x the check on this issue was not present at all,
>>    * in release 01.00.0x the check is there, but it silently replaces it
>> with an empty value
>>
>> In both releases the accept-header is not bound to the desired execution.
>>
>> kind regards,
>>
>> Bert
>>
>>
>>
>>
>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>
>>> Dear all,
>>>
>>> We have this problem again. It is related to issue ODP-294.
>>>
>>> in the apache logs:
>>>
>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
>>> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>
>>> in the CKAN logs
>>>
>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
>>> expecting ';', line 39, column 102
>>>
>>> As this only shows up silently in the logs, instead of as a _proper_
>>> validation error like:
>>>
>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error: '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must be alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation Error\'}'
>>>
>>> rdf2ckan can only conclude that it has inserted the data correctly, while
>>> it has not. The outcome of rdf2ckan is hence untrustworthy. For that
>>> reason this issue remained unseen by TF development.
>>>
>>> Secondly, I doubt whether the configuration fix described by David
>>> actually works. It seems that setting the header to "application/json"
>>> does not result in correct handling by CKAN. Below is an extract of the
>>> Java code showing that we send the content type as requested.
>>>
>>> private static final String CONTENT_JSON = "application/json";
>>>
>>> public Map<String, Object> post(String json, String method) {
>>>         if (json.isEmpty() || method.isEmpty())
>>>             return null;
>>>         DefaultHttpClient httpClient = new DefaultHttpClient();
>>>
>>>         HttpPost httpPost = requestCredentials(method);
>>>         json = json.replaceAll(";", "");
>>>         httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>        ...
>>>
>>>  public InputStreamEntity streamify(String json, String contentType) {
>>>         json = json.replaceAll(";", "");
>>>         InputStreamEntity inputStreamEntity = new InputStreamEntity(new
>>> ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>> json.getBytes().length);
>>>         inputStreamEntity.setContentType(contentType);
>>>         return inputStreamEntity;
>>>     }
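
One observation on the extract above, offered as a possible explanation and
not as a confirmed diagnosis: both post and streamify call
json.replaceAll(";", ""), which also removes the ';' that terminates XML
entity references such as &amp; inside the embedded RDF/XML. The sketch below
uses only the JDK's XML parser to show the effect on a made-up fragment; the
class name SemicolonStripDemo and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class SemicolonStripDemo {
    /** Returns true if the given string parses as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false; // the parser rejects malformed input with an exception
        }
    }

    public static void main(String[] args) {
        // Made-up fragment containing one escaped ampersand.
        String original = "<title>trade &amp; statistics</title>";
        // Same call as in the extract: every ';' is removed, including
        // the one that closes the entity reference.
        String stripped = original.replaceAll(";", "");
        System.out.println(isWellFormed(original));
        System.out.println(isWellFormed(stripped));
    }
}
```

Separately, json.getBytes().length is computed with the platform default
charset, while the stream itself is encoded as UTF-8; on a JVM whose default
charset is not UTF-8 the declared length can differ from the number of bytes
actually sent.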
>>>
>>>
>>> Note that the RDF submitted in the request body is valid RDF and valid
>>> XML: it passes all XML and RDF validations (e.g. rapper).
>>>
>>> When I run my tests, it seems that when the error occurs the submitted
>>> RDF is not saved, and hence the RDF in CKAN is the one produced by CKAN's
>>> own RDF generation process. Is that true? In that case the level of the
>>> message (ERROR) and the result returned by the API call are not in sync.
>>> On an ERROR I expect the API call to return an error code. If it were a
>>> warning, I could understand the API call returning success.
>>>
>>> So the following actions should be taken:
>>>
>>> 1) Make the above error message a proper validation result.
>>>     Note that this also solves the problem of the meaningless error
>>>     message: it contains no reference at all to the call or object
>>>     concerned, so the problem cannot be traced from this message alone.
>>>     Putting CKAN in debug-mode logging for that is not very sensible.
>>> 2) Ensure that the RDF (in RDF/XML format) inside the JSON is handled
>>>     properly and not interpreted as an HTML encoding.
>>>
>>> best regards,
>>>
>>> Bert
>>>
>>>
>>>
>>>
>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>
>>>> Hello
>>>>
>>>> It looks like the error occurs because rdf2ckan sends the wrong content
>>>> type when posting to CKAN. Content-Type: application/json should be sent,
>>>> otherwise CKAN will think the body is urlencoded and will therefore
>>>> encode the final document wrongly.
>>>>
>>>> This must have changed at some point, as the live db does not have this
>>>> issue, and it is likely what caused the unicode issues too.
>>>>
>>>> Thanks
>>>>
>>>> David
>>>>
>>>>
>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>
>>>>>  Hello David,
>>>>>
>>>>> we don't do any postprocessing for the data we send.
>>>>>
>>>>> It's pure RDF content as read from the file, so we depend on how CKAN
>>>>> digests it.
>>>>> If we need to do some postprocessing, can you clarify what is needed?
>>>>>
>>>>> Also, is this related to the problem we faced with unicode exceptions
>>>>> from CKAN?
>>>>>
>>>>> Kind regards
>>>>> Dimitrios
>>>>>
>>>>>
>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>
>>>>>  Hello Bert
>>>>>
>>>>>
>>>>>  There appear to be a lot of malformed RDF documents in the test
>>>>> system, e.g.
>>>>>
>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>
>>>>>  After some investigation this seems to be due to the data sent by
>>>>> rdf2ckan.  It appears that XML "&" characters are not being escaped
>>>>> correctly in some places.
>>>>>
>>>>>  These errors do not appear on the live db as far as I can see, could
>>>>> you please look into this.
>>>>>
>>>>>  Thanks
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ecodp-dev mailing list
>>>>> Ecodp-dev at lists.okfn.org
>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Bert Van Nuffelen
>>>
>>> Semantic Technologies Software Architect at TenForce
>>> www.tenforce.be
>>>
>>> Bert.Van.Nuffelen at tenforce.com
>>> Office: +32 (0)16 31 48 60
>>> Mobile:+32 479 06 24 26
>>> skype: bert.van.nuffelen
>>>
>>
>>
>>
>


-- 
Bert Van Nuffelen

Semantic Technologies Software Architect at TenForce
www.tenforce.be

Bert.Van.Nuffelen at tenforce.com
Office: +32 (0)16 31 48 60
Mobile:+32 479 06 24 26
skype: bert.van.nuffelen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ESTAT_20131020230103_hrst_fl_tegrad.rdf.json
Type: application/json
Size: 8510 bytes
Desc: not available
URL: <https://lists.okfn.org/mailman/private/ecodp-dev/attachments/20131112/9bf46189/attachment.json>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ESTAT_20131020230103_spr_exp_fsu.rdf.json
Type: application/json
Size: 8101 bytes
Desc: not available
URL: <https://lists.okfn.org/mailman/private/ecodp-dev/attachments/20131112/9bf46189/attachment-0001.json>

