[ECODP-dev] Malformed rdf

John Glover john.glover at okfn.org
Tue Nov 12 08:44:29 UTC 2013


Hi Bert,

Thanks, I will have a look at that JSON file and get back to you.

Regards,
John


On 12 November 2013 09:08, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com
> wrote:

> Hi John,
>
> I forgot to mention that I attached example JSON files to my
> previous response.
>
> Bert
>
>
> 2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>
>> Hi John,
>>
>>
>> 2013/11/11 John Glover <john.glover at okfn.org>
>>
>>> Hi Bert,
>>>
>>> The last time this issue came up, the only way that we could recreate
>>> the problem was when making a request without setting the content type to
>>> 'application/json'. As you are now doing this, could you please send us an
>>> example of the JSON that you are sending to CKAN in a failing request so
>>> that we can investigate further.
>>>
>>>
>>
>>
>>> > When I do my tests, it seems that when the error occurs the
>>> > submitted RDF is not saved, and hence the RDF in CKAN is the one
>>> > produced by CKAN's own RDF generation process. Is that true?
>>>
>>> Yes this is correct.
>>>
>>> > In that case the ERROR level of the message and the result returned
>>> > by the API call are not in sync. On an ERROR I expect the API call to
>>> > return an error code. If it were a warning, I could understand the
>>> > API call returning success.
>>>
>>> We could of course return a validation error here (although this has not
>>> been requested before). I believe the thinking was that it was best to save
>>> the data and ignore this error as CKAN can generate RDF, but this logic
>>> could be changed.
>>>
>>
>> I understand the reasoning, but it leads to an incorrect end result. In
>> this case it has even hidden a communication error between RDF2CKAN and
>> CKAN. Since the API provides a proper feedback mechanism, I would like to
>> rely on it.
>>
>>
>>
>>
>>
>>
>>>
>>> Regards,
>>> John
>>>
>>>
>>> On 8 November 2013 11:47, Bert Van Nuffelen <
>>> bert.van.nuffelen at tenforce.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> one addition: overnight I ran an upload of the same ESTAT package with
>>>> the same RDF2CKAN on the release 09.00.0x version, and there the above
>>>> message is not present in the CKAN logs.
>>>>
>>>> But when I do the synchronisation with Virtuoso, I get exceptions when
>>>> parsing the RDF files as they have been converted by CKAN.
>>>>
>>>> So it seems that
>>>>    * in release 09.00.0x the check on this issue was not present at all,
>>>>    * in release 01.00.0x the check is there, but it silently
>>>>      substitutes an empty value
>>>>
>>>> In both releases the accept-header is not bound to the desired
>>>> execution.
>>>>
>>>> kind regards,
>>>>
>>>> Bert
>>>>
>>>>
>>>>
>>>>
>>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>
>>>>> Dear all,
>>>>>
>>>>> We have this problem again. It is related to issue ODP-294.
>>>>>
>>>>> In the Apache logs:
>>>>>
>>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
>>>>> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>>
>>>>> In the CKAN logs:
>>>>>
>>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
>>>>> expecting ';', line 39, column 102
>>>>>
>>>>> As this is only written silently to the logs instead of being returned
>>>>> as a _proper_ validation error like:
>>>>>
>>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error:
>>>>> '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must be alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation Error\'}'
>>>>>
>>>>> rdf2ckan can only conclude that it has inserted the data correctly
>>>>> while it has not.
>>>>> So the outcome of rdf2ckan is untrustworthy. For that reason
>>>>> this issue remained unseen by TF development.
>>>>>
>>>>> Secondly, I am concerned about whether the above configuration
>>>>> solution described by David works.
>>>>> It seems that setting the header to "application/json" does not
>>>>> result in the correct handling by CKAN.
>>>>> Below is the extract of the Java code showing that we send the
>>>>> content-type as requested.
>>>>>
>>>>> private static final String CONTENT_JSON = "application/json";
>>>>>
>>>>> public Map<String, Object> post(String json, String method) {
>>>>>         if (json.isEmpty() || method.isEmpty())
>>>>>             return null;
>>>>>         DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>>
>>>>>         HttpPost httpPost = requestCredentials(method);
>>>>>         json = json.replaceAll(";", "");
>>>>>         httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>>        ...
>>>>>
>>>>>  public InputStreamEntity streamify(String json, String contentType) {
>>>>>         json = json.replaceAll(";", "");
>>>>>         InputStreamEntity inputStreamEntity = new InputStreamEntity(
>>>>>                 new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>>>>                 json.getBytes().length);
>>>>>         inputStreamEntity.setContentType(contentType);
>>>>>         return inputStreamEntity;
>>>>>     }
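
A minimal sketch of two things in the extract above that may be worth checking. It is an illustration under assumptions, not a confirmed diagnosis. First, replaceAll(";", "") removes every semicolon from the payload, including the one that terminates an XML entity reference such as &amp;, which would leave text of the kind a parser reports as "EntityRef: expecting ';'". Second, json.getBytes() without a charset uses the platform default, so the declared length can differ from the UTF-8 byte count when that default is not UTF-8. The class name, the sample RDF fragment and the sample title are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class SemicolonStripDemo {
    // Mirrors the replaceAll(";", "") call in the extract above.
    static String stripSemicolons(String payload) {
        return payload.replaceAll(";", "");
    }

    // Byte count of the UTF-8 encoding, i.e. what the stream really contains.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte count under ISO-8859-1, standing in for a non-UTF-8 platform default.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // Hypothetical RDF/XML fragment with a correctly escaped ampersand.
        String rdf = "<dct:title>Trade &amp; statistics</dct:title>";
        // The terminating ';' of the entity reference is gone afterwards.
        System.out.println(stripSemicolons(rdf));

        // Hypothetical title containing one non-ASCII character.
        String title = "Statistiques du commerce ext\u00e9rieur";
        System.out.println("UTF-8 bytes:      " + utf8Length(title));
        System.out.println("ISO-8859-1 bytes: " + latin1Length(title));
    }
}
```

The first line printed is <dct:title>Trade &amp statistics</dct:title>, which is no longer well-formed XML. Whether this is what happens to the RDF embedded in the real JSON depends on how it is escaped there, which the extract does not show.
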
>>>>>
>>>>>
>>>>> Note that the RDF submitted in the request body is valid RDF and
>>>>> XML: it passes all XML & RDF validations (e.g. rapper).
>>>>>
>>>>> When I do my tests, it seems that when the error occurs the
>>>>> submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>> produced by CKAN's own RDF generation process. Is that true? In that
>>>>> case the ERROR level of the message and the result returned by the API
>>>>> call are not in sync. On an ERROR I expect the API call to return an
>>>>> error code. If it were a warning, I could understand the API call
>>>>> returning success.
>>>>>
>>>>> So the following actions should be taken:
>>>>>
>>>>> 1) Make the above error message a proper validation result.
>>>>>     Note that this also solves the problem of the meaningless error
>>>>>     message. Nothing in the message indicates which call/object it
>>>>>     concerns, so the problem cannot be resolved from this message
>>>>>     alone. And putting CKAN in debug-mode logging for that is not
>>>>>     very sensible.
>>>>> 2) Ensure that the RDF (in RDF/XML format) inside the JSON is handled
>>>>>     properly and not interpreted as an HTML encoding.
>>>>>
>>>>> best regards,
>>>>>
>>>>> Bert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> It looks like the error is because the wrong content type is sent
>>>>>> when posting to CKAN in rdf2ckan. Content-Type: application/json should
>>>>>> be sent; otherwise CKAN will think the body is urlencoded and will
>>>>>> therefore encode the final document wrongly.
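
For reference, a minimal sketch of a request that declares its body as JSON, using the JDK's own java.net.http client (Java 11 or later) rather than the Apache HttpClient that rdf2ckan uses. The class name and the URL are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class JsonPostSketch {
    // Builds (but does not send) a POST whose body is declared as JSON.
    static HttpRequest buildJsonPost(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and payload, for illustration only.
        HttpRequest request = buildJsonPost(
                "http://localhost/api/hypothetical-endpoint",
                "{\"name\": \"example\"}");
        // Show the header that the server would use to pick its body parser.
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Printing the header on the built request is a cheap way to confirm what the client intends to send. What actually arrives at the server can still differ, so a capture on the wire is the more reliable check.
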
>>>>>>
>>>>>> This must have changed at some point, as the live db does not have
>>>>>> this issue, and it is likely what caused the unicode issues too.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>>>
>>>>>>>  Hello David,
>>>>>>>
>>>>>>> we don't do any postprocessing on the data we send.
>>>>>>>
>>>>>>> It's pure RDF content as read from the file, so we depend on how
>>>>>>> CKAN digests it.
>>>>>>> If we need to do some postprocessing, can you clarify what is
>>>>>>> needed?
>>>>>>>
>>>>>>> Also, is this related to the problem we faced with unicode
>>>>>>> exceptions from CKAN as well?
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Dimitrios
>>>>>>>
>>>>>>>
>>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>>
>>>>>>>  Hello Bert
>>>>>>>
>>>>>>>
>>>>>>>  There appear to be a lot of malformed RDF documents in the test
>>>>>>> system, e.g.
>>>>>>>
>>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>>
>>>>>>>  After some investigation this seems to be due to the data sent by
>>>>>>> rdf2ckan.  It appears that XML "&" characters are not being escaped
>>>>>>> correctly in some places.
>>>>>>>
>>>>>>>  These errors do not appear on the live db as far as I can see;
>>>>>>> could you please look into this?
>>>>>>>
>>>>>>>  Thanks
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Ecodp-dev mailing list
>>>>>>> Ecodp-dev at lists.okfn.org
>>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>
>>>>>
>>>>> --
>>>>> Bert Van Nuffelen
>>>>>
>>>>> Semantic Technologies Software Architect at TenForce
>>>>> www.tenforce.be
>>>>>
>>>>> Bert.Van.Nuffelen at tenforce.com
>>>>> Office: +32 (0)16 31 48 60
>>>>> Mobile:+32 479 06 24 26
>>>>> skype: bert.van.nuffelen
>>>>>