[ECODP-dev] Malformed rdf

John Glover john.glover at okfn.org
Tue Nov 12 10:29:53 UTC 2013


Hi Bert,

I am unable to reproduce your error here. Both of those JSON files import
fine for me (release 01) using a simple Python client to make the API
request; neither produces the error. Could you therefore please double-check
that your API request is correct, and let me know when the test server is
back on release 01 so that we can test it again there?

I have pushed a change to ckanext-ecportal so that, as requested, it now
raises a ValidationError if it cannot parse the RDF in the dataset JSON.

Regards,
John


On 12 November 2013 09:44, John Glover <john.glover at okfn.org> wrote:

> Hi Bert,
>
> Thanks, I will have a look at that JSON file and get back to you.
>
> Regards,
> John
>
>
> On 12 November 2013 09:08, Bert Van Nuffelen <
> bert.van.nuffelen at tenforce.com> wrote:
>
>> Hi John,
>>
>> I forgot to mention that I attached example JSON files to my previous
>> response.
>>
>> Bert
>>
>>
>> 2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>
>>> Hi John,
>>>
>>>
>>> 2013/11/11 John Glover <john.glover at okfn.org>
>>>
>>>> Hi Bert,
>>>>
>>>> The last time this issue came up, the only way that we could recreate
>>>> the problem was when making a request without setting the content type to
>>>> 'application/json'. As you are now doing this, could you please send us an
>>>> example of the JSON that you are sending to CKAN in a failing request so
>>>> that we can investigate further.
>>>>
>>>>
>>>
>>>
>>>> > In my tests, it seems that when the error occurs the submitted RDF is
>>>> > not saved, so the RDF in CKAN is the one produced by CKAN's own RDF
>>>> > generation process. Is that true?
>>>>
>>>> Yes, this is correct.
>>>>
>>>> > If so, the level of the message (ERROR) and the result returned by the
>>>> > API call are out of sync. On an ERROR I expect the API call to return an
>>>> > error code. If it were a warning, I could understand the API call
>>>> > returning success.
>>>>
>>>> We could of course return a validation error here (although this has
>>>> not been requested before). I believe the thinking was that it was best to
>>>> save the data and ignore this error as CKAN can generate RDF, but this
>>>> logic could be changed.
>>>>
>>>
>>> I understand the reasoning, but it leads to an incorrect end result.
>>> In this case it has even hidden a communication error between RDF2CKAN and
>>> CKAN. Since the API provides a proper feedback mechanism, I would like to
>>> rely on it.
>>>
>>>>
>>>> Regards,
>>>> John
>>>>
>>>>
>>>> On 8 November 2013 11:47, Bert Van Nuffelen <
>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> One addition: overnight I ran an upload on the release 09.00.0x version
>>>>> with the same RDF2CKAN and the same ESTAT package, and there the above
>>>>> message is not present in the CKAN logs.
>>>>>
>>>>> But when I do the synchronisation with Virtuoso, I get exceptions on
>>>>> parsing the RDF files as they have been converted by CKAN.
>>>>>
>>>>> So it seems that
>>>>>    * in release 09.00.0x the check on this issue was not present at
>>>>>      all,
>>>>>    * in release 01.00.0x the check is there, but the value is silently
>>>>>      replaced with an empty value.
>>>>>
>>>>> In both releases the accept-header is not bound to the desired
>>>>> execution.
>>>>>
>>>>> kind regards,
>>>>>
>>>>> Bert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> We have this problem again. It is related to issue ODP-294.
>>>>>>
>>>>>> In the Apache logs:
>>>>>>
>>>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
>>>>>> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>>>
>>>>>> In the CKAN logs:
>>>>>>
>>>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
>>>>>> expecting ';', line 39, column 102
>>>>>>
>>>>>> As this only appears silently in the logs, instead of as a _proper_
>>>>>> validation error like:
>>>>>>
>>>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation
>>>>>> error: '{\'keyword_string\': [u\'Tag "international trade,trade
>>>>>> statistics" must be alphanumeric characters or symbols: -_.\'],
>>>>>> \'__type\': \'Validation Error\'}'
>>>>>>
>>>>>> rdf2ckan can only conclude that it has inserted the data correctly,
>>>>>> while it has not.
>>>>>> The outcome of rdf2ckan is hence untrustworthy. For that reason
>>>>>> this issue remained unseen by TF development.
>>>>>>
>>>>>> Secondly, I am concerned about whether the configuration solution
>>>>>> described by David is working.
>>>>>> It seems that setting the header to "application/json" does not
>>>>>> result in correct handling by CKAN.
>>>>>> Below is the extract of the Java code showing that we send the
>>>>>> content type as requested.
>>>>>>
>>>>>> private static final String CONTENT_JSON = "application/json";
>>>>>>
>>>>>> public Map<String, Object> post(String json, String method) {
>>>>>>         if (json.isEmpty() || method.isEmpty())
>>>>>>             return null;
>>>>>>         DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>>>
>>>>>>         HttpPost httpPost = requestCredentials(method);
>>>>>>         json = json.replaceAll(";", "");
>>>>>>         httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>>>        ...
>>>>>>
>>>>>> public InputStreamEntity streamify(String json, String contentType) {
>>>>>>         json = json.replaceAll(";", "");
>>>>>>         InputStreamEntity inputStreamEntity = new InputStreamEntity(
>>>>>>                 new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>>>>>                 json.getBytes().length);
>>>>>>         inputStreamEntity.setContentType(contentType);
>>>>>>         return inputStreamEntity;
>>>>>> }
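One observation on the snippet above, offered as a possibility rather than a confirmed diagnosis: `replaceAll(";", "")` removes every semicolon from the payload, including the ones that terminate XML entity references such as `&amp;`. The minimal sketch below shows the effect on a small XML fragment and checks well-formedness with the JDK's built-in parser. The class and method names are illustrative and are not part of rdf2ckan.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: the names here are hypothetical, not taken from rdf2ckan.
final class SemicolonStrippingDemo {

    // Same transformation as in the snippet above: drop every ';'.
    static String stripSemicolons(String payload) {
        return payload.replaceAll(";", "");
    }

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<title>Trade &amp; statistics</title>";
        String stripped = stripSemicolons(xml);
        System.out.println(stripped);
        System.out.println("original well-formed: " + isWellFormed(xml));
        System.out.println("stripped well-formed: " + isWellFormed(stripped));
    }
}
```

If the RDF carried in the JSON contains any entity reference, a parser on the receiving side would be expected to reject it after this transformation, which is at least consistent with an "expecting ';'" message.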
>>>>>>
>>>>>>
>>>>>> Note that the RDF submitted in the request body is valid RDF and
>>>>>> XML: it passes all XML and RDF validations (e.g. rapper).
>>>>>>
>>>>>> In my tests, it seems that when the error occurs the submitted RDF is
>>>>>> not saved, so the RDF in CKAN is the one produced by CKAN's own RDF
>>>>>> generation process. Is that true? If so, the level of the message
>>>>>> (ERROR) and the result returned by the API call are out of sync. On an
>>>>>> ERROR I expect the API call to return an error code. If it were a
>>>>>> warning, I could understand the API call returning success.
>>>>>>
>>>>>> So the following actions should be taken:
>>>>>>
>>>>>> 1) Make the above error message a proper validation result.
>>>>>>     Note that this also solves the problem of the meaningless error
>>>>>> message: nothing in the message indicates which call or object it
>>>>>> refers to, so the problem cannot be resolved from this message alone.
>>>>>> And putting CKAN in debug-mode logging for that is not very sensible.
>>>>>> 2) Ensure that the RDF (in RDF/XML format) inside the JSON is handled
>>>>>> properly and not interpreted as an HTML encoding.
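Action 1 above could look roughly like the following. This is a sketch in Java, chosen only to match the client snippet in this thread; it is not the ckanext-ecportal code, which is Python. `RdfValidation`, `RdfValidationException` and the `datasetName` parameter are hypothetical names. The check covers XML well-formedness only, which is weaker than full RDF parsing, but it shows an error that names the dataset and the position reported by the parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: rejects RDF/XML that is not well-formed XML.
final class RdfValidation {

    // Hypothetical exception type carrying enough context to locate the problem.
    static final class RdfValidationException extends Exception {
        RdfValidationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Throws if rdfXml cannot be parsed; the message names the dataset and,
    // when the parser provides it, the line and column of the failure.
    static void requireWellFormed(String datasetName, String rdfXml)
            throws RdfValidationException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no stderr output
            builder.parse(new InputSource(new StringReader(rdfXml)));
        } catch (SAXParseException e) {
            throw new RdfValidationException(
                    "Dataset '" + datasetName + "': cannot parse RDF at line "
                            + e.getLineNumber() + ", column " + e.getColumnNumber()
                            + ": " + e.getMessage(), e);
        } catch (Exception e) {
            throw new RdfValidationException(
                    "Dataset '" + datasetName + "': cannot parse RDF: "
                            + e.getMessage(), e);
        }
    }
}
```

A caller would catch `RdfValidationException` and turn it into a failed API response instead of only writing it to the log.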
>>>>>>
>>>>>> best regards,
>>>>>>
>>>>>> Bert
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> It looks like the error occurs because rdf2ckan sends the wrong
>>>>>>> content type when posting to CKAN. Content-Type: application/json
>>>>>>> should be sent; otherwise CKAN will think the body is urlencoded and
>>>>>>> will therefore encode the final document wrongly.
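As a rough illustration of the point about the content type, the sketch below builds a POST request whose body is explicitly declared as JSON. It uses the JDK 11+ `java.net.http` API rather than the Apache HttpClient used by rdf2ckan, purely to stay self-contained. The URL, the API key and the use of the `Authorization` header are placeholders and assumptions, not values from this thread. The request is built but never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds a JSON POST for a CKAN-style action endpoint.
final class CkanJsonRequest {

    // Builds (but does not send) a POST whose body is declared as JSON.
    static HttpRequest build(String actionUrl, String apiKey, String json) {
        return HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/json")
                .header("Authorization", apiKey) // assumed API-key header
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build(
                "http://ckan.example.org/api/action/package_create", // placeholder URL
                "my-api-key",                                        // placeholder key
                "{\"name\": \"example-dataset\"}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Printing the header of the built request is a cheap way to confirm what the client declares before looking at the server side.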
>>>>>>>
>>>>>>> This must have changed at some point, as the live db does not have
>>>>>>> this issue, and it is likely what caused the unicode issues too.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>>>>
>>>>>>>>  Hello David,
>>>>>>>>
>>>>>>>> We don't do any postprocessing on the data we send.
>>>>>>>>
>>>>>>>> It's pure RDF content as read from the file, so we depend on how
>>>>>>>> CKAN digests it.
>>>>>>>> If we need to do some postprocessing, can you clarify what is
>>>>>>>> needed?
>>>>>>>>
>>>>>>>> Also, does this have to do with the problem we faced with unicode
>>>>>>>> exceptions from CKAN as well?
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Dimitrios
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>>>
>>>>>>>>  Hello Bert
>>>>>>>>
>>>>>>>>
>>>>>>>> There appear to be a lot of malformed RDF documents in the test
>>>>>>>> system, e.g.
>>>>>>>>
>>>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>>>
>>>>>>>> After some investigation this seems to be due to the data sent by
>>>>>>>> rdf2ckan. It appears that XML "&" characters are not being escaped
>>>>>>>> correctly in some places.
>>>>>>>>
>>>>>>>> These errors do not appear on the live db as far as I can see.
>>>>>>>> Could you please look into this?
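For reference, escaping "&" and the other reserved characters in XML text can be done along these lines. This is a generic sketch with a hypothetical class name, not code from rdf2ckan or CKAN, and it assumes it is applied to raw text that has not been escaped already.

```java
// Illustrative only: escapes the five characters reserved in XML text and
// attribute values. Apply it to raw text only; running it over text that is
// already escaped would escape the '&' of each entity a second time.
final class XmlEscape {

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("International trade & trade statistics"));
    }
}
```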
>>>>>>>>
>>>>>>>>  Thanks
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Ecodp-dev mailing list
>>>>>>>> Ecodp-dev at lists.okfn.org
>>>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Bert Van Nuffelen
>>>>>>
>>>>>> Semantic Technologies Software Architect at TenForce
>>>>>> www.tenforce.be
>>>>>>
>>>>>> Bert.Van.Nuffelen at tenforce.com
>>>>>> Office: +32 (0)16 31 48 60
>>>>>> Mobile:+32 479 06 24 26
>>>>>> skype: bert.van.nuffelen
>>>>>>