[ECODP-dev] Malformed rdf

John Glover john.glover at okfn.org
Tue Nov 12 15:46:46 UTC 2013


Hi Bert,

Thanks, yes it looks good to me.

Regards,
John


On 12 November 2013 16:32, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:

> Hi John,
>
> Could you check the test server now?
>
> It should now be on release 01.00.0x.
>
> Bert
>
>
> 2013/11/12 John Glover <john.glover at okfn.org>
>
>> Hi Bert,
>>
>> Ok thanks, let me know if your update resolves the problem.
>>
>> Regards,
>> John
>>
>>
>> On 12 November 2013 16:23, Bert Van Nuffelen <
>> bert.van.nuffelen at tenforce.com> wrote:
>>
>>> Hi John,
>>>
>>> (I am busy bringing the machines back up after the midday power cut.)
>>>
>>> On this topic: a detailed analysis shows that the Apache Java library
>>> sets the content type on the request body (the entity) rather than as a
>>> request header. From what I have seen so far, that may be why this
>>> parameter is not taken into account. We have updated rdf2ckan to also put
>>> it in the header and will test with the new version.
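A minimal sketch of sending the content type as an explicit request header. It uses the JDK's own java.net.http client (Java 11+) as a stand-in for the Apache HttpClient that rdf2ckan uses; with the Apache library the equivalent would be an explicit httpPost.setHeader("Content-Type", CONTENT_JSON) call. The endpoint URL, class name and method name are hypothetical, and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class JsonPostRequest {
    private static final String CONTENT_JSON = "application/json";

    // Builds (without sending) a POST whose Content-Type is set as an
    // explicit request header, not only on the body object.
    static HttpRequest buildJsonPost(URI endpoint, String json) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", CONTENT_JSON)
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; nothing is sent, so no server is needed.
        URI endpoint = URI.create("http://localhost:5000/api/action/package_update");
        HttpRequest request = buildJsonPost(endpoint, "{\"name\": \"example-dataset\"}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Content-Type: "
                + request.headers().firstValue("Content-Type").orElse("<missing>"));
    }
}
```

Inspecting the built request like this is a cheap way to confirm the header is present before comparing it with a full dump of what actually reaches CKAN.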
>>>
>>> kind regards,
>>>
>>> Bert
>>>
>>>
>>>
>>> 2013/11/12 John Glover <john.glover at okfn.org>
>>>
>>>> Hi Bert,
>>>>
>>>> Yes I tested both create and update (both of which go through the same
>>>> RDF validator).
>>>>
>>>> Ok thanks for the info.
>>>>
>>>> Regards,
>>>> John
>>>>
>>>>
>>>> On 12 November 2013 11:34, Bert Van Nuffelen <
>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> Strange. I will create a full dump of the request to see its content.
>>>>> Did you test both a create and an update?
>>>>>
>>>>> best,
>>>>>
>>>>> Bert
>>>>>
>>>>> PS: we have a power outage from 12:00 (in half an hour) until 13:00, so
>>>>> there might be communication glitches.
>>>>>
>>>>> Bert
>>>>>
>>>>>
>>>>>
>>>>> 2013/11/12 John Glover <john.glover at okfn.org>
>>>>>
>>>>>> Hi Bert,
>>>>>>
>>>>>> I am unable to reproduce your error here. Both of those JSON files
>>>>>> import fine for me using a simple Python client to make an API
>>>>>> request (release 01); neither produces the error. Could you therefore please
>>>>>> double-check that your API request is correct, and let me know when the
>>>>>> test server is back on release 01 so that we can test it there again.
>>>>>>
>>>>>> As requested, I have pushed a change to ckanext-ecportal so that it now
>>>>>> raises a ValidationError if it cannot parse the RDF in the dataset JSON.
>>>>>>
>>>>>> Regards,
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 12 November 2013 09:44, John Glover <john.glover at okfn.org> wrote:
>>>>>>
>>>>>>> Hi Bert,
>>>>>>>
>>>>>>> Thanks, I will have a look at that JSON file and get back to you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 12 November 2013 09:08, Bert Van Nuffelen <
>>>>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>>>>
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> I forgot to mention that I attached example JSON files to my
>>>>>>>> previous response.
>>>>>>>>
>>>>>>>> Bert
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>>>>
>>>>>>>>> Hi John,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/11/11 John Glover <john.glover at okfn.org>
>>>>>>>>>
>>>>>>>>>> Hi Bert,
>>>>>>>>>>
>>>>>>>>>> The last time this issue came up, the only way we could
>>>>>>>>>> recreate the problem was by making a request without setting the content
>>>>>>>>>> type to 'application/json'. As you now set it, could you please
>>>>>>>>>> send us an example of the JSON you are sending to CKAN in a failing
>>>>>>>>>> request so that we can investigate further.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> > When I do my tests, it seems that when the error occurs the
>>>>>>>>>> > submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>>>>>>> > produced by CKAN's own RDF generation process. Is that true?
>>>>>>>>>>
>>>>>>>>>> Yes this is correct.
>>>>>>>>>>
>>>>>>>>>> > In that case the level of the message (ERROR) and the result the
>>>>>>>>>> > API call returns are not in sync. On an ERROR I expect the API call
>>>>>>>>>> > to return an error code. If it were a WARN, I could understand the
>>>>>>>>>> > API call returning success.
>>>>>>>>>>
>>>>>>>>>> We could of course return a validation error here (although this
>>>>>>>>>> has not been requested before). I believe the thinking was that it was best
>>>>>>>>>> to save the data and ignore this error, since CKAN can generate the RDF
>>>>>>>>>> itself, but this logic could be changed.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I understand the reasoning, but it leads to an incorrect
>>>>>>>>> end result. In this case it even hid a communication error between
>>>>>>>>> RDF2CKAN and CKAN. Since the API provides a proper feedback mechanism, I
>>>>>>>>> would like to rely on it.
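Relying on the API's feedback means rdf2ckan has to treat every non-success response as a failure. A minimal sketch of that check, assuming CKAN's action API answers HTTP 200 on success and HTTP 409 when an action raises a ValidationError; the class, enum and method names are hypothetical, and reading the status code and body is left to whichever HTTP client is in use.

```java
public class CkanResponseCheck {
    enum Outcome { SUCCESS, VALIDATION_ERROR, OTHER_ERROR }

    // Assumption: 200 means success and 409 means a ValidationError
    // was raised by the CKAN action; anything else is some other failure.
    static Outcome classify(int httpStatus) {
        if (httpStatus == 200) return Outcome.SUCCESS;
        if (httpStatus == 409) return Outcome.VALIDATION_ERROR;
        return Outcome.OTHER_ERROR;
    }

    // Fails loudly, keeping the response body for diagnosis, instead of
    // letting the caller assume the upload worked.
    static void requireSuccess(int httpStatus, String body) {
        Outcome outcome = classify(httpStatus);
        if (outcome != Outcome.SUCCESS) {
            throw new IllegalStateException(outcome + " (HTTP " + httpStatus + "): " + body);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(200));
        System.out.println(classify(409));
        try {
            requireSuccess(409, "{\"success\": false}");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of throwing rather than logging is that a rejected upload can no longer be reported by rdf2ckan as inserted.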
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8 November 2013 11:47, Bert Van Nuffelen <
>>>>>>>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> One addition: overnight I ran an upload with the same RDF2CKAN and
>>>>>>>>>>> the same ESTAT package against release 09.00.0x, and there the
>>>>>>>>>>> EntityRef error message is not present in the CKAN logs.
>>>>>>>>>>>
>>>>>>>>>>> But when I synchronise with Virtuoso, I get exceptions when
>>>>>>>>>>> parsing the RDF files as they have been converted by CKAN.
>>>>>>>>>>>
>>>>>>>>>>> So it seems that
>>>>>>>>>>>    * in release 09.00.0x the check on this issue was not present
>>>>>>>>>>> at all,
>>>>>>>>>>>    * in release 01.00.0x the check is there, but it silently
>>>>>>>>>>> replaces the value with an empty one.
>>>>>>>>>>>
>>>>>>>>>>> In both releases, the accept header does not lead to the desired
>>>>>>>>>>> behaviour.
>>>>>>>>>>>
>>>>>>>>>>> kind regards,
>>>>>>>>>>>
>>>>>>>>>>> Bert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>>>>>>>
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> We have this problem again. It is related to issue
>>>>>>>>>>>> ODP-294.
>>>>>>>>>>>>
>>>>>>>>>>>> In the Apache logs:
>>>>>>>>>>>>
>>>>>>>>>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>>>>>>>>>
>>>>>>>>>>>> In the CKAN logs:
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 102
>>>>>>>>>>>>
>>>>>>>>>>>> Because this appears silently in the logs instead of as a _proper_
>>>>>>>>>>>> validation error such as:
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error: '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must be alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation Error\'}'
>>>>>>>>>>>>
>>>>>>>>>>>> rdf2ckan can only conclude that it inserted the data correctly,
>>>>>>>>>>>> while it did not. The outcome reported by rdf2ckan is therefore
>>>>>>>>>>>> untrustworthy, and for that reason this issue went unnoticed by TF development.
>>>>>>>>>>>>
>>>>>>>>>>>> Secondly, I am concerned about whether the configuration solution
>>>>>>>>>>>> described by David (quoted below) actually works.
>>>>>>>>>>>> It seems that setting the header to "application/json"
>>>>>>>>>>>> does not result in correct handling by CKAN.
>>>>>>>>>>>> Below is an extract of the Java code showing that we send
>>>>>>>>>>>> the content type as requested.
>>>>>>>>>>>>
>>>>>>>>>>>> private static final String CONTENT_JSON = "application/json";
>>>>>>>>>>>>
>>>>>>>>>>>> public Map<String, Object> post(String json, String method) {
>>>>>>>>>>>>     if (json.isEmpty() || method.isEmpty())
>>>>>>>>>>>>         return null;
>>>>>>>>>>>>     DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>>>>>>>>>
>>>>>>>>>>>>     HttpPost httpPost = requestCredentials(method);
>>>>>>>>>>>>     json = json.replaceAll(";", "");
>>>>>>>>>>>>     httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>>>>>>>>>     ...
>>>>>>>>>>>>
>>>>>>>>>>>> public InputStreamEntity streamify(String json, String contentType) {
>>>>>>>>>>>>     json = json.replaceAll(";", "");
>>>>>>>>>>>>     InputStreamEntity inputStreamEntity = new InputStreamEntity(
>>>>>>>>>>>>             new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>>>>>>>>>>>             json.getBytes().length);
>>>>>>>>>>>>     inputStreamEntity.setContentType(contentType);
>>>>>>>>>>>>     return inputStreamEntity;
>>>>>>>>>>>> }
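Two details of the extract above can be checked in isolation. First, both methods call json.replaceAll(";", ""), which also removes the terminating semicolon of any XML entity in the embedded RDF/XML (&amp; becomes &amp); an entity reference without its ';' is exactly what libxml2 reports as "EntityRef: expecting ';'". Second, json.getBytes().length uses the platform default charset while the stream is encoded as UTF-8, so for non-ASCII text the declared length can differ from the bytes actually sent. The sketch below uses plain JDK code instead of the Apache entity classes, with hypothetical class and method names; it only demonstrates both effects and does not establish that either one caused the errors in this thread.

```java
import java.nio.charset.StandardCharsets;

public class BodyEncodingCheck {
    // Same transformation as the json.replaceAll(";", "") calls in the extract.
    static String stripSemicolons(String json) {
        return json.replaceAll(";", "");
    }

    // Encode once and reuse the array, so the stream and the declared
    // length are always derived from the same UTF-8 bytes.
    static byte[] utf8Body(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"rdf\": \"<dct:title>Trade &amp; statistics</dct:title>\"}";
        // The entity loses its terminating ';' and becomes "&amp statistics".
        System.out.println(stripSemicolons(json));

        String title = "Donn\u00e9es"; // 7 characters, one of them non-ASCII
        System.out.println(title.length() + " chars, " + utf8Body(title).length + " UTF-8 bytes");
    }
}
```

Passing utf8Body(json) to the ByteArrayInputStream and its .length to the entity would keep the two values consistent regardless of the JVM's default charset.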
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Note that the submitted RDF in the request body is valid RDF and
>>>>>>>>>>>> XML: it passes all XML and RDF validations (e.g. rapper).
>>>>>>>>>>>>
>>>>>>>>>>>> When I do my tests, it seems that when the error occurs the
>>>>>>>>>>>> submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>>>>>>>>> produced by CKAN's own RDF generation process. Is that true? In that
>>>>>>>>>>>> case the level of the message (ERROR) and the result the API call
>>>>>>>>>>>> returns are not in sync. On an ERROR I expect the API call to return
>>>>>>>>>>>> an error code. If it were a WARN, I could understand the API call
>>>>>>>>>>>> returning success.
>>>>>>>>>>>>
>>>>>>>>>>>> So the following actions should be taken:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Make the above error message a proper validation result.
>>>>>>>>>>>>     Note that this also solves the problem of the meaningless error
>>>>>>>>>>>> message: it contains no reference at all to the call or object it
>>>>>>>>>>>> concerns, so the problem cannot be resolved from this message alone.
>>>>>>>>>>>> And switching CKAN to debug-level logging just for this is not very
>>>>>>>>>>>> sensible.
>>>>>>>>>>>> 2) Ensure that RDF (in RDF/XML format) inside the JSON is handled
>>>>>>>>>>>> properly and not interpreted as HTML-encoded.
>>>>>>>>>>>> best regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Bert
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like the error occurs because rdf2ckan sends the wrong
>>>>>>>>>>>>> content type when posting to CKAN. Content-Type: application/json
>>>>>>>>>>>>> should be sent; otherwise CKAN will assume the body is URL-encoded
>>>>>>>>>>>>> and the final document will be encoded incorrectly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This must have changed at some point, as the live DB does not
>>>>>>>>>>>>> have this issue. It is likely also what caused the Unicode issues.
>>>>>>>>>>>>> 
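A simplified illustration of why the content type matters: if a JSON body is read as application/x-www-form-urlencoded, every '&' is treated as a separator between form fields, so an escaped &amp; inside the embedded RDF/XML splits the document in two. The splitting function below is a hypothetical stand-in that models only that one step; it is not CKAN's actual request parser.

```java
import java.util.Arrays;
import java.util.List;

public class FormDecodingDemo {
    // Simplified stand-in for a form parser: only the '&' splitting step is modelled.
    static List<String> splitFormFields(String body) {
        return Arrays.asList(body.split("&"));
    }

    public static void main(String[] args) {
        String json = "{\"rdf\": \"<dct:title>Trade &amp; statistics</dct:title>\"}";
        List<String> fields = splitFormFields(json);
        System.out.println(fields.size() + " fields:");
        fields.forEach(System.out::println);
    }
}
```

The second field starts with "amp;", so whatever is reassembled from these pieces no longer contains a well-formed entity.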
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>>>>>>>>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Hello David,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We don't do any post-processing on the data we send.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's pure RDF content as read from the file, so we depend on
>>>>>>>>>>>>>> how CKAN digests it.
>>>>>>>>>>>>>> If we need to do some post-processing, could you clarify
>>>>>>>>>>>>>> what is needed?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, is this related to the Unicode exceptions we have been
>>>>>>>>>>>>>> getting from CKAN?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kind regards
>>>>>>>>>>>>>> Dimitrios
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Hello Bert
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  There appear to be a lot of malformed RDF documents in the
>>>>>>>>>>>>>> test system, e.g.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  After some investigation, this seems to be due to the data sent
>>>>>>>>>>>>>> by rdf2ckan. It appears that XML "&" characters are not being escaped
>>>>>>>>>>>>>> correctly in some places.
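For reference, a literal "&" in RDF/XML character data must be written as &amp;. A minimal escaping sketch with hypothetical class and method names; replacing '&' first keeps the entities added afterwards from being escaped a second time. A real XML serializer would normally take care of this.

```java
public class XmlEscape {
    // Escape '&' first so the entities introduced by the later
    // replacements are not themselves escaped again.
    static String escapeXmlText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(escapeXmlText("Trade & statistics <2013>"));
    }
}
```

Note that this is meant for raw text: applying it to text that is already escaped would turn &amp; into &amp;amp;.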
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  As far as I can see, these errors do not appear on the live DB.
>>>>>>>>>>>>>> Could you please look into this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Ecodp-dev mailing list
>>>>>>>>>>>>>> Ecodp-dev at lists.okfn.org
>>>>>>>>>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Bert Van Nuffelen
>>>>>>>>>>>>
>>>>>>>>>>>> Semantic Technologies Software Architect at TenForce
>>>>>>>>>>>> www.tenforce.be
>>>>>>>>>>>>
>>>>>>>>>>>> Bert.Van.Nuffelen at tenforce.com
>>>>>>>>>>>> Office: +32 (0)16 31 48 60
>>>>>>>>>>>> Mobile:+32 479 06 24 26
>>>>>>>>>>>> skype: bert.van.nuffelen
>>>>>>>>>>>>