[ECODP-dev] Malformed rdf
Bert Van Nuffelen
bert.van.nuffelen at tenforce.com
Tue Nov 12 15:32:49 UTC 2013
Hi John,
Can you check the test server now?
It should now be on release 01.00.0x.
Bert
2013/11/12 John Glover <john.glover at okfn.org>
> Hi Bert,
>
> Ok thanks, let me know if your update resolves the problem.
>
> Regards,
> John
>
>
> On 12 November 2013 16:23, Bert Van Nuffelen <
> bert.van.nuffelen at tenforce.com> wrote:
>
>> Hi John,
>>
>> (I am busy bringing the machines back up after this noon's power cut.)
>>
>> On this topic: a detailed analysis shows that the Apache Java library is
>> setting the content type on the body of the request. From what I have seen
>> so far, that might be why this parameter is not taken into account. We have
>> updated rdf2ckan to also put it in the header and will test with the new
>> version.
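A minimal sketch of sending the content type as an explicit request header, so that it does not depend on how the body entity is built. It swaps in the JDK's own java.net.http client (Java 11+) instead of the Apache HttpClient that rdf2ckan uses; the class name and the endpoint host are hypothetical placeholders, and the request is only built and inspected, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class CkanJsonPost {
    // Hypothetical CKAN action endpoint, used only to build the request.
    static final URI ENDPOINT = URI.create("http://ckan.example.org/api/action/package_update");

    // Builds a POST whose Content-Type is an explicit request header,
    // independent of how the body publisher is created.
    static HttpRequest build(String json) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("{\"name\": \"example-dataset\"}");
        // Only inspects the request; nothing goes over the network.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse("<missing>"));
    }
}
```

With Apache HttpClient 4.x, the equivalent is calling httpPost.setHeader("Content-Type", "application/json") on the HttpPost before executing it.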
>>
>> kind regards,
>>
>> Bert
>>
>>
>>
>> 2013/11/12 John Glover <john.glover at okfn.org>
>>
>>> Hi Bert,
>>>
>>> Yes I tested both create and update (both of which go through the same
>>> RDF validator).
>>>
>>> Ok thanks for the info.
>>>
>>> Regards,
>>> John
>>>
>>>
>>> On 12 November 2013 11:34, Bert Van Nuffelen <
>>> bert.van.nuffelen at tenforce.com> wrote:
>>>
>>>> Hi John,
>>>>
>>>> Strange. I will create a full dump of the request to see its content.
>>>> Did you test both a create and an update?
>>>>
>>>> best,
>>>>
>>>> Bert
>>>>
>>>> P.S. We have a power interruption from 12:00 (in half an hour) until
>>>> 13:00, so there might be communication glitches.
>>>>
>>>> Bert
>>>>
>>>>
>>>>
>>>> 2013/11/12 John Glover <john.glover at okfn.org>
>>>>
>>>>> Hi Bert,
>>>>>
>>>>> I am unable to reproduce your error here. Both of those JSON files
>>>>> import fine for me using a simple Python client to make an API request
>>>>> (release 01); neither produces the error. Could you therefore please
>>>>> double-check that your API request is correct, and let me know when the
>>>>> test server is back on release 01 so that we can test it there again.
>>>>>
>>>>> As requested, I have pushed a change to ckanext-ecportal so that it now
>>>>> raises a ValidationError if it cannot parse the RDF in the dataset JSON.
>>>>>
>>>>> Regards,
>>>>> John
>>>>>
>>>>>
>>>>> On 12 November 2013 09:44, John Glover <john.glover at okfn.org> wrote:
>>>>>
>>>>>> Hi Bert,
>>>>>>
>>>>>> Thanks, I will have a look at that JSON file and get back to you.
>>>>>>
>>>>>> Regards,
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 12 November 2013 09:08, Bert Van Nuffelen <
>>>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>>>
>>>>>>> Hi John,
>>>>>>>
>>>>>>> I forgot to mention that I attached example JSON files to my
>>>>>>> previous response.
>>>>>>>
>>>>>>> Bert
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>>>
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/11/11 John Glover <john.glover at okfn.org>
>>>>>>>>
>>>>>>>>> Hi Bert,
>>>>>>>>>
>>>>>>>>> The last time this issue came up, the only way that we could
>>>>>>>>> recreate the problem was when making a request without setting the content
>>>>>>>>> type to 'application/json'. As you are now doing this, could you please
>>>>>>>>> send us an example of the JSON that you are sending to CKAN in a failing
>>>>>>>>> request so that we can investigate further.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> > When I do my tests, it seems that in case the error occurs the
>>>>>>>>> submitted RDF is not saved, and hence the RDF in CKAN is
>>>>>>>>> > that one of the CKAN RDF generation process. Is that true?
>>>>>>>>>
>>>>>>>>> Yes this is correct.
>>>>>>>>>
>>>>>>>>> > In that case the level of the message ERROR and the result by
>>>>>>>>> the API call return to it are not in sync. On an ERROR I
>>>>>>>>> > expect that the api call return an error code. If that was a
>>>>>>>>> warn I can understand that the api call returns success.
>>>>>>>>>
>>>>>>>>> We could of course return a validation error here (although this
>>>>>>>>> has not been requested before). I believe the thinking was that it was best
>>>>>>>>> to save the data and ignore this error, since CKAN can generate the RDF
>>>>>>>>> itself, but this logic could be changed.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I understand the reasoning, but it leads to an incorrect end result.
>>>>>>>> In this case it even hid a communication error between RDF2CKAN and
>>>>>>>> CKAN. Since the API provides a proper feedback mechanism, I would like
>>>>>>>> to rely on that.
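A minimal sketch of relying on that feedback on the client side, assuming the CKAN action API response envelope: a JSON object with a boolean "success" field, where validation errors come back with "success": false (and, in CKAN 2.x, HTTP status 409). The class name is hypothetical, and the regex is a deliberately crude stand-in for a proper JSON parser.

```java
import java.util.regex.Pattern;

class CkanActionResult {
    // Matches the "success": true flag of the response envelope.
    // Crude string check for illustration; a real client should parse the JSON.
    private static final Pattern SUCCESS_TRUE = Pattern.compile("\"success\"\\s*:\\s*true");

    // Treats the call as successful only if the HTTP status and the body agree.
    static boolean succeeded(int httpStatus, String responseBody) {
        return httpStatus == 200 && SUCCESS_TRUE.matcher(responseBody).find();
    }

    public static void main(String[] args) {
        String ok = "{\"success\": true, \"result\": {}}";
        String invalid = "{\"success\": false, \"error\": {\"__type\": \"Validation Error\"}}";
        System.out.println(succeeded(200, ok));       // true
        System.out.println(succeeded(409, invalid));  // false
    }
}
```

Checking both the status and the flag means a silently logged server-side ERROR cannot be mistaken for success, as long as the server actually reports it in the response.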
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8 November 2013 11:47, Bert Van Nuffelen <
>>>>>>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> One addition: overnight I ran an upload with the same RDF2CKAN and the
>>>>>>>>>> same ESTAT package against the release 09.00.0x version, and there the
>>>>>>>>>> above message is not present in the CKAN logs.
>>>>>>>>>>
>>>>>>>>>> But when I synchronise with Virtuoso, I get exceptions when parsing
>>>>>>>>>> the RDF files as they have been converted by CKAN.
>>>>>>>>>>
>>>>>>>>>> So it seems that
>>>>>>>>>> * in release 09.00.0x, the check on this issue was not present at
>>>>>>>>>> all;
>>>>>>>>>> * in release 01.00.0x, the check is there, but it silently replaces
>>>>>>>>>> the value with an empty one.
>>>>>>>>>>
>>>>>>>>>> In both releases, the accept header does not lead to the desired
>>>>>>>>>> behaviour.
>>>>>>>>>>
>>>>>>>>>> kind regards,
>>>>>>>>>>
>>>>>>>>>> Bert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>>>>>>
>>>>>>>>>>> Dear all,
>>>>>>>>>>>
>>>>>>>>>>> We have this problem again. It is related to issue ODP-294.
>>>>>>>>>>>
>>>>>>>>>>> In the Apache logs:
>>>>>>>>>>>
>>>>>>>>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>>>>>>>>
>>>>>>>>>>> In the CKAN logs:
>>>>>>>>>>>
>>>>>>>>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 102
>>>>>>>>>>>
>>>>>>>>>>> As this error only appears silently in the logs, instead of being
>>>>>>>>>>> returned as a _proper_ validation error like:
>>>>>>>>>>>
>>>>>>>>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error: '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must be alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation Error\'}'
>>>>>>>>>>>
>>>>>>>>>>> rdf2ckan can only conclude that it has inserted the data correctly,
>>>>>>>>>>> while it has not. The outcome of rdf2ckan is therefore untrustworthy,
>>>>>>>>>>> and for that reason this issue went unnoticed by TF development.
>>>>>>>>>>>
>>>>>>>>>>> Secondly, I am concerned about whether the configuration fix described
>>>>>>>>>>> by David actually works. It seems that setting the header to
>>>>>>>>>>> "application/json" does not result in correct handling by CKAN.
>>>>>>>>>>> Below is an extract of the Java code showing that we send the
>>>>>>>>>>> content type as requested.
>>>>>>>>>>>
>>>>>>>>>>> private static final String CONTENT_JSON = "application/json";
>>>>>>>>>>>
>>>>>>>>>>> public Map<String, Object> post(String json, String method) {
>>>>>>>>>>>     if (json.isEmpty() || method.isEmpty())
>>>>>>>>>>>         return null;
>>>>>>>>>>>     DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>>>>>>>>
>>>>>>>>>>>     HttpPost httpPost = requestCredentials(method);
>>>>>>>>>>>     // Removes every ';' from the payload, including the ';' that
>>>>>>>>>>>     // terminates XML entity references such as &amp; in embedded RDF/XML.
>>>>>>>>>>>     json = json.replaceAll(";", "");
>>>>>>>>>>>     // The content type is attached to the entity built by streamify().
>>>>>>>>>>>     httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>>>>>>>>     ...
>>>>>>>>>>>
>>>>>>>>>>> public InputStreamEntity streamify(String json, String contentType) {
>>>>>>>>>>>     json = json.replaceAll(";", "");
>>>>>>>>>>>     // Encode once, so the declared length matches the UTF-8 bytes sent;
>>>>>>>>>>>     // json.getBytes() without a charset uses the platform default.
>>>>>>>>>>>     byte[] body = json.getBytes(Charset.forName("UTF-8"));
>>>>>>>>>>>     InputStreamEntity inputStreamEntity =
>>>>>>>>>>>             new InputStreamEntity(new ByteArrayInputStream(body), body.length);
>>>>>>>>>>>     inputStreamEntity.setContentType(contentType);
>>>>>>>>>>>     return inputStreamEntity;
>>>>>>>>>>> }
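Two properties of a request body can be checked in isolation: building it from a single UTF-8 byte array keeps the declared length consistent with the bytes sent (a length taken from json.getBytes(), which uses the platform default charset, can differ for non-ASCII text), and sending the string unchanged keeps the ';' characters that replaceAll(";", "") in the extract removes. A minimal JDK-only sketch of such a body holder; the class name is hypothetical and this is not rdf2ckan's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class JsonBody {
    final byte[] bytes;

    // Encodes the JSON once, as UTF-8, without removing any characters,
    // so entity references such as &amp; keep their terminating ';'.
    JsonBody(String json) {
        this.bytes = json.getBytes(StandardCharsets.UTF_8);
    }

    // A stream over exactly the bytes whose count length() reports.
    ByteArrayInputStream stream() {
        return new ByteArrayInputStream(bytes);
    }

    // Always equals the number of bytes in stream(), even for non-ASCII text.
    long length() {
        return bytes.length;
    }

    public static void main(String[] args) {
        JsonBody body = new JsonBody("{\"title\": \"Donn\u00e9es\", \"note\": \"R&amp;D\"}");
        System.out.println(body.length() + " bytes");
        // For comparison, the extract's replaceAll(";", "") on an escaped ampersand:
        System.out.println("R&amp;D".replaceAll(";", ""));  // prints R&ampD
    }
}
```

With Apache HttpClient, the same byte array can be wrapped in an InputStreamEntity (as in the extract) or a ByteArrayEntity.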
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Note that the submitted RDF in the request body is valid RDF and XML:
>>>>>>>>>>> it passes all XML and RDF validations (e.g. rapper).
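A validator such as rapper can be complemented by a plain XML well-formedness check on exactly the string that ends up inside the JSON. A minimal JDK-only sketch (class and method names are hypothetical); it checks XML syntax only, not RDF semantics, and uses a bare '&' as the failing example, which is the kind of problem reported in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlWellFormedness {
    // Empty if the text parses as XML, otherwise the parser's error message.
    // Checks XML syntax only, not RDF semantics (rapper does more than this).
    static Optional<String> check(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
                + "<rdf:Description rdf:about=\"http://example.org/d?a=1&amp;b=2\"/></rdf:RDF>";
        String bare = escaped.replace("&amp;", "&");
        System.out.println(check(escaped).isPresent()); // false: well-formed
        System.out.println(check(bare).isPresent());    // true: bare '&' is rejected
    }
}
```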
>>>>>>>>>>>
>>>>>>>>>>> When I do my tests, it seems that when the error occurs the submitted
>>>>>>>>>>> RDF is not saved, and hence the RDF in CKAN is the one produced by
>>>>>>>>>>> CKAN's own RDF generation process. Is that true? In that case, the
>>>>>>>>>>> ERROR level of the message and the result returned by the API call are
>>>>>>>>>>> not in sync. On an ERROR, I expect the API call to return an error
>>>>>>>>>>> code. If it were a WARN, I could understand the API call returning
>>>>>>>>>>> success.
>>>>>>>>>>>
>>>>>>>>>>> So the following actions should be taken:
>>>>>>>>>>>
>>>>>>>>>>> 1) Make the above error message a proper validation result.
>>>>>>>>>>> Note that this also solves the problem of the meaningless error
>>>>>>>>>>> message: it contains no reference at all to the call or object
>>>>>>>>>>> concerned, so the problem cannot be resolved from this message, and
>>>>>>>>>>> switching CKAN to debug-level logging just for that is not very sensible.
>>>>>>>>>>> 2) Ensure that RDF (in RDF/XML format) inside the JSON is handled
>>>>>>>>>>> properly and not interpreted as an HTML encoding.
>>>>>>>>>>>
>>>>>>>>>>> best regards,
>>>>>>>>>>>
>>>>>>>>>>> Bert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>>>>>>>>
>>>>>>>>>>>> Hello
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like the error occurs because rdf2ckan sends the wrong content
>>>>>>>>>>>> type when posting to CKAN. Content-Type: application/json should be
>>>>>>>>>>>> sent; otherwise CKAN will assume the body is URL-encoded and will
>>>>>>>>>>>> therefore encode the final document incorrectly.
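To illustrate why the content type matters: if a JSON body is decoded as application/x-www-form-urlencoded, '&' acts as a field separator, and '+' and %XX sequences are decoded. The sketch below mimics a generic form decoder with the JDK's URLDecoder (Java 10+); it is not CKAN's actual request handling, and the names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FormDecodingDemo {
    // Mimics a generic application/x-www-form-urlencoded decoder: the body is
    // split into fields at every '&', then each field is percent-decoded and
    // '+' becomes a space. (The key/value split at '=' is omitted for brevity.)
    static List<String> decodeAsForm(String body) {
        List<String> fields = new ArrayList<>();
        for (String field : body.split("&")) {
            fields.add(URLDecoder.decode(field, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        String json = "{\"rdf\": \"<dct:title>R&amp;D 1+1</dct:title>\"}";
        // The JSON is cut in two at the '&' of "&amp;", and "1+1" becomes "1 1".
        System.out.println(decodeAsForm(json));
    }
}
```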
>>>>>>>>>>>>
>>>>>>>>>>>> This must have changed at some point, as the live DB does not have
>>>>>>>>>>>> this issue, and it is likely what caused the Unicode issues too.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>>>>>>>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello David,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We don't do any post-processing of the data we send.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's pure RDF content as read from the file, so we depend on how CKAN
>>>>>>>>>>>>> digests it. If we need to do some post-processing, can you clarify
>>>>>>>>>>>>> what is needed?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, is this related to the problem we faced with Unicode exceptions
>>>>>>>>>>>>> from CKAN?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind regards
>>>>>>>>>>>>> Dimitrios
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Bert
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> There appear to be a lot of malformed RDF documents in the
>>>>>>>>>>>>> test system, e.g.
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>>>>>>>>
>>>>>>>>>>>>> After some investigation, this seems to be due to the data sent
>>>>>>>>>>>>> by rdf2ckan. It appears that XML "&" characters are not being escaped
>>>>>>>>>>>>> correctly in some places.
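For reference, escaping '&' correctly means emitting the entity reference &amp; (and &lt;, &gt;, &quot;, &apos; for the other special characters) when raw text is placed into RDF/XML. A minimal sketch of such an escaper; the class name is hypothetical and this is not taken from rdf2ckan or CKAN.

```java
class XmlEscape {
    // Escapes the characters that are special in XML text and attribute values.
    // Single pass, so the '&' emitted for an entity reference is never re-escaped.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Research & Development <2013>"));
        // prints Research &amp; Development &lt;2013&gt;
    }
}
```

It must be applied exactly once, to raw text: running it over RDF/XML that is already escaped, such as content read from a file, would turn &amp; into &amp;amp;.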
>>>>>>>>>>>>>
>>>>>>>>>>>>> These errors do not appear on the live DB as far as I can
>>>>>>>>>>>>> see; could you please look into this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Ecodp-dev mailing list
>>>>>>>>>>>>> Ecodp-dev at lists.okfn.org
>>>>>>>>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Bert Van Nuffelen
>>>>>>>>>>>
>>>>>>>>>>> Semantic Technologies Software Architect at TenForce
>>>>>>>>>>> www.tenforce.be
>>>>>>>>>>>
>>>>>>>>>>> Bert.Van.Nuffelen at tenforce.com
>>>>>>>>>>> Office: +32 (0)16 31 48 60
>>>>>>>>>>> Mobile:+32 479 06 24 26
>>>>>>>>>>> skype: bert.van.nuffelen