[ECODP-dev] Malformed rdf

John Glover john.glover at okfn.org
Tue Nov 12 10:38:52 UTC 2013


Hi Bert,

Yes, I tested both create and update (both go through the same RDF
validator).

OK, thanks for the info.

Regards,
John


On 12 November 2013 11:34, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:

> Hi John,
>
> Strange. I will create a full dump of the request to inspect its content.
> Did you test both a create and an update?
>
> best,
>
> Bert
>
> PS: we have a power interruption from 12h (in half an hour) until 13h, so
> there might be communication glitches.
>
> Bert
>
>
>
> 2013/11/12 John Glover <john.glover at okfn.org>
>
>> Hi Bert,
>>
>> I am unable to reproduce your error here. Both of those JSON files import
>> fine for me using a simple Python client to make an API request
>> (release 01); neither produces the error. Could you therefore please double
>> check that your API request is correct, and let me know when the test
>> server is back on release 01 so that we can test it again there?
>>
>> As requested, I have pushed a change to ckanext-ecportal so that it now
>> raises a ValidationError if it cannot parse the RDF in the dataset JSON.
>>
>> Regards,
>> John
>>
>>
>> On 12 November 2013 09:44, John Glover <john.glover at okfn.org> wrote:
>>
>>> Hi Bert,
>>>
>>> Thanks, I will have a look at that JSON file and get back to you.
>>>
>>> Regards,
>>> John
>>>
>>>
>>> On 12 November 2013 09:08, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:
>>>
>>>> Hi John,
>>>>
>>>> I forgot to mention that I attached example JSON files to my previous
>>>> response.
>>>>
>>>> Bert
>>>>
>>>>
>>>> 2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>
>>>>> Hi John,
>>>>>
>>>>>
>>>>> 2013/11/11 John Glover <john.glover at okfn.org>
>>>>>
>>>>>> Hi Bert,
>>>>>>
>>>>>> The last time this issue came up, the only way that we could recreate
>>>>>> the problem was by making a request without setting the content type to
>>>>>> 'application/json'. Since you are now setting it, could you please send
>>>>>> us an example of the JSON that you are sending to CKAN in a failing
>>>>>> request, so that we can investigate further?
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>> > When I do my tests, it seems that in case the error occurs the
>>>>>> > submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>>> > produced by the CKAN RDF generation process. Is that true?
>>>>>>
>>>>>> Yes, this is correct.
>>>>>>
>>>>>> > In that case the level of the message (ERROR) and the result returned
>>>>>> > by the API call are not in sync. On an ERROR I expect the API call to
>>>>>> > return an error code. If it were a warning, I could understand the
>>>>>> > API call returning success.
>>>>>>
>>>>>> We could of course return a validation error here (although this has
>>>>>> not been requested before). I believe the thinking was that it was best to
>>>>>> save the data and ignore this error as CKAN can generate RDF, but this
>>>>>> logic could be changed.
>>>>>>
>>>>>
>>>>> I understand the reasoning, but it leads to an incorrect end result.
>>>>> In this case it has even hidden a communication error between RDF2CKAN and
>>>>> CKAN. Since the API provides a proper feedback mechanism, I would like to
>>>>> rely on it.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 8 November 2013 11:47, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> One addition: overnight I ran an upload of the same ESTAT package with
>>>>>>> the same RDF2CKAN against the release 09.00.0x version, and there the
>>>>>>> above message is not present in the CKAN logs.
>>>>>>>
>>>>>>> But when I synchronise with Virtuoso, I get exceptions when parsing
>>>>>>> the RDF files as they have been converted by CKAN.
>>>>>>>
>>>>>>> So it seems that
>>>>>>>    * in release 09.00.0x the check on this issue was not present at all,
>>>>>>>    * in release 01.00.0x the check is there, but it silently replaces
>>>>>>>      the value with an empty one.
>>>>>>>
>>>>>>> In both releases the accept-header is not bound to the desired
>>>>>>> execution.
>>>>>>>
>>>>>>> kind regards,
>>>>>>>
>>>>>>> Bert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> We have this problem again. It is related to issue ODP-294.
>>>>>>>>
>>>>>>>> In the Apache logs:
>>>>>>>>
>>>>>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
>>>>>>>> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>>>>>
>>>>>>>> In the CKAN logs:
>>>>>>>>
>>>>>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
>>>>>>>> expecting ';', line 39, column 102
>>>>>>>>
>>>>>>>> As this only ends up silently in the logs, instead of being returned
>>>>>>>> as a _proper_ validation error like:
>>>>>>>>
>>>>>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error:
>>>>>>>> '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must be alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation Error\'}'
>>>>>>>>
>>>>>>>> rdf2ckan can only conclude that it has inserted the data correctly,
>>>>>>>> while it has not. The outcome of rdf2ckan is hence untrustworthy. For
>>>>>>>> that reason this issue remained unseen by TF development.
>>>>>>>>
>>>>>>>> Secondly, I am concerned about whether the configuration solution
>>>>>>>> described by David actually works.
>>>>>>>> It seems that setting the header to "application/json" does not
>>>>>>>> result in correct handling by CKAN.
>>>>>>>> Below is an extract of the Java code showing that we send the
>>>>>>>> content type as requested.
>>>>>>>>
>>>>>>>> private static final String CONTENT_JSON = "application/json";
>>>>>>>>
>>>>>>>> public Map<String, Object> post(String json, String method) {
>>>>>>>>         if (json.isEmpty() || method.isEmpty())
>>>>>>>>             return null;
>>>>>>>>         DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>>>>>
>>>>>>>>         HttpPost httpPost = requestCredentials(method);
>>>>>>>>         // Removes every ';' from the payload, including the ';' that
>>>>>>>>         // terminates XML entity references such as "&amp;".
>>>>>>>>         json = json.replaceAll(";", "");
>>>>>>>>         httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>>>>>        ...
>>>>>>>>
>>>>>>>>  public InputStreamEntity streamify(String json, String contentType) {
>>>>>>>>         json = json.replaceAll(";", "");
>>>>>>>>         // The stream is UTF-8 encoded, but the declared length is taken
>>>>>>>>         // from json.getBytes(), which uses the platform default charset.
>>>>>>>>         InputStreamEntity inputStreamEntity = new InputStreamEntity(
>>>>>>>>                 new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>>>>>>>                 json.getBytes().length);
>>>>>>>>         inputStreamEntity.setContentType(contentType);
>>>>>>>>         return inputStreamEntity;
>>>>>>>>     }
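
One thing worth checking in the extract above: both methods call json.replaceAll(";", ""), which also removes the ';' that terminates XML entity references such as &amp;. The thread does not confirm this as the cause, but an entity reference without its ';' is what an "EntityRef: expecting ';'" parser error describes. A minimal sketch, using only the JDK XML parser and a made-up fragment (the class name SemicolonStripDemo and the sample string are illustrative assumptions, not rdf2ckan code):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of rdf2ckan or CKAN.
final class SemicolonStripDemo {

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample with a correctly escaped ampersand.
        String original = "<title>Trade &amp; statistics</title>";
        // Same call as in the extract: every ';' is removed.
        String stripped = original.replaceAll(";", "");

        System.out.println(stripped);
        System.out.println("original well-formed: " + isWellFormed(original));
        System.out.println("stripped well-formed: " + isWellFormed(stripped));
    }
}
```

If this is what happens in rdf2ckan, any RDF that contains an escaped "&" would be well-formed on disk (and pass rapper) but malformed by the time it reaches CKAN.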
>>>>>>>>
>>>>>>>>
>>>>>>>> Note that the RDF submitted in the request body is all valid RDF and
>>>>>>>> XML: it passes all XML and RDF validations (e.g. rapper).
>>>>>>>>
>>>>>>>> When I do my tests, it seems that in case the error occurs the
>>>>>>>> submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>>>>> produced by the CKAN RDF generation process. Is that true? In that case
>>>>>>>> the level of the message (ERROR) and the result returned by the API call
>>>>>>>> are not in sync. On an ERROR I expect the API call to return an error
>>>>>>>> code. If it were a warning, I could understand the API call returning
>>>>>>>> success.
>>>>>>>>
>>>>>>>> So the following actions should be taken:
>>>>>>>>
>>>>>>>> 1) Make the above error message a proper validation result.
>>>>>>>>     Note that this also solves the problem of the meaningless error
>>>>>>>> message: nothing in the message indicates which call or object it
>>>>>>>> concerns, so the problem cannot be resolved from this message alone. And
>>>>>>>> putting CKAN in debug-mode logging for that is not very sensible.
>>>>>>>> 2) Ensure that the RDF (in RDF/XML format) inside the JSON is handled
>>>>>>>> properly and not interpreted as an HTML encoding.
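
A sketch of what point 1 could look like: the parse failure is turned into an error that names the dataset and carries the parser's line and column, so the caller can tell which object was rejected. This is an illustration in Java using the JDK XML parser, not the ckanext-ecportal implementation (which is Python). The ValidationError class, the validate method and the dataset name are hypothetical, and the check only covers XML well-formedness, not RDF semantics.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; names are illustrative only.
final class RdfValidationSketch {

    // Stands in for a validation error that would be returned to the API caller.
    static final class ValidationError extends Exception {
        ValidationError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Throws ValidationError if rdfXml is not well-formed XML.
    static void validate(String datasetName, String rdfXml) throws ValidationError {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXParseException e) {
            // Include the dataset and the position so the message is actionable.
            throw new ValidationError(
                    "Dataset '" + datasetName + "': cannot parse RDF (line " + e.getLineNumber()
                            + ", column " + e.getColumnNumber() + "): " + e.getMessage(), e);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            validate("example-dataset", "<title>Trade &amp statistics</title>");
            System.out.println("accepted");
        } catch (ValidationError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```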
>>>>>>>>
>>>>>>>> best regards,
>>>>>>>>
>>>>>>>> Bert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>>>>>
>>>>>>>>> Hello
>>>>>>>>>
>>>>>>>>> It looks like the error occurs because rdf2ckan sends the wrong
>>>>>>>>> content type when posting to CKAN. Content-Type: application/json
>>>>>>>>> should be sent; otherwise CKAN will think the body is urlencoded and
>>>>>>>>> will therefore encode the final document wrongly.
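
To make the point about the content type concrete, here is a minimal sketch of a POST request that carries an explicit Content-Type: application/json header. It uses the JDK's java.net.http client (Java 11+) rather than the Apache HttpClient classes that rdf2ckan uses; the URL is a placeholder and the class and method names are hypothetical. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not rdf2ckan code.
final class JsonPostSketch {

    // Placeholder endpoint: the real CKAN host and API path are not given in this thread.
    static final String EXAMPLE_URL = "http://ckan.example.org/api/placeholder";

    // Builds a POST request whose body is sent as JSON, with the header set explicitly.
    static HttpRequest buildRequest(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(EXAMPLE_URL, "{\"name\": \"example-dataset\"}");
        System.out.println(request.method());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Logging the headers of the request that actually goes out is a cheap way to confirm what the server receives.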
>>>>>>>>>
>>>>>>>>> This must have changed at some point, as the live db does not have
>>>>>>>>> this issue, and it is likely what caused the unicode issues too.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 18 September 2013 16:16, Dimitrios Mexis <dimitrios.mexis at tenforce.com> wrote:
>>>>>>>>>
>>>>>>>>>>  Hello David,
>>>>>>>>>>
>>>>>>>>>> We don't do any postprocessing of the data we send.
>>>>>>>>>>
>>>>>>>>>> It's pure RDF content as read from the file, so we depend on how
>>>>>>>>>> CKAN digests it.
>>>>>>>>>> If we need to do some postprocessing, can you clarify what is
>>>>>>>>>> needed?
>>>>>>>>>>
>>>>>>>>>> Also, is this related to the unicode exceptions we got from CKAN
>>>>>>>>>> as well?
>>>>>>>>>>
>>>>>>>>>> Kind regards
>>>>>>>>>> Dimitrios
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>>>>>
>>>>>>>>>>  Hello Bert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  There appear to be a lot of malformed RDF documents in the
>>>>>>>>>> test system, e.g.
>>>>>>>>>>
>>>>>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>>>>>
>>>>>>>>>>  After some investigation, this seems to be due to the data sent by
>>>>>>>>>> rdf2ckan.  It appears that XML "&" characters are not being escaped
>>>>>>>>>> correctly in some places.
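
For reference, a literal "&" in XML text has to be written as "&amp;" (and "<" as "&lt;"). A minimal sketch of such escaping follows, with a hypothetical helper name. It has to be applied exactly once to raw text, since escaping already-escaped text would turn "&amp;" into "&amp;amp;".

```java
// Hypothetical helper; not taken from rdf2ckan or CKAN.
final class XmlEscapeSketch {

    // Escapes the characters that are not allowed literally in XML text content.
    static String escapeXmlText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXmlText("Trade & statistics"));
    }
}
```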
>>>>>>>>>>
>>>>>>>>>>  These errors do not appear on the live db as far as I can see.
>>>>>>>>>> Could you please look into this?
>>>>>>>>>>
>>>>>>>>>>  Thanks
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Ecodp-dev mailing list
>>>>>>>>>> Ecodp-dev at lists.okfn.org
>>>>>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bert Van Nuffelen
>>>>>>>>
>>>>>>>> Semantic Technologies Software Architect at TenForce
>>>>>>>> www.tenforce.be
>>>>>>>>
>>>>>>>> Bert.Van.Nuffelen at tenforce.com
>>>>>>>> Office: +32 (0)16 31 48 60
>>>>>>>> Mobile:+32 479 06 24 26
>>>>>>>> skype: bert.van.nuffelen
>>>>>>>>