[ECODP-dev] Malformed rdf

Bert Van Nuffelen bert.van.nuffelen at tenforce.com
Tue Nov 12 10:34:31 UTC 2013


Hi John,

Strange. I will create a full dump of the request to inspect its content.
Did you do both a create and an update?

best,

Bert

PS: we have a power interruption from 12h (in half an hour) until 13h, so
there might be communication glitches.

Bert



2013/11/12 John Glover <john.glover at okfn.org>

> Hi Bert,
>
> I am unable to reproduce your error here. Both of those JSON files import
> fine for me using a simple Python client to make an API request
> (release 01); neither produces the error. Could you therefore please double
> check that your API request is correct, and let me know when the test
> server is back on release 01 so that we can test it again there.
>
> As requested, I have pushed a change to ckanext-ecportal so that it now
> raises a ValidationError if it cannot parse the RDF in the dataset JSON.
>
> Regards,
> John
>
>
> On 12 November 2013 09:44, John Glover <john.glover at okfn.org> wrote:
>
>> Hi Bert,
>>
>> Thanks, I will have a look at that JSON file and get back to you.
>>
>> Regards,
>> John
>>
>>
>> On 12 November 2013 09:08, Bert Van Nuffelen <
>> bert.van.nuffelen at tenforce.com> wrote:
>>
>>> Hi John,
>>>
>>> I forgot to mention that I attached example JSON files to my
>>> previous response.
>>>
>>> Bert
>>>
>>>
>>> 2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>
>>>> Hi John,
>>>>
>>>>
>>>> 2013/11/11 John Glover <john.glover at okfn.org>
>>>>
>>>>> Hi Bert,
>>>>>
>>>>> The last time this issue came up, the only way that we could recreate
>>>>> the problem was when making a request without setting the content type to
>>>>> 'application/json'. As you are now doing this, could you please send us an
>>>>> example of the JSON that you are sending to CKAN in a failing request so
>>>>> that we can investigate further.
>>>>>
>>>>>
>>>>
>>>>
>>>>> > When I do my tests, it seems that when the error occurs the
>>>>> > submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>> > produced by CKAN's own RDF generation process. Is that true?
>>>>>
>>>>> Yes, this is correct.
>>>>>
>>>>> > In that case the ERROR level of the message and the result returned
>>>>> > by the API call are not in sync. On an ERROR I expect the API call to
>>>>> > return an error code. If it were a warning, I could understand the
>>>>> > API call returning success.
>>>>>
>>>>> We could of course return a validation error here (although this has
>>>>> not been requested before). I believe the thinking was that it was best to
>>>>> save the data and ignore this error as CKAN can generate RDF, but this
>>>>> logic could be changed.
>>>>>
>>>>
>>>> I understand the reasoning, but it leads to an incorrect end result.
>>>> In this case it has even hidden a communication error between RDF2CKAN
>>>> and CKAN. Since the API provides a proper feedback mechanism, I would
>>>> like to rely on it.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> John
>>>>>
>>>>>
>>>>> On 8 November 2013 11:47, Bert Van Nuffelen <
>>>>> bert.van.nuffelen at tenforce.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> One addition: overnight I ran an upload with the same RDF2CKAN and the
>>>>>> same ESTAT package against the release 09.00.0x version, and there the
>>>>>> above message is not present in the CKAN logs.
>>>>>>
>>>>>> But when I run the synchronisation with Virtuoso, I get exceptions
>>>>>> when parsing the RDF files as converted by CKAN.
>>>>>>
>>>>>> So it seems that
>>>>>>    * in release 09.00.0x the check on this issue was not present at
>>>>>> all,
>>>>>>    * in release 01.00.0x the check is there, but it silently replaces
>>>>>> the submitted RDF with an empty value.
>>>>>>
>>>>>> In both releases the accept-header is not bound to the desired
>>>>>> execution.
>>>>>>
>>>>>> kind regards,
>>>>>>
>>>>>> Bert
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> We have this problem again. It is related to issue ODP-294.
>>>>>>>
>>>>>>> In the Apache logs:
>>>>>>>
>>>>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
>>>>>>> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>>>>
>>>>>>> In the CKAN logs:
>>>>>>>
>>>>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
>>>>>>> expecting ';', line 39, column 102
>>>>>>>
>>>>>>> Because this appears only silently in the logs, instead of as a
>>>>>>> _proper_ validation error like:
>>>>>>>
>>>>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation
>>>>>>> error: '{\'keyword_string\': [u\'Tag "international trade,trade
>>>>>>> statistics" must be alphanumeric characters or symbols: -_.\'],
>>>>>>> \'__type\': \'Validation Error\'}'
>>>>>>>
>>>>>>> rdf2ckan can only conclude that it has inserted the data correctly
>>>>>>> when it has not.
>>>>>>> The outcome of rdf2ckan is hence untrustworthy. For that reason
>>>>>>> this issue remained unseen by TF development.
>>>>>>>
>>>>>>> Secondly, I am concerned about whether the configuration solution
>>>>>>> described by David is working.
>>>>>>> It seems that setting the header to "application/json" does not
>>>>>>> result in correct handling by CKAN.
>>>>>>> Below is an extract of the Java code showing that we send the
>>>>>>> content type as requested.
>>>>>>>
>>>>>>> private static final String CONTENT_JSON = "application/json";
>>>>>>>
>>>>>>> public Map<String, Object> post(String json, String method) {
>>>>>>>     if (json.isEmpty() || method.isEmpty())
>>>>>>>         return null;
>>>>>>>     DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>>>>
>>>>>>>     HttpPost httpPost = requestCredentials(method);
>>>>>>>     json = json.replaceAll(";", "");
>>>>>>>     httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>>>>     ...
>>>>>>>
>>>>>>> public InputStreamEntity streamify(String json, String contentType) {
>>>>>>>     json = json.replaceAll(";", "");
>>>>>>>     InputStreamEntity inputStreamEntity = new InputStreamEntity(
>>>>>>>             new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>>>>>>             json.getBytes().length);
>>>>>>>     inputStreamEntity.setContentType(contentType);
>>>>>>>     return inputStreamEntity;
>>>>>>> }
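
A side note on the extract above: both methods remove every ';' from the
payload before it is sent. The thread does not confirm that this is the cause
of the logged error, but it is one plausible candidate, because stripping
semicolons turns an escaped "&amp;" into "&amp", and an XML parser then
complains that the entity reference is not terminated. The following is a
minimal sketch of that effect using only the JDK's built-in XML parser. The
class name, method name and sample RDF are illustrative assumptions, not part
of rdf2ckan, and the JDK parser words its error differently from the message
seen in the CKAN logs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class; not taken from rdf2ckan.
class SemicolonStripDemo {

    // Returns true if the given string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // SAXException for malformed input; other failures are treated the same here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Invented sample: a URL whose '&' is correctly escaped as "&amp;".
        String rdf = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
                + "<rdf:Description rdf:about=\"http://example.org/d?a=1&amp;b=2\"/>"
                + "</rdf:RDF>";

        // Same transformation as in the extract: drop all semicolons.
        String stripped = rdf.replaceAll(";", "");

        System.out.println(isWellFormed(rdf));      // prints true
        System.out.println(isWellFormed(stripped)); // prints false ("&ampb=2" has no ';')
    }
}
```

If this is indeed what happens, the RDF would validate on disk (e.g. with
rapper) and only become malformed on its way into the request body.
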
>>>>>>>
>>>>>>>
>>>>>>> Note that the RDF submitted in the request body is valid RDF and
>>>>>>> XML: it passes all XML and RDF validations (e.g. rapper).
>>>>>>>
>>>>>>> When I do my tests, it seems that when the error occurs the
>>>>>>> submitted RDF is not saved, and hence the RDF in CKAN is the one
>>>>>>> produced by CKAN's own RDF generation process. Is that true? In that
>>>>>>> case the ERROR level of the message and the result returned by the API
>>>>>>> call are not in sync. On an ERROR I expect the API call to return an
>>>>>>> error code. If it were a warning, I could understand the API call
>>>>>>> returning success.
>>>>>>>
>>>>>>> So the following actions should be taken:
>>>>>>>
>>>>>>> 1) Make the above error message a proper validation result.
>>>>>>>     Note that this also solves the problem of the meaningless error
>>>>>>> message: nothing in the message indicates which call or object it
>>>>>>> concerns, so the problem cannot be resolved from this message alone.
>>>>>>> And putting CKAN into debug-mode logging for that is not very sensible.
>>>>>>> 2) Ensure that the RDF (in RDF/XML format) inside the JSON is
>>>>>>> handled properly and not interpreted as an HTML encoding.
>>>>>>>
>>>>>>> best regards,
>>>>>>>
>>>>>>> Bert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>>>>
>>>>>>>> Hello
>>>>>>>>
>>>>>>>> It looks like the error occurs because rdf2ckan sends the wrong
>>>>>>>> content type when posting to CKAN. Content-Type: application/json
>>>>>>>> should be sent; otherwise CKAN will assume the body is URL-encoded
>>>>>>>> and will therefore encode the final document wrongly.
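
For reference, a minimal sketch of a POST request that declares its body as
JSON. It assumes Java 11 or later and uses the JDK's java.net.http API rather
than the Apache HttpClient that rdf2ckan appears to use. The class name, URL
and JSON body are placeholders chosen for illustration; the request is only
built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not taken from rdf2ckan or CKAN.
class JsonPostSketch {

    // Builds (but does not send) a POST whose body is declared as JSON.
    static HttpRequest buildJsonPost(String url, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                // Without this header a server may treat the body as form data.
                .header("Content-Type", "application/json")
                // Encode the body explicitly as UTF-8 instead of the platform default.
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and payload, assumed for the example.
        HttpRequest request = buildJsonPost(
                "http://ckan.example.org/api/action/package_create",
                "{\"name\": \"example-dataset\"}");

        System.out.println(request.method());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Printing the header from the built request is a cheap way to confirm on the
client side what will actually go over the wire.
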
>>>>>>>>
>>>>>>>> This must have changed at some point, as the live DB does not have
>>>>>>>> this issue, and it is likely what caused the unicode issues too.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>>>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>>>>>
>>>>>>>>>  Hello David,
>>>>>>>>>
>>>>>>>>> We don't do any post-processing of the data we send.
>>>>>>>>>
>>>>>>>>> It's pure RDF content as read from the file, so we depend on how
>>>>>>>>> CKAN digests it.
>>>>>>>>> If we need to do some post-processing, can you clarify what is
>>>>>>>>> needed?
>>>>>>>>>
>>>>>>>>> Also, is this related to the unicode exceptions from CKAN that we
>>>>>>>>> faced as well?
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>> Dimitrios
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>>>>
>>>>>>>>>  Hello Bert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There appear to be a lot of malformed RDF documents in the test
>>>>>>>>> system, e.g.
>>>>>>>>>
>>>>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>>>>
>>>>>>>>> After some investigation, this seems to be due to the data sent by
>>>>>>>>> rdf2ckan. It appears that XML "&" characters are not being escaped
>>>>>>>>> correctly in some places.
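
To illustrate what correct escaping means here: a literal "&" inside XML text
or attribute values has to be written as "&amp;". The following is a minimal
sketch of such an escaping step. The class and method names are hypothetical,
and the thread does not show how rdf2ckan actually builds its RDF, so this is
an illustration of the rule rather than a fix for that code.

```java
// Hypothetical helper; not taken from rdf2ckan.
class XmlEscapeSketch {

    // Escapes the five characters that have special meaning in XML.
    // Apply it to raw text only: already-escaped input would be escaped twice.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented sample value containing a raw ampersand.
        System.out.println(escapeXml("http://example.org/d?a=1&b=2"));
        // prints http://example.org/d?a=1&amp;b=2
    }
}
```
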
>>>>>>>>>
>>>>>>>>> These errors do not appear on the live DB as far as I can see.
>>>>>>>>> Could you please look into this?
>>>>>>>>>
>>>>>>>>>  Thanks
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Ecodp-dev mailing list
>>>>>>>>> Ecodp-dev at lists.okfn.org
>>>>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Bert Van Nuffelen
>>>>>>>
>>>>>>> Semantic Technologies Software Architect at TenForce
>>>>>>> www.tenforce.be
>>>>>>>
>>>>>>> Bert.Van.Nuffelen at tenforce.com
>>>>>>> Office: +32 (0)16 31 48 60
>>>>>>> Mobile:+32 479 06 24 26
>>>>>>> skype: bert.van.nuffelen
>>>>>>>


-- 
Bert Van Nuffelen

Semantic Technologies Software Architect at TenForce
www.tenforce.be

Bert.Van.Nuffelen at tenforce.com
Office: +32 (0)16 31 48 60
Mobile:+32 479 06 24 26
skype: bert.van.nuffelen