[ECODP-dev] Malformed rdf
Bert Van Nuffelen
bert.van.nuffelen at tenforce.com
Tue Nov 12 08:08:24 UTC 2013
Hi John,
I forgot to mention that I attached example JSON files to the previous
response.
Bert
2013/11/12 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
> Hi John,
>
>
> 2013/11/11 John Glover <john.glover at okfn.org>
>
>> Hi Bert,
>>
>> The last time this issue came up, the only way that we could recreate the
>> problem was when making a request without setting the content type to
>> 'application/json'. As you are now doing this, could you please send us an
>> example of the JSON that you are sending to CKAN in a failing request so
>> that we can investigate further.
>>
>>
>
>
>> > When I run my tests, it seems that when the error occurs the
>> > submitted RDF is not saved, and hence the RDF in CKAN is
>> > the one produced by the CKAN RDF generation process. Is that true?
>>
>> Yes this is correct.
>>
>> > In that case the level of the message (ERROR) and the result returned
>> > by the API call are not in sync. On an ERROR I
>> > expect the API call to return an error code. If it were a warning, I
>> > could understand the API call returning success.
>>
>> We could of course return a validation error here (although this has not
>> been requested before). I believe the thinking was that it was best to save
>> the data and ignore this error as CKAN can generate RDF, but this logic
>> could be changed.
>>
>
> I understand the reasoning, but it leads to an incorrect end result. In
> this case it even hid a communication error between RDF2CKAN and
> CKAN. Since the API provides a proper feedback mechanism, I would like to
> rely on that.
>
>
>
>
>
>
>>
>> Regards,
>> John
>>
>>
>> On 8 November 2013 11:47, Bert Van Nuffelen <
>> bert.van.nuffelen at tenforce.com> wrote:
>>
>>> Hi,
>>>
>>> One addition: overnight I ran an upload with the same RDF2CKAN and the
>>> same ESTAT package against the release 09.00.0x version, and there the
>>> above message is not present in the CKAN logs.
>>>
>>> But when I synchronise with Virtuoso, I get exceptions on parsing the
>>> RDF files as they have been converted by CKAN.
>>>
>>> So it seems that
>>> * in release 09.00.0x the check on this issue was not present at all,
>>> * in release 01.00.0x the check is there, but the value is silently
>>> replaced with an empty one
>>>
>>> In both releases the accept-header is not bound to the desired execution.
>>>
>>> kind regards,
>>>
>>> Bert
>>>
>>>
>>>
>>>
>>> 2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>
>>>
>>>> Dear all,
>>>>
>>>> We have this problem again. It is related to issue ODP-294.
>>>>
>>>> In the Apache logs:
>>>>
>>>> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
>>>> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>>>>
>>>> in the CKAN logs
>>>>
>>>> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
>>>> expecting ';', line 39, column 102
>>>>
>>>> As this is silently written to the logs instead of being returned as a
>>>> _proper_ validation error like:
>>>>
>>>> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error:
>>>> '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must
>>>> be alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation
>>>> Error\'}'
>>>>
>>>> rdf2ckan can only conclude that it has inserted the data correctly,
>>>> while it has not.
>>>> The outcome of rdf2ckan is hence untrustworthy. For that reason
>>>> this issue remained unseen by TF development.
>>>>
>>>> Secondly, I am concerned about whether the configuration solution
>>>> described by David is working.
>>>> It seems that setting the header to "application/json" does not
>>>> result in the correct handling by CKAN.
>>>> Below is an extract of the Java code showing that we send the
>>>> content type as requested.
>>>>
>>>> private static final String CONTENT_JSON = "application/json";
>>>>
>>>> public Map<String, Object> post(String json, String method) {
>>>>     if (json.isEmpty() || method.isEmpty())
>>>>         return null;
>>>>     DefaultHttpClient httpClient = new DefaultHttpClient();
>>>>
>>>>     HttpPost httpPost = requestCredentials(method);
>>>>     json = json.replaceAll(";", "");
>>>>     httpPost.setEntity(streamify(json, CONTENT_JSON));
>>>>     ...
>>>>
>>>> public InputStreamEntity streamify(String json, String contentType) {
>>>>     json = json.replaceAll(";", "");
>>>>     InputStreamEntity inputStreamEntity = new InputStreamEntity(
>>>>         new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>>>>         json.getBytes().length);
>>>>     inputStreamEntity.setContentType(contentType);
>>>>     return inputStreamEntity;
>>>> }
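
A minimal sketch of one thing that may be worth checking in the extract above. This is an assumption, not a confirmed diagnosis: replaceAll(";", "") removes every semicolon from the payload, including the one that terminates XML entity references such as &amp; in the embedded RDF/XML, which would be consistent with the logged "EntityRef: expecting ';'" message. Separately, json.getBytes().length uses the platform default charset while the stream is encoded as UTF-8, so the two lengths can differ for non-ASCII text. The class and method names below are hypothetical and not part of RDF2CKAN.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative and not part of RDF2CKAN.
final class SemicolonStripDemo {

    // Mirrors the replaceAll(";", "") call shown in the extract.
    static String stripSemicolons(String json) {
        return json.replaceAll(";", "");
    }

    // Byte length computed with the same charset that is used for the stream.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String json = "{\"rdf\": \"<dct:title>Trade &amp; industry</dct:title>\"}";
        // The entity reference loses its terminating semicolon:
        // {"rdf": "<dct:title>Trade &amp industry</dct:title>"}
        System.out.println(stripSemicolons(json));
        // "café" is 4 characters but 5 bytes in UTF-8.
        System.out.println(utf8Length("caf\u00e9"));
    }
}
```

If this turns out to be the cause, dropping the semicolon removal and passing the UTF-8 byte count as the entity length would be the two changes to try first.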
>>>>
>>>>
>>>> Note that the RDF submitted in the request body is valid RDF and XML:
>>>> it passes all XML & RDF validations (e.g. rapper).
>>>>
>>>> When I run my tests, it seems that when the error occurs the submitted
>>>> RDF is not saved, and hence the RDF in CKAN is the one produced by the
>>>> CKAN RDF generation process. Is that true? In that case the level of the
>>>> message (ERROR) and the result returned by the API call are not in sync.
>>>> On an ERROR I expect the API call to return an error code. If it were a
>>>> warning, I could understand the API call returning success.
>>>>
>>>> So the following actions should be taken:
>>>>
>>>> 1) Make the above error message a proper validation result.
>>>> Note that this also solves the problem of the meaningless error message.
>>>> The message contains no reference at all to the call or object it
>>>> concerns, so the problem cannot be resolved from this message alone. And
>>>> putting CKAN into debug-mode logging for that is not very sensible.
>>>> 2) Ensure that the RDF (in rdfxml format) inside the JSON is handled
>>>> properly and not interpreted as an HTML encoding.
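
A minimal sketch of the idea behind action 1, assuming a well-formedness check is run on the RDF/XML before it is accepted, so that a failure can be reported together with a reference to the object concerned. CKAN itself is written in Python, so this Java version only illustrates the shape of such a check; the class name, method name and dataset identifier are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper; not CKAN or RDF2CKAN code.
final class RdfXmlPrecheck {

    // Returns an error message naming the dataset if the XML is not
    // well-formed, or an empty Optional if it parses.
    static Optional<String> check(String datasetId, String rdfXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(rdfXml)));
            return Optional.empty();
        } catch (SAXParseException e) {
            // Include the object reference plus the parser's position.
            return Optional.of("dataset " + datasetId + ": line "
                + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage());
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return Optional.of("dataset " + datasetId + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // The entity reference below is missing its terminating semicolon.
        check("example-dataset", "<a>Trade &amp industry</a>")
            .ifPresent(System.out::println);
    }
}
```

A caller could turn a non-empty result into a validation error in the API response instead of a log entry, which is what makes the failure visible to the client.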
>>>>
>>>> best regards,
>>>>
>>>> Bert
>>>>
>>>>
>>>>
>>>>
>>>> 2013/9/18 David Raznick <david.raznick at okfn.org>
>>>>
>>>>> Hello
>>>>>
>>>>> It looks like the error occurs because the wrong content type is sent
>>>>> when rdf2ckan posts to CKAN. Content-Type: application/json should be
>>>>> sent, otherwise CKAN will think the body is urlencoded and will therefore
>>>>> encode the final document wrongly.
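
A minimal sketch of why the content type matters, assuming the receiving side applies generic form decoding to a body it believes is urlencoded. How CKAN actually processes such a request is not shown in this thread; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only illustrates generic form decoding.
final class FormDecodeDemo {

    // Form-encoded bodies separate parameters with '&'.
    static String[] splitPairs(String body) {
        return body.split("&");
    }

    // Form decoding turns '+' into a space and resolves %XX escapes.
    static String formDecode(String part) {
        try {
            return URLDecoder.decode(part, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "{\"title\": \"R&D 1+1\"}";
        // The JSON document is cut at '&' and the '+' becomes a space.
        for (String part : splitPairs(body)) {
            System.out.println(formDecode(part));
        }
    }
}
```

Under that assumption, any "&" or "+" inside the embedded RDF/XML would be altered before the JSON is even parsed.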
>>>>>
>>>>> This must have changed at some point, as the live db does not have this
>>>>> issue, and it is likely what caused the unicode issues too.
>>>>>
>>>>> Thanks
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On 18 September 2013 16:16, Dimitrios Mexis <
>>>>> dimitrios.mexis at tenforce.com> wrote:
>>>>>
>>>>>> Hello David,
>>>>>>
>>>>>> We don't do any postprocessing on the data we send.
>>>>>>
>>>>>> It's pure RDF content as read from the file, so we depend on how
>>>>>> CKAN digests it.
>>>>>> If we need to do some postprocessing, can you clarify what is
>>>>>> needed?
>>>>>>
>>>>>> Also, is this related to the problem we faced with unicode
>>>>>> exceptions from CKAN as well?
>>>>>>
>>>>>> Kind regards
>>>>>> Dimitrios
>>>>>>
>>>>>>
>>>>>> On 18/09/2013 17:12, David Raznick wrote:
>>>>>>
>>>>>> Hello Bert
>>>>>>
>>>>>>
>>>>>> There appear to be a lot of malformed RDF documents in the test
>>>>>> system, e.g.
>>>>>>
>>>>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>>>>
>>>>>> After some investigation this seems to be due to the data sent by
>>>>>> rdf2ckan. It appears that XML "&" characters are not being escaped
>>>>>> correctly in some places.
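
A minimal sketch of what escaping means here, assuming plain text is being inserted into an XML document. It should not be applied to XML that is already serialised, as that would escape the entities twice. The class and method names are hypothetical.

```java
// Hypothetical helper; not part of rdf2ckan.
final class XmlEscapeDemo {

    // Replaces the five characters that have special meaning in XML.
    // '&' must be handled first so that the entities produced by the
    // later replacements are not escaped again.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Trade & industry"));
    }
}
```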
>>>>>>
>>>>>> These errors do not appear on the live db as far as I can see.
>>>>>> Could you please look into this?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ecodp-dev mailing list
>>>>>> Ecodp-dev at lists.okfn.org
>>>>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Bert Van Nuffelen
>>>>
>>>> Semantic Technologies Software Architect at TenForce
>>>> www.tenforce.be
>>>>
>>>> Bert.Van.Nuffelen at tenforce.com
>>>> Office: +32 (0)16 31 48 60
>>>> Mobile:+32 479 06 24 26
>>>> skype: bert.van.nuffelen
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
--
Bert Van Nuffelen
Semantic Technologies Software Architect at TenForce
www.tenforce.be
Bert.Van.Nuffelen at tenforce.com
Office: +32 (0)16 31 48 60
Mobile:+32 479 06 24 26
skype: bert.van.nuffelen