[ECODP-dev] Malformed rdf

Bert Van Nuffelen bert.van.nuffelen at tenforce.com
Fri Nov 8 10:47:01 UTC 2013


Hi,

One addition: overnight I ran an upload with the same RDF2CKAN and the same
ESTAT package against the release 09.00.0x version, and there the error
message from my previous mail is not present in the CKAN logs.

But when I do the synchronisation with Virtuoso, I get exceptions on
parsing the RDF files as they have been converted by CKAN.
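
To find the affected files before the Virtuoso synchronisation, a simple
well-formedness check on each converted file could help. A minimal sketch,
assuming the RDF/XML is available as a string; the class and method names
are made up for illustration, and only the XML parser shipped with the JDK
is used. It checks XML well-formedness only, not RDF validity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheckSketch {
    // Returns true when the string parses as well-formed XML, false otherwise.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document, e.g. an unterminated entity reference.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problem, unrelated to the document itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragments: the second one has lost the ';' after "&amp".
        System.out.println(isWellFormed("<title>Trade &amp; statistics</title>"));
        System.out.println(isWellFormed("<title>Trade &amp statistics</title>"));
    }
}
```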

So it seems that
   * in release 09.00.0x the check on this issue was not present at all,
   * in release 01.00.0x the check is there, but the value is silently
replaced with an empty one

In both releases the accept-header is not bound to the desired execution.

kind regards,

Bert




2013/11/7 Bert Van Nuffelen <bert.van.nuffelen at tenforce.com>

> Dear all,
>
> We have this problem again. It is related to issue ODP-294.
>
> in the apache logs:
>
> [Tue Nov 05 21:16:57 2013] [error] 2013-11-05 21:16:57,306 ERROR
> [ckanext.ecportal.rdfutil] EntityRef: expecting ';', line 39, column 103
>
> in the CKAN logs
>
> 2013-11-07 22:14:04,975 ERROR [ckanext.ecportal.rdfutil] EntityRef:
> expecting ';', line 39, column 102
>
> As this only appears silently in the logs instead of as a _proper_
> validation error like:
>
> 2013-11-07 22:09:21,996 ERROR [ckan.controllers.api] Validation error:
> '{\'keyword_string\': [u\'Tag "international trade,trade statistics" must be
> alphanumeric characters or symbols: -_.\'], \'__type\': \'Validation Error\'}'
>
> rdf2ckan can only conclude that it has inserted the data correctly, while
> it has not.
> The outcome of rdf2ckan is hence untrustworthy. For that reason this
> issue remained unseen by TF development.
>
> Secondly, I doubt whether the configuration solution described by David
> actually works.
> It seems that setting the header to "application/json" does not
> result in correct handling by CKAN.
> Below is the extract of the Java code showing that we send the
> content type as requested.
>
> private static final String CONTENT_JSON = "application/json";
>
> public Map<String, Object> post(String json, String method) {
>         if (json.isEmpty() || method.isEmpty())
>             return null;
>         DefaultHttpClient httpClient = new DefaultHttpClient();
>
>         HttpPost httpPost = requestCredentials(method);
>         json = json.replaceAll(";", "");
>         httpPost.setEntity(streamify(json, CONTENT_JSON));
>        ...
>
>  public InputStreamEntity streamify(String json, String contentType) {
>         json = json.replaceAll(";", "");
>         InputStreamEntity inputStreamEntity = new InputStreamEntity(
>                 new ByteArrayInputStream(json.getBytes(Charset.forName("UTF-8"))),
>                 json.getBytes().length);
>         inputStreamEntity.setContentType(contentType);
>         return inputStreamEntity;
>     }
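
One thing that may be worth checking in the code quoted above: both post()
and streamify() remove every ';' from the JSON before sending. If the JSON
carries RDF/XML with entity references such as &amp;, that also strips
their terminating ';', which is the kind of input an XML parser rejects with
"EntityRef: expecting ';'". Whether this is the cause here has not been
verified. The sketch below only reproduces the string transformation on a
made-up payload (class name, method names and payload are hypothetical). It
also shows that the length passed to InputStreamEntity, taken from
getBytes() without a charset, can differ from the number of UTF-8 bytes
actually written when the JVM default charset is not UTF-8.

```java
import java.nio.charset.StandardCharsets;

class SemicolonStripSketch {
    // Same string transformation as in the quoted post() and streamify().
    public static String stripSemicolons(String json) {
        return json.replaceAll(";", "");
    }

    // Byte count in the charset that is actually written to the stream.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte count that getBytes() would give on a JVM defaulting to ISO-8859-1.
    public static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // Hypothetical payload: a JSON value holding RDF/XML with an entity reference.
        String json = "{\"rdf\": \"<title>Trade &amp; statistics</title>\"}";
        System.out.println(stripSemicolons(json));

        // A non-ASCII character takes more bytes in UTF-8 than in ISO-8859-1.
        String accented = "caf\u00e9";
        System.out.println(utf8Length(accented) + " vs " + latin1Length(accented));
    }
}
```

With this payload the entity reference comes out as "&amp" without its ';'.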
>
>
> Note that the RDF submitted in the request body is valid RDF and XML:
> it passes all XML & RDF validations (e.g. rapper).
>
> In my tests it seems that when the error occurs the submitted RDF is not
> saved, and hence the RDF in CKAN is the one produced by CKAN's own RDF
> generation process. Is that true? In that case the level of the message
> (ERROR) and the result returned by the API call are not in sync. On an
> ERROR I expect the API call to return an error code. If it were a warning,
> I could understand the API call returning success.
>
> So the following actions should be taken:
>
> 1) Make the above error message a proper validation result.
>     Note that this also solves the problem of the meaningless error message.
> Nothing in the message indicates which call/object it concerns, so the
> problem cannot be resolved from this message alone. And putting CKAN
> in debug-mode logging for that is not very sensible.
> 2) Ensure that the RDF (in RDF/XML format) inside the JSON is
> handled properly and not interpreted as an HTML encoding.
>
> best regards,
>
> Bert
>
>
>
>
> 2013/9/18 David Raznick <david.raznick at okfn.org>
>
>> Hello
>>
>> It looks like the error is because the wrong content type is sent when
>> posting to CKAN in rdf2ckan. Content-Type: application/json should be sent,
>> otherwise CKAN will think the body is urlencoded and therefore encode the
>> final document wrongly.
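
To illustrate why the content type matters, here is a minimal sketch of what
generic form-style decoding does to a JSON body that contains '&' and '+'.
This is not CKAN's actual parsing code, which is not shown in this thread;
the class name, method name and payload are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

class FormDecodingSketch {
    // Naive form-style parsing: split the body on '&', then URL-decode each piece.
    public static List<String> parseAsForm(String body) throws UnsupportedEncodingException {
        List<String> fields = new ArrayList<>();
        for (String piece : body.split("&")) {
            // URLDecoder turns '+' into a space and decodes %xx sequences.
            fields.add(URLDecoder.decode(piece, "UTF-8"));
        }
        return fields;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical JSON body with an XML entity reference and a '+'.
        String body = "{\"title\": \"Trade &amp; statistics 1+1\"}";
        for (String field : parseAsForm(body)) {
            System.out.println(field);
        }
    }
}
```

Treated this way, the body is cut in two at the '&' and the '+' becomes a
space, so the text no longer matches what was sent.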
>>
>> This must have changed at some point, as the live db does not have this
>> issue, and it is likely what caused the unicode issues too.
>>
>> Thanks
>>
>> David
>>
>>
>> On 18 September 2013 16:16, Dimitrios Mexis <dimitrios.mexis at tenforce.com> wrote:
>>
>>>  Hello David,
>>>
>>> we don't do any postprocessing for the data we send.
>>>
>>> It's pure rdf content as read from the file. So, we depend on CKAN how
>>> it will digest it.
>>> If we need to do some postprocessing, can you clarify the matter?
>>>
>>> Also, does that have to do with the problem we faced with unicode
>>> exceptions from CKAN as well?
>>>
>>> Kind regards
>>> Dimitrios
>>>
>>>
>>> On 18/09/2013 17:12, David Raznick wrote:
>>>
>>>  Hello Bert
>>>
>>>
>>> There appear to be a lot of malformed RDF documents in the test
>>> system, e.g.
>>>
>>> http://212.71.25.148/en/data/dataset/BrvXA5sZQ1AFKgE4Pktw.rdf
>>>
>>> After some investigation this seems to be due to the data sent by
>>> rdf2ckan. It appears that XML "&" characters are not being escaped
>>> correctly in some places.
>>>
>>> These errors do not appear on the live db as far as I can see. Could
>>> you please look into this?
>>>
>>>  Thanks
>>>
>>> David
>>>
>>>
>>> _______________________________________________
>>> Ecodp-dev mailing list
>>> Ecodp-dev at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>>>
>>>
>>>
>>
>
>
> --
> Bert Van Nuffelen
>
> Semantic Technologies Software Architect at TenForce
> www.tenforce.be
>
> Bert.Van.Nuffelen at tenforce.com
> Office: +32 (0)16 31 48 60
> Mobile:+32 479 06 24 26
> skype: bert.van.nuffelen
>


