[ckan-dev] Escaping strings in N3 output

Joshua Tauberer tauberer+consulting at govtrack.us
Tue Sep 18 17:52:45 UTC 2012


Thanks, Toby.

I think it might just be a good idea to drop support for N3, but....

It should be safe-ish to \-escape backslashes, quotes, CRs, and LFs.

These escapes seem to be required in "..."-type literals according to 
the latest N3 draft [1]. In """..."""-style literals less escaping may 
be necessary, but all of it is still allowed as far as I can see. While 
N3 permits escaping of other characters, the Turtle spec only allows the 
four escapes I listed, \t (which doesn't seem to ever be necessary), and 
the \uXXXX and \UXXXXXXXX Unicode escapes. [2] In order to support 
Turtle output, I'd suggest not escaping more than necessary.
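
A minimal sketch of that escaping in Python, in case it is useful. The 
helper name escape_n3_literal is made up for illustration and is not an 
existing CKAN function; it handles only the four characters listed above 
and passes everything else through unchanged.

```python
# Hypothetical helper (not part of CKAN): escapes only backslash, double
# quote, CR and LF, which is the minimal set discussed above.
_N3_ESCAPES = {
    "\\": "\\\\",  # backslash -> \\
    '"': '\\"',    # double quote -> \"
    "\r": "\\r",   # carriage return -> \r
    "\n": "\\n",   # line feed -> \n
}


def escape_n3_literal(value):
    """Escape a string for use inside a "..."-style N3/Turtle literal."""
    # Each character is looked up once, so a backslash produced by an
    # earlier replacement is never escaped a second time.
    return "".join(_N3_ESCAPES.get(ch, ch) for ch in value)


if __name__ == "__main__":
    print(escape_n3_literal('quote chars "here" break N3 syntax'))
```

If a helper like this were made available to the template (an 
assumption; I have not checked how CKAN exposes helpers to the N3 
template), the value could be wrapped before it is placed between the 
quotes.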

Strings can appear not only quoted but also inside IRIs like 
<http://www.example.org/>. Assuming the string is a valid IRI, it will 
never need escaping, since a valid IRI cannot contain >'s. And since it 
cannot contain quotes, newlines, or backslashes either, applying the 
same escaping rules as above would be harmless: they would change 
nothing. Of course, if someone has an invalid IRI containing one of 
those characters, then the escaping rules above would destroy the value, 
since the escaping for "..." doesn't apply in <...> (at least in N3; in 
Turtle it applies but with slightly different rules).
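
Given that, one option for <...> values is to check rather than escape. 
The sketch below is only an illustration of that idea: the function 
names are hypothetical, and the character list is a conservative 
assumption (the characters mentioned above plus spaces and control 
characters), not a full IRI validator.

```python
# Hypothetical helpers (not part of CKAN). The forbidden set is an
# assumption: >, <, double quote and backslash, plus anything at or
# below U+0020 (control characters, newlines and the space character).
_FORBIDDEN_IN_IRI = set('<>"\\')


def is_safe_iri(value):
    """Return True if value has no character that would break <...> syntax."""
    return not any(ch in _FORBIDDEN_IN_IRI or ord(ch) <= 0x20 for ch in value)


def format_iri(value):
    """Wrap value in angle brackets, refusing values that look invalid."""
    if not is_safe_iri(value):
        # Escaping here would silently change the value, so fail loudly.
        raise ValueError("not usable as an IRI: %r" % (value,))
    return "<" + value + ">"


if __name__ == "__main__":
    print(format_iri("http://www.example.org/"))
```

Rejecting the value keeps a bad URI from being silently rewritten; a 
template could instead fall back to emitting it as a quoted literal.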

And all that is my best guess. Figuring out what's a valid string where 
is hard!

[1] http://www.w3.org/TeamSubmission/n3/
[2] http://www.w3.org/TeamSubmission/turtle/

-- 
- Joshua Tauberer
- http://razor.occams.info



On 09/18/2012 11:52 AM, Toby Dacre wrote:
>
>
> On 18 September 2012 16:09, Joshua Tauberer 
> <tauberer+consulting at govtrack.us 
> <mailto:tauberer+consulting at govtrack.us>> wrote:
>
>     Hi again,
>
>     I was able to edit the package .rdf RDF/XML output template to add
>     new predicates. For instance, we store URIs in an extras field to
>     reliably identify the author of a dataset.
>
>       example:
>     http://hub.healthdata.gov/dataset/2008-basic-stand-alone-carrier.rdf
>          our template adds dct:creator / dc:publisher nodes
>
>     Now I'm trying to bring the N3 template up to date with the
>     RDF/XML template changes. However I'm finding that strings aren't
>     escaped in the output, so embedded quotes in extras field values
>     (for example) bust the file if I have something like this in the
>     template:
>
>        dc:subject "${extra_dict.get('value','')}";
>
>     because it may generate
>
>        dc:subject "quote chars "here" break N3 syntax";
>
>
> Thanks for this. I can try to get a fix in for 1.8. What is your 
> understanding of the rules? Should a " just be replaced by \"? If so, I 
> can create a simple helper function to resolve this issue. In the 
> short term it looks like you may be able to use dc:subject """ now " 
> is ok """. I am no expert in N3.
>
> Toby
>
>
>     Is this a bug? A limitation? I know that Genshi doesn't know what
>     N3 is, so I'm not entirely expecting CKAN to be able to handle
>     this easily. It's fine if it's not really supported --- this is
>     just icing on the cake for us. The RDF/XML works fine for us.
>
>     -- 
>     - Joshua Tauberer
>     - http://razor.occams.info
>
>
>     _______________________________________________
>     ckan-dev mailing list
>     ckan-dev at lists.okfn.org <mailto:ckan-dev at lists.okfn.org>
>     http://lists.okfn.org/mailman/listinfo/ckan-dev
