[ckan-dev] Special HTML Characters in Ckan and Harvest/Spatial Harvest Plugins

Nathan Hook nhook at ucar.edu
Wed Oct 11 21:00:05 UTC 2017


Good Day,

I know that html encoded characters are close to everyone's least favorite
topic (besides the handling of unicode characters), but we have run into a
problem and need some friendly knowledge or advice to figure out whether we
have discovered a bug (or bugs) in ckan.

We are using both the Harvest and Spatial Harvest plugins to import ISO xml
files into our ckan instance.

Some of our iso xml files use the html encoded values for the less than and
greater than symbols:  &lt; and &gt;

And we are seeing some strange behavior with these symbols.  Below are all
the use cases that we could come up with to show the behavior of these
characters based on the view and input (harvest or created via the ckan UI).


When importing &lt; and &gt; without a CDATA via xml and the harvester...

Text in xml:
Here is the lessthan &lt; this text should appear &gt; the greaterthan
should be before this text.

Dataset view:
Here is the lessthan the greaterthan should be before this text.

NOTE:
The 'this text should appear' is missing.

API Json Output:
"Here is the lessthan < this text should appear > the greaterthan should be
before this text."

Search Page Output:
Here is the lessthan < this text should appear > the greaterthan should be
before this text.


When importing &lt; and &gt; with a CDATA via xml and the harvester...

Text in xml:
<![CDATA[Here is the lessthan &lt; this text should appear &gt; the
greaterthan should be before this text.]]>

Dataset View:
Here is the lessthan < this text should appear > the greaterthan should be
before this text.

API Json Output:
"Here is the lessthan &lt; this text should appear &gt; the greaterthan
should be before this text."

Search Page Output:
Here is the lessthan < this text should appear > the greaterthan should be
before this text.


When importing < and > with a CDATA via xml and the harvester...

Text in xml:
<![CDATA[Here is the lessthan < this text should appear > the greaterthan
should be before this text.]]>

Dataset view:
Here is the lessthan the greaterthan should be before this text.

NOTE:
The 'this text should appear' is missing.

API Json Output:
"Here is the lessthan < this text should appear > the greaterthan should be
before this text."

Search Page Output:
Here is the lessthan < this text should appear > the greaterthan should be
before this text.


When hand creating a record with &lt; and &gt; via the UI...

Text entered:
Here is the html encoded lessthan &lt; this text should appear &gt; the
html encoded greaterthan should be before this text.

Dataset view:
Here is the html encoded lessthan < this text should appear > the html
encoded greaterthan should be before this text.

API Json Output:
"Here is the html encoded lessthan &lt; this text should appear &gt; the
html encoded greaterthan should be before this text."

Search Page Output:
Here is the html encoded lessthan < this text should appear > the html
encoded greaterthan should be before this text.


When hand creating a record with < and > via the UI...

Text entered:
Here is the lessthan < this text should appear > the greaterthan should be
before this text.

Dataset view:
Here is the lessthan the greaterthan should be before this text.

NOTE:
The 'this text should appear' is missing.

API Json Output:
"Here is the lessthan < this text should appear > the greaterthan should be
before this text."

Search Page Output:
Here is the lessthan < this text should appear > the greaterthan should be
before this text.
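
The three harvested cases look consistent with how an XML parser treats
entities inside and outside a CDATA section. Here is a minimal sketch of
that, using Python's standard library xml.etree.ElementTree as a stand-in
(an assumption; the harvester's own parser may behave differently) and a
hypothetical <abstract> element:

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_snippet):
    """Return the text content of the root element after XML parsing."""
    return ET.fromstring(xml_snippet).text


# <abstract> is a hypothetical element name; only the text handling matters.
entity_no_cdata = "<abstract>lessthan &lt; appear &gt; greaterthan</abstract>"
entity_in_cdata = (
    "<abstract><![CDATA[lessthan &lt; appear &gt; greaterthan]]></abstract>"
)
raw_in_cdata = (
    "<abstract><![CDATA[lessthan < appear > greaterthan]]></abstract>"
)

cases = [
    ("entities, no CDATA", entity_no_cdata),  # parser decodes the entities
    ("entities in CDATA", entity_in_cdata),   # CDATA keeps them verbatim
    ("raw chars in CDATA", raw_in_cdata),     # raw characters pass through
]

for label, snippet in cases:
    print(label, "->", repr(parsed_text(snippet)))
```

If this holds for the harvester too, the first and third cases would both
hand ckan a string containing a literal < and >, while the second would
hand it the still-encoded entities.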



Those are all the use cases we could come up with to show the different
ways that &lt;, &gt;, <, and > are used throughout our ckan installation.


From this developer's/user's viewpoint, I feel that it would be best if
ckan stored the literal < and > characters in the database and then used
view/controller behavior to translate those values to html encoded
characters when they are used on an html page.

That is not always easy to do, but it would allow us to stop placing html
encoded characters in our iso xml, which I think is a big no no.

It would also stop storing html encoded characters in the database and
having those characters bleeding out to other views (like the api json
view) of ckan.
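
To make the idea concrete, here is a rough sketch of "store plain
characters, encode only at the html boundary" using Python's standard
html and json modules. The function names are hypothetical and are not
part of ckan; this only illustrates the proposed behavior, not how ckan
works today:

```python
import html
import json


def normalize_for_storage(text):
    """Decode html entities so the stored value holds plain characters.

    Hypothetical helper: assumes incoming text never needs to keep a
    literal "&lt;" sequence as-is.
    """
    return html.unescape(text)


def render_for_html(stored):
    """Encode special characters only when the value goes into a page."""
    return html.escape(stored, quote=True)


def render_for_json(stored):
    """JSON output carries the plain characters, with no html encoding."""
    return json.dumps(stored)


stored = normalize_for_storage("lessthan &lt; appear &gt; greaterthan")
print(stored)
print(render_for_html(stored))
print(render_for_json(stored))
```

One tradeoff of decoding on the way in: a user who really wants the
literal text "&lt;" stored would lose it, so a real implementation would
need to decide where decoding is appropriate.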

Is there something that I am missing or does this seem like a bug/issue
with ckan?


Thank you for your time and knowledge.  They are both greatly appreciated.

Regards,

Nathan