[ckan-dev] Special HTML Characters in Ckan and Harvest/Spatial Harvest Plugins

Stefan Oderbolz stefan.oderbolz at liip.ch
Tue Oct 31 04:23:31 UTC 2017


I think this should be a ticket in the ckan repository. Do you have an idea
what code causes this error? It very much feels like a simple unit test
should be able to spot this error.
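
As a starting point, here is a minimal sketch of such a test. It uses only
the Python standard library, not the actual harvester code, and
`extract_abstract` and the `<record>`/`<abstract>` elements are hypothetical
stand-ins for whatever the spatial harvester really parses. It only pins
down what a conforming XML parser hands back for the reported inputs, so any
difference after that point would come from later processing.

```python
import xml.etree.ElementTree as ET

RAW = "lessthan < this text should appear > greaterthan"


def extract_abstract(xml_string):
    """Hypothetical stand-in for the harvester's field extraction."""
    return ET.fromstring(xml_string).findtext("abstract")


def test_entities_and_cdata():
    # Entities outside CDATA are decoded by the XML parser.
    with_entities = (
        "<record><abstract>lessthan &lt; this text should appear &gt; "
        "greaterthan</abstract></record>"
    )
    # Raw characters inside CDATA are passed through unchanged.
    raw_in_cdata = (
        "<record><abstract><![CDATA[lessthan < this text should appear > "
        "greaterthan]]></abstract></record>"
    )
    # Entities inside CDATA are NOT decoded; they stay literal text.
    entities_in_cdata = (
        "<record><abstract><![CDATA[lessthan &lt; this text should appear "
        "&gt; greaterthan]]></abstract></record>"
    )

    assert extract_abstract(with_entities) == RAW
    assert extract_abstract(raw_in_cdata) == RAW
    assert "&lt;" in extract_abstract(entities_in_cdata)
```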

It definitely should be "correct" (or "as-is") in the database, and each
view has to decide how to display it.
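
To illustrate what I mean, here is a rough sketch, again plain Python and
not CKAN's real template or API code. The function names and the `notes`
key are assumptions made for the example. The stored value keeps the raw
characters, the HTML view escapes on output, and the JSON view returns the
value unchanged.

```python
import html
import json

# Value as it would sit in the database: raw characters, no HTML entities.
stored = "lessthan < this text should appear > greaterthan"


def render_for_html(value):
    """Escape only at the point where the value is written into HTML."""
    return html.escape(value)


def render_for_api(value):
    """JSON needs no HTML escaping; the raw value is serialized as-is."""
    return json.dumps({"notes": value})


print(render_for_html(stored))
print(render_for_api(stored))
```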

On Oct 26, 2017 17:51, "Nathan Hook" <nhook at ucar.edu> wrote:

> A friendly 'bump' on this issue.  Should I create a bug report in the ckan
> github repository?
>
> Thank you for your time,
>
> Nathan
>
> On Wed, Oct 11, 2017 at 3:00 PM, Nathan Hook <nhook at ucar.edu> wrote:
>
>> Good Day,
>>
>> I know that html encoded characters are close to everyone's least favorite
>> topic (besides the handling of unicode characters), but we have run into a
>> problem and need some friendly knowledge or advice to figure out whether we
>> have discovered a bug (or bugs) in ckan.
>>
>> We are using both the Harvest and Spatial Harvest plugins to import ISO
>> xml files into our ckan instance.
>>
>> Some of our iso xml files use the html encoded values for the less than
>> and greater than symbols:  &lt; and &gt;
>>
>> And we are seeing some strange behavior with these symbols.  Below are
>> all the use cases that we could come up with to show the behavior of these
>> characters based on the view and input (harvest or created via the ckan UI).
>>
>>
>> When importing &lt; and &gt; without a CDATA via xml and the harvester...
>>
>> Text in xml:
>> Here is the lessthan &lt; this text should appear &gt; the greaterthan
>> should be before this text.
>>
>> Dataset view:
>> Here is the lessthan the greaterthan sould be before this text.
>>
>> NOTE:
>> The 'this text should appear' is missing.
>>
>> API Json Output:
>> "Here is the lessthan < this text should appear > the greaterthan sould
>> be before this text."
>>
>> Search Page Output:
>> Here is the lessthan < this text should appear > the greaterthan sould be
>> before this text.
>>
>>
>> When importing &lt; and &gt; with a CDATA via xml and the harvester...
>>
>> Text in xml:
>> <![CDATA[Here is the lessthan &lt; this text should appear &gt; the
>> greaterthan should be before this text.]]>
>>
>> Dataset View:
>> Here is the lessthan < this text should appear > the greaterthan should
>> be before this text.
>>
>> API Json Output:
>> "Here is the lessthan &lt; this text should appear &gt; the greaterthan
>> should be before this text."
>>
>> Search Page Output:
>> Here is the lessthan < this text should appear > the greaterthan sould be
>> before this text.
>>
>>
>> When importing < and > with a CDATA via xml and the harvester...
>>
>> Text in xml:
>> <![CDATA[Here is the lessthan < this text should appear > the greaterthan
>> should be before this text.]]>
>>
>> Dataset view:
>> Here is the lessthan the greaterthan should be before this text.
>>
>> NOTE:
>> The 'this text should appear' is missing.
>>
>> API Json Output:
>> "Here is the lessthan < this text should appear > the greaterthan should
>> be before this text."
>>
>> Search Page Output:
>> Here is the lessthan < this text should appear > the greaterthan sould be
>> before this text.
>>
>>
>> When hand creating a record with &lt; and &gt; via the UI...
>>
>> Text entered:
>> Here is the html encoded lessthan &lt; this text should appear &gt; the
>> html encoded greaterthan should be before this text.
>>
>> Dataset view:
>> Here is the html encoded lessthan < this text should appear > the html
>> encoded greaterthan should be before this text.
>>
>> API Json Output:
>> "Here is the html encoded lessthan &lt; this text should appear &gt; the
>> html encoded greaterthan should be before this text."
>>
>> Search Page Output:
>> Here is the html encoded lessthan < this text should appear > the html
>> encoded greaterthan should be before this text.
>>
>>
>> When hand creating a record with < and > via the UI...
>>
>> Text entered:
>> Here is the lessthan < this text should appear > the greaterthan sould be
>> before this text.
>>
>> Dataset view:
>> Here is the lessthan the greaterthan sould be before this text.
>>
>> NOTE:
>> The 'this text should appear' is missing.
>>
>> API Json Output:
>> "Here is the lessthan < this text should appear > the greaterthan sould
>> be before this text."
>>
>> Search Page Output:
>> Here is the lessthan < this text should appear > the greaterthan sould be
>> before this text.
>>
>>
>>
>> Those are all the use cases we could come up with to show the different
>> ways that &lt;, &gt;, <, and > are used throughout our ckan installation.
>>
>>
>> From this developer's/user's viewpoint, I feel that it would be best if
>> ckan would store the raw < and > in the database and then use
>> view/controller behavior to translate those values to html encoded
>> characters when they are used on an html page.
>>
>> That is not always easy to do, but it would allow us to stop placing html
>> encoded characters in our iso xml, which I think is a big no-no.
>>
>> It would also stop storing html encoded characters in the database and
>> having those characters bleeding out to other views (like the api json
>> view) of ckan.
>>
>> Is there something that I am missing or does this seem like a bug/issue
>> with ckan?
>>
>>
>> Thank you for your time and knowledge.  They are both greatly appreciated.
>>
>> Regards,
>>
>> Nathan
>>
>>
>