Posted to dev@knox.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/27 23:05:00 UTC

[jira] [Work logged] (KNOX-2202) Knox should use UTF-8 as default encoding instead of ISO-8859-1

     [ https://issues.apache.org/jira/browse/KNOX-2202?focusedWorklogId=377900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377900 ]

ASF GitHub Bot logged work on KNOX-2202:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/20 23:04
            Start Date: 27/Jan/20 23:04
    Worklog Time Spent: 10m 
      Work Description: risdenk commented on pull request #244: KNOX-2202 - Knox should use UTF-8 as default encoding instead of ISO-8859-1
URL: https://github.com/apache/knox/pull/244
 
 
   ## What changes were proposed in this pull request?
   
   Use UTF-8 as the default content encoding instead of ISO-8859-1.
   
   ## How was this patch tested?
   
   * Added unit tests showing failure before change
   * Checked that failing unit tests pass after change
   * Manually verified that changes fix Oozie UTF-8 XML issue
   * `mvn -T.75C verify -Ppackage,release -Dshellcheck`
   
 


Issue Time Tracking
-------------------

            Worklog Id:     (was: 377900)
    Remaining Estimate: 0h
            Time Spent: 10m

> Knox should use UTF-8 as default encoding instead of ISO-8859-1
> ---------------------------------------------------------------
>
>                 Key: KNOX-2202
>                 URL: https://issues.apache.org/jira/browse/KNOX-2202
>             Project: Apache Knox
>          Issue Type: Bug
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Major
>             Fix For: 1.4.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If you send in an XML document with Unicode characters, you get the following error:
> {code:java}
> ...
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
>  at [row,col {unknown-source}]: [1,0]
>         at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:687)
>         at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2220)
>         at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2126)
>         at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
>         at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)
>         at org.apache.knox.gateway.filter.rewrite.impl.xml.XmlFilterReader.read(XmlFilterReader.java:122)
>         ... 133 more
> {code}
> By default, Knox falls back to ISO-8859-1 encoding instead of UTF-8.
> I did some research and the default encoding specification has changed over the years. It looks like ISO-8859-1 was the default historically, but currently it should be UTF-8.
> https://stackoverflow.com/questions/58337900/how-to-change-default-character-encoding-configuration-in-jetty-app-server-from
> ISO-8859-1 and UTF-8 encode the ASCII range identically, so the two are incompatible only for characters outside ASCII.
> I also found that the default XML encoding is UTF-8 so even if we don't change all the defaults to UTF-8 we should do so for XML.
> https://www.w3schools.com/xml/xml_syntax.asp
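
As a rough illustration of how the two charsets diverge outside ASCII, the sketch below encodes a string as UTF-8 and then decodes the bytes as ISO-8859-1, which is what a reader assuming the wrong default would do. This is not Knox code: the class name `EncodingSketch` and the method `decodeUtf8BytesAsLatin1` are hypothetical, and the sketch does not attempt to reproduce the Woodstox `Unexpected EOF in prolog` error from the stack trace. It only shows that ASCII text survives the mismatch while non-ASCII text does not.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    // Hypothetical helper: encode text as UTF-8, then decode those bytes
    // as ISO-8859-1, mimicking a consumer that assumes the wrong charset.
    public static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ascii = "<name>workflow</name>";
        String nonAscii = "<name>caf\u00e9</name>"; // contains U+00E9 (e with acute accent)

        // ASCII-only content is byte-identical in both charsets.
        System.out.println(ascii.equals(decodeUtf8BytesAsLatin1(ascii)));

        // U+00E9 is two bytes in UTF-8 (0xC3 0xA9); ISO-8859-1 reads
        // them as two separate characters, so the text no longer matches.
        System.out.println(nonAscii.equals(decodeUtf8BytesAsLatin1(nonAscii)));

        // The byte counts also differ once non-ASCII characters appear.
        System.out.println(nonAscii.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(nonAscii.getBytes(StandardCharsets.ISO_8859_1).length);
    }
}
```

The first comparison succeeds and the second fails, which is consistent with the observation that only content outside the ASCII range is affected by the choice of default.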


