You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2011/06/21 17:47:16 UTC

DO NOT REPLY [Bug 51408] New: String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

             Bug #: 51408
           Summary: String.getBytes() and new String(byte[]) use default
                    charset - may cause problems in some Locales
           Product: Tomcat 7
           Version: trunk
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Catalina
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: sebb@apache.org
    Classification: Unclassified


There seem to be rather a lot of instances where Strings and bytes are
cconverted using the default charset.

Since the default charset is unknown, its behaviour cannot be relied on.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51408] String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #3 from Mark Thomas <ma...@apache.org> 2011-06-21 16:48:00 UTC ---
What exactly needs fixing? I don't see a report from a user reporting an error.
Neither do I see any analysis that demonstrates how an error may occur.

Given that the code has remained unchanged for many years without any reports
of errors from a world-wide user base, I find it unlikely there is actually
anything here to fix.

There is a greater risk, in my view, of a change introducing a regression. "If
it isn't broken, don't fix it."

I have changed this to enhancement. If there has been no activity when I next
come back to this, it is likely to get re-closed.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51408] String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

--- Comment #4 from Konstantin Kolinko <kn...@gmail.com> 2011-06-21 23:57:20 UTC ---
(In reply to comment #0)
> There seem to be rather a lot of instances where Strings and bytes are
> cconverted using the default charset.

Please be specific!

Searching through the TC7 source code I found

=====================================
1) 0 wrong calls to String.getBytes()

That is
1. Disregarding the test classes - have not looked through them at all,
2. Disregarding some never called test code in
o.a.c.tribes.membership.Constants.main()
3. Disregading legit calls to this method, when encoding is configurable and
the default encoding is used as fallback.

Nothing to do there.

=====================================
2) 2 places where new String(byte[]) calls might be fixed:

2.1. In SSIFilter#doFilter(), the following line:

   res.getWriter().write(new String(bytes));

It is some fallback code, when there is no OutputStream available, but only a
writer. Is it possible to get encoding of the writer there, e.g. from
content-type header?

Patches are welcome. It is an odd and rare use case though.

2.2. In JNDIRealm#getAttributeValue(String,Attributes)

  valueString = new String((byte[]) value);

I do not know JNDI API enough to comment here.

The rest are fallbacks, debug code etc.

=====================================
3) new String(byte[],int,int):

3.1. WebappClassLoader#findResourceInternal()

  String str = new String(binaryContent,0,pos);

That is the correct code and should stay as is. See r303915.

The rest are fallbacks, debug code etc.

=====================================
4) new FileWriter(String) or new FileWriter(String,[boolean]) instead of
OutputStreamWriter + FileOutputStream

Is used in one place only:

4.1. in AccessLogValve.open()

This can be fixed - to add "encoding" configuration option for the valve. I
think there was an enhancement request for that.

=====================================

That is all.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51408] String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Mark Thomas <ma...@apache.org> 2011-06-21 15:52:58 UTC ---
It has been like this for many years with zero issues reported by users.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51408] String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

Christopher Schultz <ch...@christopherschultz.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #2 from Christopher Schultz <ch...@christopherschultz.net> 2011-06-21 16:28:53 UTC ---
I object to marking this bug as INVALID, since I think it's something that
should be fixed.

How about marking it "minor" and fixing it lazily. Just being explicit about
the character set (ISO-8859-1? UTF-8?) should be easy to do and is unlikely to
affect anyone.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51408] String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #6 from Mark Thomas <ma...@apache.org> 2012-01-19 15:05:26 UTC ---
I'm reluctant to change this for 7.0.x since there is a risk it could break
something and no-one has reported a bug as a result of this.

I have changed these to explicitly use ISO-8859-1 for 8.0.x onwards. There are
some places where UTF-8 may be a better choice in the future as standards move
towards UTF-8. We can change these as users request it.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 51408] String.getBytes() and new String(byte[]) use default charset - may cause problems in some Locales

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51408

--- Comment #5 from Sebb <se...@apache.org> 2011-06-22 00:19:06 UTC ---
Sorry, I see now that the subject was misleading.

I was actually referring to the calls to getBytes(Charset.defaultCharset())
which were added recently in r1138019.

There are lots more of those, for example:

DigestAuthenticator.generateNonce
JNDIRealm.compareCredentials

etc.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org