Posted to dev@poi.apache.org by bu...@apache.org on 2010/07/16 10:19:06 UTC

DO NOT REPLY [Bug 49599] New: Comment.setAuthor does not encode multi-byte characters (Chinese) well

https://issues.apache.org/bugzilla/show_bug.cgi?id=49599

           Summary: Comment.setAuthor does not encode multi-byte
                    characters (Chinese) well
           Product: POI
           Version: 3.7-dev
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: lovetide@qq.com


Created an attachment (id=25765)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=25765)
Test class to show this issue

When I call setAuthor() on a Comment with some Chinese characters, these
characters are replaced with ? when the file is opened in Microsoft Excel.

Please see the attachment.

My environment: 
- JDK 6 Update 21 
- POI 3.7 Beta 1 
- Windows XP Professional (Simplified Chinese) SP3 
- Microsoft Office Excel 2003 (11.8324.8324) SP3 

For more details, please see this thread:
http://old.nabble.com/charset-encoding-of-Comment.setAuthor%28%29-ts29115649.html

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode multi-byte characters (Chinese) well

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599

--- Comment #3 from André-John Mas <an...@gmail.com> 2010-07-16 09:22:54 EDT ---
Created an attachment (id=25769)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=25769)
Patch for defaulting to multi-byte



DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode multi-byte characters (Chinese) well

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599

Nick Burch <ni...@alfresco.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #5 from Nick Burch <ni...@alfresco.com> 2010-07-16 09:58:45 EDT ---
Fixed in r964800, along with a unit test. Thanks for your investigations and
patch!



DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode multi-byte characters (Chinese) well

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599

--- Comment #1 from André-John Mas <an...@gmail.com> 2010-07-16 07:54:44 EDT ---
Just to sum up the thread:

The serialize() method in org.apache.poi.hssf.record.NoteRecord is not calling
the StringUtil.putUnicodeLE() method, because the field_5_hasMultibyte instance
variable is false, even when the author field contains double-byte characters.
In fact, field_5_hasMultibyte is never set to true except when a file is
read.

Two possible solutions:
 - add logic to work out whether the string contains non-Latin characters,
since the issue does not only affect double-byte characters
 - set the field_5_hasMultibyte variable to true and always write out
Unicode characters, unless there is a usage scenario this could break.

I tested on Mac OS X 10.6.4 and used Excel 2008 to check the result. Changing
the variable to true made the Chinese text appear correctly for the author.

BTW, we should probably extend the unit tests to ensure that non-Latin
characters are stored properly.
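
As a rough illustration of the first option above, here is a minimal
standalone sketch. It is not the POI code: the class and method names are
hypothetical, and it assumes the single-byte form behaves like ISO-8859-1 and
the Unicode form is UTF-16LE. It also shows where the ? comes from when
Chinese text is forced into the single-byte form.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sketch, not part of POI.
class MultibyteSketch {

    // Returns true if any character cannot be stored in a single byte.
    static boolean hasMultibyte(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    // Picks the encoding from the string content rather than from a flag
    // that may be stale. Assumption: single-byte form is ISO-8859-1,
    // Unicode form is UTF-16LE.
    static byte[] encodeAuthor(String author) {
        if (hasMultibyte(author)) {
            return author.getBytes(StandardCharsets.UTF_16LE);
        }
        return author.getBytes(StandardCharsets.ISO_8859_1);
    }

    // What happens when the flag is wrongly false: characters that cannot
    // be mapped to ISO-8859-1 are replaced with '?' by the encoder.
    static byte[] encodeIgnoringContent(String author) {
        return author.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

With the two-character Chinese string "\u4e2d\u6587", encodeIgnoringContent()
yields two ? bytes, while encodeAuthor() yields four bytes of UTF-16LE.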



DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode multi-byte characters (Chinese) well

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599

--- Comment #2 from André-John Mas <an...@gmail.com> 2010-07-16 09:22:09 EDT ---
Created an attachment (id=25768)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=25768)
Patch test for unicode on setAuthor()

Added patch that tests for unicode on setAuthor()



DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode multi-byte characters (Chinese) well

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599

--- Comment #4 from Nick Burch <ni...@alfresco.com> 2010-07-16 09:44:03 EDT ---
Thanks for investigating this.

The usual way in most records is to update the multibyte flag whenever the
string is updated.

I'll make this change and write a unit test for it shortly.
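
The pattern described here, where the setter keeps the flag in sync with the
string, might look roughly like the sketch below. This is not the actual
NoteRecord code or the change made for this bug: the class, field, and method
names are hypothetical, and the ISO-8859-1 / UTF-16LE choice is an assumption
made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a record field whose multibyte flag is
// recomputed every time the string changes.
class AuthorFieldSketch {
    private String author = "";
    private boolean hasMultibyte = false;

    // Updating the string also updates the flag, so the two cannot drift.
    void setAuthor(String author) {
        this.author = author;
        this.hasMultibyte = containsMultibyte(author);
    }

    boolean isMultibyte() {
        return hasMultibyte;
    }

    // Serialization only consults the flag, as in the thread's description.
    byte[] serializeAuthor() {
        return hasMultibyte
                ? author.getBytes(StandardCharsets.UTF_16LE)
                : author.getBytes(StandardCharsets.ISO_8859_1);
    }

    // True if any character needs more than one byte.
    private static boolean containsMultibyte(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```

Compared with always forcing the flag to true, this keeps Latin-only author
names in the compact single-byte form.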
