Posted to dev@poi.apache.org by bu...@apache.org on 2010/07/16 10:19:06 UTC
DO NOT REPLY [Bug 49599] New: Comment.setAuthor does not encode
multi-byte characters (Chinese) well
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599
Summary: Comment.setAuthor does not encode multi-byte
characters (Chinese) well
Product: POI
Version: 3.7-dev
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: HSSF
AssignedTo: dev@poi.apache.org
ReportedBy: lovetide@qq.com
Created an attachment (id=25765)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=25765)
Test class to show this issue
When I call setAuthor() on a Comment with some Chinese characters, these
characters are replaced by ? when the file is opened in Microsoft Excel.
Please see the attachment.
My environment:
- JDK 6 Update 21
- POI 3.7 Beta 1
- Windows XP Professional (Simplified Chinese) SP3
- Microsoft Office Excel 2003 (11.8324.8324) SP3
For more details, please visit this thread:
http://old.nabble.com/charset-encoding-of-Comment.setAuthor%28%29-ts29115649.html
--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode
multi-byte characters (Chinese) well
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599
--- Comment #3 from André-John Mas <an...@gmail.com> 2010-07-16 09:22:54 EDT ---
Created an attachment (id=25769)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=25769)
Patch for defaulting to multi-byte
Patch for defaulting to multi-byte
DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode
multi-byte characters (Chinese) well
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599
Nick Burch <ni...@alfresco.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
--- Comment #5 from Nick Burch <ni...@alfresco.com> 2010-07-16 09:58:45 EDT ---
Fixed in r964800, along with a unit test. Thanks for your investigations and
patch!
DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode
multi-byte characters (Chinese) well
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599
--- Comment #1 from André-John Mas <an...@gmail.com> 2010-07-16 07:54:44 EDT ---
Just to sum up the thread:
The serialize() method in org.apache.poi.hssf.record.NoteRecord is not calling
the StringUtil.putUnicodeLE() method, because the field_5_hasMultibyte instance
variable is false, even when the author field contains double-byte characters.
In fact, other than when a file is read, field_5_hasMultibyte is never set to
true.
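To illustrate why the author shows up as ?, here is a minimal sketch. It assumes the single-byte path behaves like Java's ISO-8859-1 encoder, which is my reading of the thread and not taken from the POI source. The class and method names are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI.
final class SingleByteLossDemo {

    // Assumed stand-in for the single-byte (compressed) write path.
    static byte[] asSingleByte(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Assumed stand-in for the two-byte little-endian write path.
    static byte[] asUtf16Le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String author = "\u4f5c\u8005"; // two Chinese characters

        // Characters that ISO-8859-1 cannot represent are replaced with '?'.
        byte[] lossy = asSingleByte(author);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // prints ??

        // UTF-16LE keeps both characters, so the round trip is lossless.
        byte[] wide = asUtf16Le(author);
        System.out.println(new String(wide, StandardCharsets.UTF_16LE).equals(author)); // prints true
    }
}
```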
Two possible solutions:
- add logic to work out whether the string contains non-Latin characters,
since the issue does not affect only double-byte characters
- set the field_5_hasMultibyte variable to true and always write out
Unicode characters, unless there is a usage scenario this could break.
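The first option could look roughly like the sketch below. It assumes that "needs two bytes" means any char above 0xFF. The class and method names are made up for illustration and are not the POI API.

```java
// Hypothetical helper, not part of POI.
final class AuthorEncodingCheck {

    /**
     * Returns true if any char lies outside the 8-bit range 0x00-0xFF,
     * i.e. the string cannot be stored with one byte per character.
     */
    static boolean needsTwoByteEncoding(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```

With this check a name such as "Andr\u00e9" stays single-byte, while Chinese or Cyrillic text would switch to the two-byte form.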
I tested on Mac OS X 10.6.4 and used Excel 2008 to see the result. Changing the
variable to true made the Chinese text appear correctly for the author.
BTW, we should probably extend the unit tests to ensure that non-Latin
characters are stored properly.
DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode
multi-byte characters (Chinese) well
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599
--- Comment #2 from André-John Mas <an...@gmail.com> 2010-07-16 09:22:09 EDT ---
Created an attachment (id=25768)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=25768)
Patch test for unicode on setAuthor()
Added patch that tests for unicode on setAuthor()
DO NOT REPLY [Bug 49599] Comment.setAuthor does not encode
multi-byte characters (Chinese) well
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=49599
--- Comment #4 from Nick Burch <ni...@alfresco.com> 2010-07-16 09:44:03 EDT ---
Thanks for investigating this.
The usual way in most records is to update the multibyte flag when updating the
string.
I'll make this change and write a unit test for it shortly.
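As a rough sketch of that pattern (not the actual NoteRecord code, and not the change committed for this bug): the setter recomputes the flag every time the string changes, and the writer picks the encoding from the flag. The class, field and method names here are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical record-like class, not part of POI.
final class NoteAuthorSketch {
    private String author = "";
    private boolean hasMultibyte = false;

    // Update the flag together with the string so the two cannot drift apart.
    void setAuthor(String author) {
        this.author = author;
        this.hasMultibyte = containsWideChar(author);
    }

    boolean hasMultibyte() {
        return hasMultibyte;
    }

    // Two bytes per char (little-endian) when flagged, one byte per char otherwise.
    byte[] encodeAuthor() {
        return author.getBytes(
                hasMultibyte ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    // Assumption: any char above 0xFF requires the two-byte form.
    private static boolean containsWideChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```

Because the flag is derived in the setter, setting a plain ASCII author after a Chinese one switches back to the single-byte form.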