Posted to dev@poi.apache.org by bu...@apache.org on 2009/06/03 07:32:11 UTC

DO NOT REPLY [Bug 47304] New: WordDocument uses platform default encoding

https://issues.apache.org/bugzilla/show_bug.cgi?id=47304

           Summary: WordDocument uses platform default encoding
           Product: POI
           Version: 3.5-dev
          Platform: PC
        OS/Version: Mac OS X 10.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HDF
        AssignedTo: dev@poi.apache.org
        ReportedBy: jelmer@jteam.nl


When the following code is used to read the attached Word document, the text
is not read correctly on Mac OS X:

WordDocument wordDoc = new WordDocument(new FileInputStream("test.doc"));

StringWriter docTextWriter = new StringWriter();
PrintWriter writer = new PrintWriter(docTextWriter);
wordDoc.writeAllText(writer);
writer.flush(); // make sure buffered text reaches the StringWriter
docTextWriter.close();

System.out.println(docTextWriter.toString());


The reason is that, when the text found is not Unicode, the platform default
encoding is used to decode it, whereas windows-1252 should be used.

Here's the offending code:

if(unicode)
{
 ....
}
else
{
   String sText = new String(_header, start, end-start);
   out.write(sText);
}

On Windows the platform default encoding is typically windows-1252 (in Western
locales), so the problem does not show up there; on Mac OS X it is MacRoman.
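
To see which encoding a given machine would apply, the following minimal
sketch prints the JVM's default charset and shows that the one-argument-less
String(byte[], int, int) constructor decodes with it. The class and method
names are made up for illustration and are not part of POI; note also that
recent JVMs may report UTF-8 regardless of the operating system.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of POI.
class PlatformDefaultDemo {
    // Decodes the way the offending code does: no charset is named,
    // so the JVM's default charset is used.
    static String decodeWithDefault(byte[] data) {
        return new String(data, 0, data.length);
    }

    public static void main(String[] args) {
        byte[] data = {'c', 'a', 'f', (byte) 0xE9};
        // The value printed here varies between platforms and JVM versions.
        System.out.println("default charset: " + Charset.defaultCharset());
        // Same result as naming the default charset explicitly.
        String implicit = decodeWithDefault(data);
        String explicit = new String(data, 0, data.length, Charset.defaultCharset());
        System.out.println(implicit.equals(explicit));
    }
}
```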

To fix this, the line

String sText = new String(_header, start, end-start);

should be changed to

String sText = new String(_header, start, end-start, "windows-1252");
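
As a rough, standalone illustration of why naming the charset helps, the
sketch below decodes a few non-ASCII bytes as windows-1252 and gets the same
characters on any platform. The class name, the decode helper and the sample
bytes are assumptions made for this example and do not come from POI; it uses
the Charset overload of the String constructor rather than the charset-name
overload proposed above, which avoids the checked
UnsupportedEncodingException.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of POI.
class EncodingDemo {
    // Decode the bytes in [start, end) as windows-1252,
    // independent of the platform default encoding.
    static String decode(byte[] data, int start, int end) {
        return new String(data, start, end - start, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // In windows-1252, 0x93 and 0x94 are curly double quotes and 0xE9 is e-acute.
        // Other single-byte encodings such as MacRoman assign these byte
        // values to different characters.
        byte[] data = {(byte) 0x93, 'c', 'a', 'f', (byte) 0xE9, (byte) 0x94};
        String text = decode(data, 0, data.length);
        System.out.println(text.equals("\u201Ccaf\u00E9\u201D"));
    }
}
```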

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47304] WordDocument uses platform default encoding

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47304

--- Comment #1 from Jelmer Kuperus <je...@jteam.nl>  2009-06-02 22:33:23 PST ---
Created an attachment (id=23746)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23746)
example
