Posted to issues@cxf.apache.org by "Lars Svensson (JIRA)" <ji...@apache.org> on 2012/10/02 10:51:07 UTC
[jira] [Created] (CXF-4533) Encoding error in CachedOutputStream when double-byte char is on 1024 byte boundary
Lars Svensson created CXF-4533:
----------------------------------
Summary: Encoding error in CachedOutputStream when double-byte char is on 1024 byte boundary
Key: CXF-4533
URL: https://issues.apache.org/jira/browse/CXF-4533
Project: CXF
Issue Type: Bug
Affects Versions: 2.6.2, 2.3
Reporter: Lars Svensson
Hi,
We see occasional encoding errors in which a small number of two-byte characters are garbled in an otherwise correctly encoded message. I have traced the problem to the writeCacheTo method of CachedOutputStream, where the temporary cache file is read 1024 bytes at a time and each chunk is converted to a String before being appended to the StringBuilder. If a 1024-byte boundary falls between the two bytes of a two-byte character, that character is decoded incorrectly.
    public void writeCacheTo(StringBuilder out, String charsetName) throws IOException {
        flush();
        if (inmem) {
            if (currentStream instanceof ByteArrayOutputStream) {
                byte[] bytes = ((ByteArrayOutputStream)currentStream).toByteArray();
                out.append(IOUtils.newStringFromBytes(bytes, charsetName));
            } else {
                throw new IOException("Unknown format of currentStream");
            }
        } else {
            // read the file
            FileInputStream fin = new FileInputStream(tempFile);
            byte bytes[] = new byte[1024];
            int x = fin.read(bytes);
            while (x != -1) {
                out.append(IOUtils.newStringFromBytes(bytes, charsetName, 0, x));
                x = fin.read(bytes);
            }
            fin.close();
        }
    }
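One way to avoid the split, sketched here under assumptions and not taken from the CXF source or from the eventual fix, is to let a Reader do the decoding. An InputStreamReader carries an incomplete multi-byte sequence over to the next read, so chunk boundaries no longer matter. The class and method names below (BoundarySafeRead, appendDecoded) are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class BoundarySafeRead {
    // Hypothetical helper, not part of CXF: decodes the whole stream with the
    // given charset and appends the resulting characters to 'out'.
    static void appendDecoded(InputStream in, StringBuilder out, String charsetName)
            throws IOException {
        // The reader buffers bytes internally, so a character whose bytes span
        // two underlying reads is still decoded as one character.
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            char[] chars = new char[1024];
            int n = reader.read(chars);
            while (n != -1) {
                out.append(chars, 0, n);
                n = reader.read(chars);
            }
        }
    }
}
```

In the file branch of writeCacheTo, a helper like this would presumably be called with new FileInputStream(tempFile) in place of the byte[] loop.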
Below are a couple of lines from a hex dump of the cache file, where you can see that the second o-slash (bytes c3 b8) falls on a 1024-byte boundary and is therefore corrupted in the outgoing message:
0001fbe0: 66 66 65 6e 74 6c 69 67 20 66 c3 b8 72 74 69 64 73 70 65 6e 73 69 6f 6e 2c 20 73 6f 6d 20 66 c3 ffentlig førtidspension, som f?
0001fc00: b8 72 65 72 20 74 69 6c 2c 20 61 74 20 6d 65 64 6c 65 6d 3c 2f 70 67 66 3a 52 65 70 75 72 63 68 ?rer til, at medlem</pgf:Repurch
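The c3 b8 pair in the dump straddles offset 0x1fc00, which is 127 x 1024. The effect can be reproduced in isolation with a few bytes and a smaller chunk size. This is a minimal sketch that assumes IOUtils.newStringFromBytes behaves like the JDK String constructor for malformed input; SplitCharDemo and decodePerChunk are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class SplitCharDemo {
    // Decodes each fixed-size chunk on its own, mirroring the pattern of the
    // reported loop (assumption: same behaviour as new String(bytes, off, len, charset)).
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            out.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "som f" + o-slash + "rer til"; the o-slash is bytes c3 b8 at indexes 5 and 6.
        byte[] data = "som f\u00f8rer til".getBytes(StandardCharsets.UTF_8);

        // A chunk size of 6 ends the first chunk right after c3, so the two
        // halves of the character are decoded separately.
        System.out.println(decodePerChunk(data, 6));

        // A single chunk covering all bytes keeps the pair together.
        System.out.println(decodePerChunk(data, data.length));
    }
}
```

With the boundary inside the character, the decoder substitutes U+FFFD replacement characters for the two orphaned bytes, which matches the two damaged positions shown in the dump's text column.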
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CXF-4533) Encoding error in CachedOutputStream when double-byte char is on 1024 byte boundary
Posted by "Daniel Kulp (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CXF-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Kulp resolved CXF-4533.
------------------------------
Resolution: Fixed
Fix Version/s: 2.7.0, 2.6.3, 2.5.6, 2.4.10
Assignee: Daniel Kulp
> Encoding error in CachedOutputStream when double-byte char is on 1024 byte boundary
> -----------------------------------------------------------------------------------
>
> Key: CXF-4533
> URL: https://issues.apache.org/jira/browse/CXF-4533
> Project: CXF
> Issue Type: Bug
> Affects Versions: 2.3, 2.6.2
> Reporter: Lars Svensson
> Assignee: Daniel Kulp
> Fix For: 2.4.10, 2.5.6, 2.6.3, 2.7.0