Posted to derby-dev@db.apache.org by "Kristian Waagan (JIRA)" <ji...@apache.org> on 2007/05/14 12:24:16 UTC

[jira] Updated: (DERBY-2618) EmbedClob.setAsciiStream does not handle non-ascii characters correctly

     [ https://issues.apache.org/jira/browse/DERBY-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-2618:
-----------------------------------

    Attachment: Derby2618BugsInClobAsciiStream.java

There seems to be more than one kind of bug in this area.
From the user's perspective, you can write any int to the Clob ASCII stream, since it is an OutputStream and has the method write(int).

I see errors in two places: some when calling write, and some when calling read to get the value back.
It would be nice if someone could try the repro and confirm my findings, or let me know if I have been using the JDBC API incorrectly. Note that the repro must be run with Java SE 6 because it uses Connection.createClob().


I have not looked at this in detail, but I think the stream used to write to the Clob must filter the incoming value.
This typically means ANDing with 0xff and/or replacing the incoming int with the Unicode replacement character (\uFFFD).

Then the question is, which values are considered "non-ASCII"?
According to JDBC, an ASCII value is between 0 and 255, inclusive. Represented as a Java byte, the upper half of this range (128 to 255) becomes negative. I assume ISO-8859-1 is the encoding standard to be used, and further that these values will be mapped directly into Unicode.
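
To illustrate the point about negative byte values and the direct mapping into Unicode, here is a minimal sketch. It assumes ISO-8859-1 really is the intended encoding; the class and method names are made up for illustration and are not part of Derby.

```java
import java.nio.charset.Charset;

public class Latin1MappingSketch {
    // Assumption: ISO-8859-1 is the encoding meant by "ASCII" here.
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /** Recover the unsigned value (0..255) from a signed Java byte. */
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    /** Decode a single byte as ISO-8859-1. */
    static char decodeLatin1(byte b) {
        return new String(new byte[] { b }, LATIN1).charAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE5; // 229 as an unsigned value
        System.out.println("as signed byte: " + b);
        System.out.println("as unsigned:    " + toUnsigned(b));
        System.out.printf("decoded:        U+%04X%n", (int) decodeLatin1(b));
    }
}
```

With this encoding, the decoded code point equals the unsigned byte value for all 256 inputs, which is what "mapped directly into Unicode" means above.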

Say the Tamil letter with Unicode value '\u0B88' is written to the stream returned by Clob.setAsciiStream(1) with OutputStream.write(int). Should we do "if i > 255 write '\uFFFD'", or should we ignore the higher bits and say this is value 136 (this is mentioned in the comment for ClobAsciiStream), which happens to be an unused code in ISO-8859-1?
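
The two options can be sketched as follows. This is a hypothetical filtering stream written only to compare the behaviors; it is not Derby's ClobAsciiStream, and the class name, the boolean switch and the StringBuilder sink are all assumptions made for the example.

```java
import java.io.OutputStream;

/** Hypothetical sketch; collects the filtered characters in memory. */
public class FilteringAsciiStream extends OutputStream {
    private final StringBuilder sink = new StringBuilder();
    private final boolean replaceOutOfRange;

    /**
     * @param replaceOutOfRange true: values outside 0..255 become U+FFFD;
     *                          false: only the low 8 bits are kept
     */
    public FilteringAsciiStream(boolean replaceOutOfRange) {
        this.replaceOutOfRange = replaceOutOfRange;
    }

    @Override
    public void write(int b) {
        if (replaceOutOfRange && (b < 0 || b > 255)) {
            sink.append('\uFFFD');
        } else {
            sink.append((char) (b & 0xff));
        }
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        // Bytes are signed in Java; mask them so 0x80..0xFF are not
        // mistaken for out-of-range (negative) ints.
        for (int i = off; i < off + len; i++) {
            write(buf[i] & 0xff);
        }
    }

    public String contents() {
        return sink.toString();
    }

    public static void main(String[] args) {
        FilteringAsciiStream masking = new FilteringAsciiStream(false);
        FilteringAsciiStream replacing = new FilteringAsciiStream(true);
        masking.write(0x0B88);   // Tamil letter, low byte is 0x88 (136)
        replacing.write(0x0B88);
        System.out.printf("masking:   U+%04X%n", (int) masking.contents().charAt(0));
        System.out.printf("replacing: U+%04X%n", (int) replacing.contents().charAt(0));
    }
}
```

Whichever option is chosen, the byte-array overload has to mask each byte first, otherwise legitimate values in the range 128 to 255 would be treated as out of range.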

When/if the unknown character code is stored internally (\uFFFD), it must be converted to '?' if it is read back using getAsciiStream (returns an InputStream).
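
As a rough sketch of that read-back conversion, the standard String.getBytes(Charset) call already substitutes the charset's replacement byte for unmappable characters, which for ISO-8859-1 is '?'. The helper name below is hypothetical, and this is not how Derby implements getAsciiStream.

```java
import java.nio.charset.Charset;

public class ReplacementOnReadSketch {
    // Assumption: the ASCII stream is encoded as ISO-8859-1.
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /**
     * Encode the stored characters for an ASCII stream. Characters that
     * ISO-8859-1 cannot represent, including U+FFFD, come out as '?'.
     */
    static byte[] toAsciiBytes(String stored) {
        return stored.getBytes(LATIN1);
    }

    public static void main(String[] args) {
        for (byte b : toAsciiBytes("a\uFFFDb")) {
            System.out.println(b & 0xff);
        }
    }
}
```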

The default behavior of Writer.write(int) is to cast the int to char and then call the abstract method write(char[],int,int).
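
For comparison, the cast-to-char behavior is that of java.io.Writer.write(int), which keeps the 16 low-order bits, while java.io.OutputStream.write(int) is specified to write only the 8 low-order bits. A small sketch using standard in-memory classes (the wrapper class and method names are made up for the example):

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;

public class WriteIntSketch {
    /** Returns the char a Writer produces for the given int. */
    static char viaWriter(int value) {
        StringWriter w = new StringWriter();
        w.write(value); // keeps the low 16 bits
        return w.toString().charAt(0);
    }

    /** Returns the unsigned byte an OutputStream produces for the given int. */
    static int viaOutputStream(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // keeps the low 8 bits
        return out.toByteArray()[0] & 0xff;
    }

    public static void main(String[] args) {
        int tamil = 0x0B88;
        System.out.printf("Writer:       U+%04X%n", (int) viaWriter(tamil));
        System.out.printf("OutputStream: %d%n", viaOutputStream(tamil));
    }
}
```

If an OutputStream passes the unfiltered int straight through to a Writer, a value such as 0x0B88 ends up stored as a full 16-bit character rather than as a byte-sized value.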

No matter what the answers to the questions above are, Derby should not fail with a UTFDataFormatException when reading data you have already been allowed to insert.

I'm on thin ice regarding how to handle these issues correctly, and I'm sure there are more, so please correct me and add additional information.

> EmbedClob.setAsciiStream does not handle non-ascii characters correctly
> -----------------------------------------------------------------------
>
>                 Key: DERBY-2618
>                 URL: https://issues.apache.org/jira/browse/DERBY-2618
>             Project: Derby
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 10.3.0.0
>            Reporter: Kristian Waagan
>         Assigned To: Kristian Waagan
>         Attachments: Derby2618BugsInClobAsciiStream.java
>
>
> If non-ascii characters are written to the OutputStream returned by EmbedClob.setAsciiStream, Derby fails with a 'java.io.UTFDataFormatException' when the CLOB value is read back.
> I'm filing this bug with 'Major' priority, as the bug does not manifest itself when entering data, just when you try to get it back. Apart from filtering the data yourself before entering it, I don't think there is any workaround.
> Sample stack trace from a modified test:
> 1) testClobAsciiWrite1ParamKRISTIWAA(org.apache.derbyTesting.functionTests.tests.jdbcapi.LobStreamsTest)java.sql.SQLException: Unable to set stream: 'java.io.UTFDataFormatException'.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:88)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:94)
>         at org.apache.derby.impl.jdbc.Util.setStreamFailure(Util.java:246)
>         at org.apache.derby.impl.jdbc.EmbedClob.length(EmbedClob.java:190)
>         at org.apache.derby.impl.jdbc.EmbedPreparedStatement.setClob(EmbedPreparedStatement.java:1441)
>         at org.apache.derbyTesting.functionTests.tests.jdbcapi.LobStreamsTest.testClobAsciiWrite1ParamKRISTIWAA(LobStreamsTest.java:255)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:88)
> Caused by: java.sql.SQLException: Unable to set stream: 'java.io.UTFDataFormatException'.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:135)
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
>         ... 22 more
