Posted to derby-user@db.apache.org by Regunath Balasubramanian <re...@mindtree.com> on 2006/07/11 11:50:54 UTC

Error reading CLOB

Hi, 
 
I chose to use Derby as an embedded DB to store text parsed/stripped from web
pages, MS Office files and PDF documents while implementing an indexing and
search solution. I need the parsed text of the document to enable search term
highlighting to produce an effective summary of search hits.
The natural choice was to use the CLOB data type. I store the contents using
PreparedStatement.setCharacterStream(column, reader) where reader is a
java.io.StringReader constructed from the java.lang.String instance
representing the entire parsed contents. I then read the contents out using
ResultSet.getClob(column).getCharacterStream().
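
The write and read paths described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the indexing application: the DOCS table, its ID and BODY columns, and the class and method names are hypothetical. Only the JDBC calls named in the post (setCharacterStream, getClob, getCharacterStream) come from the message itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ClobRoundTrip {

    // Assumed (hypothetical) schema: DOCS(ID INT PRIMARY KEY, BODY CLOB).
    public static void writeText(Connection conn, int id, String text) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO DOCS (ID, BODY) VALUES (?, ?)")) {
            ps.setInt(1, id);
            // The whole parsed document is handed over as a character stream.
            ps.setCharacterStream(2, new StringReader(text));
            ps.executeUpdate();
        }
    }

    public static String readText(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT BODY FROM DOCS WHERE ID = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no row with this id
                }
                Clob clob = rs.getClob(1);
                // Drain the stream while the result set is still open.
                return clob == null ? null : drain(clob.getCharacterStream());
            }
        }
    }

    // Copies every character from the reader into a String, then closes the reader.
    public static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The drain helper needs no database, so it can be checked with a plain StringReader. The two JDBC methods need a live connection to a database that has the assumed table.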
 
Writing always works, but reading fails for a few documents. This
surprises me, since I read and write using the Derby classes and
therefore naturally expect them to work together. The error is raised in
the fillBuffer() method of the UTF8Reader class, which throws a
UTFDataFormatException.
 
I made a few frustrating attempts to get it to work. I tried constructing
the parsed string using different encodings (UTF-8, ISO-8859-1) at the time
of write. I tried to read it as a binary stream, which failed with a nice
exception stating that I was trying to read a CLOB as binary. I tried to
read it as an ASCII stream, which failed with the same data format exception.
 
Finally I decided to write the contents as a BLOB instead. The bytes for
writing were constructed using String.getBytes(). I read the contents with
Blob.getBytes() and then construct the String using new String(byte[]).
This works!
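
A minimal sketch of the byte conversion behind this workaround follows. The post uses String.getBytes() and new String(byte[]), both of which rely on the JVM's default charset. The variant below names UTF-8 explicitly so that the stored bytes do not depend on that default. The explicit charset and the class and method names are assumptions added for illustration, not part of the original workaround.

```java
import java.nio.charset.StandardCharsets;

public class BlobTextCodec {

    // Encodes the parsed text to the bytes that would be stored in the BLOB column.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes read back from the BLOB column (e.g. via Blob.getBytes) into text.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u65e5\u672c\u8a9e";
        String restored = decode(encode(original));
        System.out.println(original.equals(restored));
    }
}
```

With an explicit charset on both sides, the text can be written on one machine and read on another even when their default encodings differ.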
 
I wonder why the UTF8 reader of Derby failed? I have the above mentioned
workaround but would like to know if there is an alternative.
 
Cheers!
Regu

Re: Error reading CLOB

Posted by Kristian Waagan <Kr...@Sun.COM>.
Regunath Balasubramanian wrote:
> Hi,
>  
> I chose to use Derby as an embedded DB to store text parsed/stripped 
> from web pages, MS Office files and PDF documents while implementing an 
> indexing and search solution. I need the parsed text of the document to 
> enable search term highlighting to produce an effective summary of 
> search hits.
> The natural choice was to use the CLOB data type. I store the contents 
> using PreparedStatement.setCharacterStream(column, reader) where reader 
> is a java.io.StringReader constructed from the java.lang.String instance 
> representing the entire parsed contents. I then read the contents out 
> using ResultSet.getClob(column).getCharacterStream().
>  
> Writing always works, but reading fails for a few documents. This 
> surprises me, since I read and write using the Derby classes and 
> therefore naturally expect them to work together. The error is raised 
> in the fillBuffer() method of the UTF8Reader class, which throws a 
> UTFDataFormatException.

Hello Regu,

Could you please tell us in which version(s) of Derby you are seeing 
this problem?


Also, if you have a repro application that can be used to demonstrate 
the problem, it would be great :)
It would be very handy to have the data that causes the 
UTFDataFormatException to be thrown.



Thanks,
-- 
Kristian

>  
> I made a few frustrating attempts to get it to work. I tried 
> constructing the parsed string using different encodings (UTF-8, 
> ISO-8859-1) at the time of write. I tried to read it as a binary 
> stream, which failed with a nice exception stating that I was trying to 
> read a CLOB as binary. I tried to read it as an ASCII stream, which 
> failed with the same data format exception.
>  
> Finally I decided to write the contents as a BLOB instead. The bytes for 
> writing were constructed using String.getBytes(). I read the contents 
> with Blob.getBytes() and then construct the String using new 
> String(byte[]). This works!
>  
> I wonder why the UTF8 reader of Derby failed? I have the above mentioned 
> workaround but would like to know if there is an alternative.
>  
> Cheers!
> Regu