You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by James Yu <cy...@gmail.com> on 2008/10/23 09:20:07 UTC

How to accept NULL when having double-quote as char-delimiter ?

Hi all,

I have some data in CSV was exported from Oracle.
The char-delimiter in the CSV files are double-quote (\"), so NULL is
represented by two consecutive double-quote (\"\"), let\'s call it
2-double-quote.

When I import with SYSCS_UTIL.SYSCS_IMPORT_DATA, Derby errors out when
the column in any NUMBER type.
Although 2-double-quote does incurs errors for VARCHAR type, but it
doesn\'t give the right meaning.
The only way to fix it is remove the 2-double-quote, but this can take
some time when the files are large in size.

Here is my question:
What should I do to let Derby know that 2-double-quote is NULL and
doesn\'t error out ?
--
This is a UTF-8 formatted mail
-----------------------------------------------
James C.-C.Yu

Re: How to accept NULL when having double-quote as char-delimiter ?

Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi James,

One solution would be to write a table function which parses the CSV 
lines into rows. Your import statement would then look like this:

  insert into myTable select * from table( myTableFunction(...) ) s

Note that under the hood, Derby's import procedure does exactly this, 
that is, Derby wraps a VTI around the external data and then does a 
"insert into ...select" from that VTI. So using a table function should 
perform as well as native import does.

If you are interested in this approach, you may want to read the table 
function white paper located here: 
http://developers.sun.com/javadb/reference/whitepapers/index.jsp The 
white paper comes with a zip file of sample code. You should be able to 
do what you need by extending FlatFileTableFunction.

Hope this helps,
-Rick

James Yu wrote:
> Hi all,
>
> I have some data in CSV was exported from Oracle.
> The char-delimiter in the CSV files are double-quote (\"), so NULL is
> represented by two consecutive double-quote (\"\"), let\'s call it
> 2-double-quote.
>
> When I import with SYSCS_UTIL.SYSCS_IMPORT_DATA, Derby errors out when
> the column in any NUMBER type.
> Although 2-double-quote does incurs errors for VARCHAR type, but it
> doesn\'t give the right meaning.
> The only way to fix it is remove the 2-double-quote, but this can take
> some time when the files are large in size.
>
> Here is my question:
> What should I do to let Derby know that 2-double-quote is NULL and
> doesn\'t error out ?
> --
> This is a UTF-8 formatted mail
> -----------------------------------------------
> James C.-C.Yu
>