You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Albert Kam <mo...@gmail.com> on 2008/06/25 12:59:21 UTC
CSV with fails with doublequote inside the value
Hello Apache Derby,
I'm using db-derby-10.4.1.3-bin, and when i'm importing a csv file like this
:
*ij> CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE
(null,'SMS_OUT','/home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv',null,null,null,0);
*
an error pops out like this :
*ERROR XIE0R: Import error on line 11 of file
/home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv: Data found on line
11 for column 3 after the stop delimiter.
ERROR XIE03: Data found on line 11 for column 3 after the stop delimiter.
*
I check the file, and in the line 11, it goes like this :
"92","+628xxxxxxx","*Testinggg -__-"*","2006-08-03
11:43:42","0","0","-1","-1","-1"
(Notice the -__-"") the content should be *Testinggg -__-"*
So, i tried changing the *Testinggg -__-"* into *Testinggg -__-\"*, but
still errornous ..
Later after googling for a while, i found out about this :
http://www.perlmonks.org/?node_id=678257
Here's one interesting quote :
*
CSV as defined by RFC 4180 <http://tools.ietf.org/html/rfc4180> does not
"escape" double quotes with a backslash, but rather by an additional set of
double quotes. Your parser fails to handle this format properly.
CSV is hard.
*
Is this a bug or is it something that i did ? For now, i removed the
doublequotes and it worked like charm.
Regards,
Albert Kam
--
Do not pursue the past. Do not lose yourself in the future.
The past no longer is. The future has not yet come.
Looking deeply at life as it is in the very here and now,
the practitioner dwells in stability and freedom.
(Thich Nhat Hanh)
Re: CSV with fails with doublequote inside the value
Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Albert,
Derby's understanding of CSV predates RFC 4180. For instance, Derby
trims leading and trailing spaces from imported columns although this
behavior is forbidden by RFC 4180. Derby's CSV dialect is described in
the Tools Guide in a section titled "File format for input and output".
This description is impressionistic rather than rigorous. I don't think
that the grammar of Derby's CSV dialect is published anywhere. Other
than that sketch in the Tools Guide, I think that the only contract is
that Derby promises to be able to re-import anything which Derby itself
exported.
In practice, I think that this means that plenty of edge cases plague
Derby when it imports files exported by other data sources. A lot of
these edge cases can be handled by using the middle arguments of
SYSCS_IMPORT_TABLE to fine-tune delimiters and code sets. If you can't
fine-tune SYSCS_IMPORT_TABLE well enough, you can always write your own
table function to wrap the foreign data file and then import the file
like this
insert into MyTable
select * from table( MyTableFunction( '/path/to/the/data/file' ) ) s
Given the guidelines sketched in the Tools Guide, I would expect Derby
to object to the file you are trying to import. If you want Derby to
include the extra double-quote in the imported text, then I would escape
it with another double-quote. For instance, the following file:
"foo", 1, "bar"
"wibble""", 2, "wombat"
imports successfully for me:
ij> call syscs_util.syscs_import_table
( 'APP', 'T', 'doubleQuote.csv', null, null, null, 0 );
0 rows inserted/updated/deleted
ij> select * from t;
A |B
|C
-----------------------------------------------------------------------------------------------------------------
foo |1
|bar
wibble" |2
|wombat
2 rows selected
Hope this helps,
-Rick
Albert Kam wrote:
> Hello Apache Derby,
>
> I'm using db-derby-10.4.1.3-bin, and when i'm importing a csv file
> like this :
>
> *ij> CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE
> (null,'SMS_OUT','/home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv',null,null,null,0);
> *
> an error pops out like this :
> *ERROR XIE0R: Import error on line 11 of file
> /home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv: Data found on
> line 11 for column 3 after the stop delimiter.
> ERROR XIE03: Data found on line 11 for column 3 after the stop delimiter.
> *
> I check the file, and in the line 11, it goes like this :
> "92","+628xxxxxxx","*Testinggg -__-"*","2006-08-03
> 11:43:42","0","0","-1","-1","-1"
> (Notice the -__-"") the content should be *Testinggg -__-"*
>
> So, i tried changing the *Testinggg -__-"* into *Testinggg -__-\"*,
> but still errornous ..
>
> Later after googling for a while, i found out about this :
> http://www.perlmonks.org/?node_id=678257
>
> Here's one interesting quote :
> /
>
> CSV as defined by RFC 4180 <http://tools.ietf.org/html/rfc4180> does
> not "escape" double quotes with a backslash, but rather by an
> additional set of double quotes. Your parser fails to handle this
> format properly.
>
> CSV is hard.
>
> /
> Is this a bug or is it something that i did ? For now, i removed the
> doublequotes and it worked like charm.
>
> Regards,
> Albert Kam
>
> --
> Do not pursue the past. Do not lose yourself in the future.
> The past no longer is. The future has not yet come.
> Looking deeply at life as it is in the very here and now,
> the practitioner dwells in stability and freedom.
> (Thich Nhat Hanh)