Posted to user@commons.apache.org by Don Seiler <do...@seiler.us> on 2005/05/25 23:16:07 UTC

CSV parsing/writing?

Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
packages that may do CSV parsing and writing.  I'm aware of the jcsv
package but thought I would try and utilize commons as much as possible.
I looked at jakarta-oro as well but don't seem to see anything CSV
related.

Thanks in advance.
-- 
Don Seiler
don@seiler.us

Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xFC87F041
Fingerprint: 0B56 50D5 E91E 4D4C 83B7  207C 76AC 5DA2 FC87 F041

Re: CSV parsing/writing?

Posted by Paul DeCoursey <pa...@decoursey.net>.
Yes, you are missing something: escaped commas and quoted fields.  I don't
know of any part of commons that parses CSV.
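Here is a concrete version of that point. The sketch below counts the fields in a three-field record whose middle field is quoted because it contains a comma. The class and method names are hypothetical. Both String.split and StringTokenizer treat every comma as a separator, so each one reports four fields instead of three.

```java
import java.util.StringTokenizer;

// Hypothetical demo class: shows why naive splitting miscounts quoted fields.
public class NaiveSplitDemo {

    // Field count when every comma is treated as a separator via String.split.
    static int countWithSplit(String line) {
        return line.split(",", -1).length;
    }

    // Field count as seen by StringTokenizer with a comma delimiter.
    static int countWithTokenizer(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    public static void main(String[] args) {
        // Three logical fields; the middle one is quoted because it
        // contains the delimiter.
        String line = "Smith,\"Springfield, IL\",62701";
        System.out.println("split:           " + countWithSplit(line));     // prints 4
        System.out.println("StringTokenizer: " + countWithTokenizer(line)); // prints 4
    }
}
```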

pd

On May 25, 2005, at 4:46 PM, Frank W. Zammetti wrote:

> I might be missing something, but doesn't StringTokenizer do the trick
> for you?
> Don Seiler wrote:
>> Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
>> packages that may do CSV parsing and writing.  I'm aware of the jcsv
>> package but thought I would try and utilize commons as much as possible.
>> I looked at jakarta-oro as well but don't seem to see anything CSV
>> related.
>> Thanks in advance.
> --
> Frank W. Zammetti
> Founder and Chief Software Architect
> Omnytex Technologies
> http://www.omnytex.com

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: CSV parsing/writing?

Posted by Don Seiler <do...@seiler.us>.
On 16:40 Thu 26 May, Simon Kitching wrote:
> There was a thread on this topic almost exactly two years ago, with
> subject "[SURVEY] Commons-csv or not?":
>   http://tinyurl.com/bojgz

Sounds like a good conversation, but it seemed to suddenly die with no
action.  As I said, I'd be happy to contribute at least a brute-force
parser to begin with for commons-io or whatever the jakarta gods deem
appropriate (IO makes the most sense to me, but I'm new here).

And, in my mind, CSV is not just "comma separated," so I would support
user-specified delimiters and field qualifiers (defaulting to comma and
double-quotes, respectively).
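On the writing side, the sketch below shows one possible form for user-specified delimiters and field qualifiers. The class name CsvLineWriter is hypothetical. The escaping rule is an assumption borrowed from the common RFC 4180 convention, not something settled in this thread: a field is wrapped in the qualifier only when it contains the delimiter, the qualifier, or a line break, and any embedded qualifier is doubled. Null fields are not handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical writer sketch with a configurable delimiter and qualifier.
public class CsvLineWriter {

    private final char delimiter;
    private final char qualifier;

    // Defaults follow the proposal above: comma delimiter, double-quote qualifier.
    public CsvLineWriter() {
        this(',', '"');
    }

    public CsvLineWriter(char delimiter, char qualifier) {
        this.delimiter = delimiter;
        this.qualifier = qualifier;
    }

    // Joins the fields into one record, qualifying only the fields that need it.
    public String format(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.append(delimiter);
            }
            out.append(qualify(fields.get(i)));
        }
        return out.toString();
    }

    // Wraps a field in the qualifier if it contains a special character,
    // doubling any qualifier characters inside it (assumed escape convention).
    private String qualify(String field) {
        boolean needsQualifier = field.indexOf(delimiter) >= 0
                || field.indexOf(qualifier) >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQualifier) {
            return field;
        }
        String q = String.valueOf(qualifier);
        return q + field.replace(q, q + q) + q;
    }

    public static void main(String[] args) {
        CsvLineWriter csv = new CsvLineWriter();
        System.out.println(csv.format(Arrays.asList("Smith", "Springfield, IL", "say \"hi\"")));

        // Tab delimiter with single-quote qualifier.
        CsvLineWriter tsv = new CsvLineWriter('\t', '\'');
        System.out.println(tsv.format(Arrays.asList("a", "b\tc", "it's")));
    }
}
```

With the defaults, the first call produces Smith,"Springfield, IL","say ""hi""". Because fields are quoted only when needed, ordinary output stays readable. A writer that always quotes would be simpler to write but noisier to read.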

-- 
Don Seiler
don@seiler.us


Re: CSV parsing/writing?

Posted by Simon Kitching <sk...@apache.org>.
On Wed, 2005-05-25 at 23:24 -0500, Don Seiler wrote:
> On 19:05 Wed 25 May, James Sangster wrote:
> > I was looking into doing CSV parsing using regular expressions, but I came
> > across one post in a newsgroup where it was stated that regular expressions
> > themselves couldn't handle it alone.  Because the environment I was working
> > with had restricted regular expression capabilities and no third-party
> > package integration capabilities, I instead just went for the brute-force
> > method of parsing character by character on each line and using a state
> > machine.
> > 
> > It seems to work very well, but the performance could be a little better.
> 
> Would the jakarta community welcome a CSV parsing/writing module for
> commons?  I'd be happy to work on it; no doubt I would start down a
> similar path of having to look at each character and track the state of
> what is a field and what isn't.
> 

There was a thread on this topic almost exactly two years ago, with
subject "[SURVEY] Commons-csv or not?":
  http://tinyurl.com/bojgz

Regards,

Simon




Re: CSV parsing/writing?

Posted by Don Seiler <do...@seiler.us>.
On 21:33 Wed 25 May, Martin Cooper wrote:
> On 5/25/05, Don Seiler <do...@seiler.us> wrote:
> > Would the jakarta community welcome a CSV parsing/writing module for
> > commons?  I'd be happy to work on it; no doubt I would start down a
> > similar path of having to look at each character and track the state of
> > what is a field and what isn't.
> 
> I'd be happy to see such a thing here in Commons. However, it would be
> hard to believe that there isn't already such a thing in some Jakarta
> or other ASF Java project that we could bring here, instead of writing
> one from scratch.

I was thinking the same thing.  As I said, I looked around in
jakarta-oro and the other commons packages.  From the available
descriptions I didn't see anything CSV-related.

-- 
Don Seiler
don@seiler.us


Re: CSV parsing/writing?

Posted by Martin Cooper <mf...@gmail.com>.
On 5/25/05, Don Seiler <do...@seiler.us> wrote:
> On 19:05 Wed 25 May, James Sangster wrote:
> > I was looking into doing CSV parsing using regular expressions, but I came
> > across one post in a newsgroup where it was stated that regular expressions
> > themselves couldn't handle it alone.  Because the environment I was working
> > with had restricted regular expression capabilities and no third-party
> > package integration capabilities, I instead just went for the brute-force
> > method of parsing character by character on each line and using a state
> > machine.
> >
> > It seems to work very well, but the performance could be a little better.
> 
> Would the jakarta community welcome a CSV parsing/writing module for
> commons?  I'd be happy to work on it; no doubt I would start down a
> similar path of having to look at each character and track the state of
> what is a field and what isn't.

I'd be happy to see such a thing here in Commons. However, it would be
hard to believe that there isn't already such a thing in some Jakarta
or other ASF Java project that we could bring here, instead of writing
one from scratch.

--
Martin Cooper




Re: CSV parsing/writing?

Posted by Don Seiler <do...@seiler.us>.
On 19:05 Wed 25 May, James Sangster wrote:
> I was looking into doing CSV parsing using regular expressions, but I came
> across one post in a newsgroup where it was stated that regular expressions
> themselves couldn't handle it alone.  Because the environment I was working
> with had restricted regular expression capabilities and no third-party
> package integration capabilities, I instead just went for the brute-force
> method of parsing character by character on each line and using a state
> machine.
> 
> It seems to work very well, but the performance could be a little better.

Would the jakarta community welcome a CSV parsing/writing module for
commons?  I'd be happy to work on it; no doubt I would start down a
similar path of having to look at each character and track the state of
what is a field and what isn't.

-- 
Don Seiler
don@seiler.us


RE: CSV parsing/writing?

Posted by James Sangster <ja...@newswire.ca>.
I was looking into doing CSV parsing using regular expressions, but I came
across one post in a newsgroup where it was stated that regular expressions
themselves couldn't handle it alone.  Because the environment I was working
with had restricted regular expression capabilities and no third-party
package integration capabilities, I instead just went for the brute-force
method of parsing character by character on each line and using a state
machine.

It seems to work very well, but the performance could be a little better.
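The character-by-character state machine described above might look roughly like the sketch below. This is not James's code. It is a minimal illustration that assumes a comma delimiter, a double-quote qualifier, doubled quotes ("") as the escape for a literal quote, and one record per line, so quoted fields that contain line breaks are not supported. The class name CsvStateMachine is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line CSV parser driven by an explicit state machine.
public class CsvStateMachine {

    // The states a character-by-character CSV scanner moves between.
    private enum State { FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED }

    // Parses one record (a single line) using ',' as delimiter and '"' as qualifier.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        State state = State.FIELD_START;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            switch (state) {
                case FIELD_START:
                    if (c == '"') {
                        state = State.QUOTED;
                    } else if (c == ',') {
                        fields.add("");                 // empty field
                    } else {
                        field.append(c);
                        state = State.UNQUOTED;
                    }
                    break;
                case UNQUOTED:
                    if (c == ',') {
                        fields.add(field.toString());
                        field.setLength(0);
                        state = State.FIELD_START;
                    } else {
                        field.append(c);
                    }
                    break;
                case QUOTED:
                    if (c == '"') {
                        state = State.QUOTE_IN_QUOTED;  // closing quote or start of ""
                    } else {
                        field.append(c);                // commas are literal inside quotes
                    }
                    break;
                case QUOTE_IN_QUOTED:
                    if (c == '"') {
                        field.append('"');              // "" is an escaped quote
                        state = State.QUOTED;
                    } else if (c == ',') {
                        fields.add(field.toString());
                        field.setLength(0);
                        state = State.FIELD_START;
                    } else {
                        throw new IllegalArgumentException(
                            "Unexpected character after closing quote at index " + i);
                    }
                    break;
            }
        }
        if (state == State.QUOTED) {
            throw new IllegalArgumentException("Unterminated quoted field");
        }
        fields.add(field.toString());                   // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("Smith,\"Springfield, IL\",62701").size()); // prints 3
        System.out.println(parseLine("a,,\"say \"\"hi\"\"\",").size());         // prints 4
    }
}
```

For example, the second call returns the fields a, an empty string, say "hi", and another empty string. Keeping the states in an enum makes each transition explicit, which tends to be easier to follow than a set of boolean flags.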

james



-----Original Message-----
From: Frank W. Zammetti [mailto:fzlists@omnytex.com] 
Sent: Wednesday, May 25, 2005 7:01 PM
To: Don Seiler
Cc: Jakarta Commons Users List
Subject: Re: CSV parsing/writing?


Fair enough.  I have parsed CSVs a number of times; I guess I've been 
lucky in that one of the design criteria was no occurrences of the 
delimiter within data elements.  Certainly if there is a chance of that, 
then sure, you need something more advanced.

Frank


Don Seiler wrote:
> On 17:46 Wed 25 May, Frank W. Zammetti wrote:
> 
>>I might be missing something, but doesn't StringTokenizer do the trick
>>for you?
> 
> 
> Anyone with experience parsing CSVs knows there are the cases of 
> delimiters within quotes that make the parsing a bigger headache than 
> just using StringTokenizer (or String.split()).  Why else would there 
> be so many other third-party APIs for it?
> 
> 
>>Don Seiler wrote:
>>
>>>Afternoon.  Just writing to ask if anyone knows of any 
>>>commons/jakarta packages that may do CSV parsing and writing.  I'm 
>>>aware of the jcsv package but thought I would try and utilize commons 
>>>as much as possible. I looked at jakarta-oro as well but don't seem 
>>>to see anything CSV related.
>>>
>>>Thanks in advance.
> 
> 

-- 
Frank W. Zammetti




Re: CSV parsing/writing?

Posted by "Frank W. Zammetti" <fz...@omnytex.com>.
Fair enough.  I have parsed CSVs a number of times; I guess I've been 
lucky in that one of the design criteria was no occurrences of the 
delimiter within data elements.  Certainly if there is a chance of that, 
then sure, you need something more advanced.
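When the delimiter is guaranteed never to appear inside a field, as in Frank's case, plain splitting is enough. Two details of the JDK classes still matter for empty fields. The sketch below uses hypothetical names and shows both. String.split drops trailing empty strings unless it is given a negative limit. StringTokenizer treats consecutive delimiters as a single separator, so empty fields disappear entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helpers for delimited data that never contains the delimiter.
public class SimpleDelimitedSplit {

    // The -1 limit keeps trailing empty fields that split(",") would discard.
    static String[] splitKeepingEmpties(String line) {
        return line.split(",", -1);
    }

    // StringTokenizer skips over consecutive delimiters, so empty fields vanish.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "1001,,blue,";  // four fields: "1001", "", "blue", ""
        System.out.println(splitKeepingEmpties(line).length); // prints 4
        System.out.println(line.split(",").length);           // prints 3
        System.out.println(tokenize(line).size());            // prints 2
    }
}
```

String.split also interprets its argument as a regular expression. A delimiter such as | therefore needs to be escaped, for example with Pattern.quote.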

Frank


Don Seiler wrote:
> On 17:46 Wed 25 May, Frank W. Zammetti wrote:
> 
>>I might be missing something, but doesn't StringTokenizer do the trick 
>>for you?
> 
> 
> Anyone with experience parsing CSVs knows there are the cases of
> delimiters within quotes that make the parsing a bigger headache than
> just using StringTokenizer (or String.split()).  Why else would there be
> so many other third-party APIs for it?
> 
> 
>>Don Seiler wrote:
>>
>>>Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
>>>packages that may do CSV parsing and writing.  I'm aware of the jcsv
>>>package but thought I would try and utilize commons as much as possible.
>>>I looked at jakarta-oro as well but don't seem to see anything CSV
>>>related.
>>>
>>>Thanks in advance.
> 
> 

-- 
Frank W. Zammetti




Re: CSV parsing/writing?

Posted by Don Seiler <do...@seiler.us>.
On 17:46 Wed 25 May, Frank W. Zammetti wrote:
> I might be missing something, but doesn't StringTokenizer do the trick 
> for you?

Anyone with experience parsing CSVs knows there are the cases of
delimiters within quotes that make the parsing a bigger headache than
just using StringTokenizer (or String.split()).  Why else would there be
so many other third-party APIs for it?

> Don Seiler wrote:
> >Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
> >packages that may do CSV parsing and writing.  I'm aware of the jcsv
> >package but thought I would try and utilize commons as much as possible.
> >I looked at jakarta-oro as well but don't seem to see anything CSV
> >related.
> >
> >Thanks in advance.

-- 
Don Seiler
don@seiler.us


Re: CSV parsing/writing?

Posted by "Frank W. Zammetti" <fz...@omnytex.com>.
I might be missing something, but doesn't StringTokenizer do the trick 
for you?

Don Seiler wrote:
> Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
> packages that may do CSV parsing and writing.  I'm aware of the jcsv
> package but thought I would try and utilize commons as much as possible.
> I looked at jakarta-oro as well but don't seem to see anything CSV
> related.
> 
> Thanks in advance.

-- 
Frank W. Zammetti




Re: CSV parsing/writing?

Posted by Martin Cooper <mf...@gmail.com>.
On 5/25/05, Catalin Grigoroscuta <c....@moodmedia.ro> wrote:
> No need to re-invent the wheel; try the Ostermiller CSV parser (see
> ostermiller.org), which is open source under the GPL licence.
> It works fine for me.

A GPL license might be fine for people who want to pick up this
package and include it in their applications. However, the GPL is
fundamentally incompatible with the ASL, so it's not something we
could pick up and include in any Jakarta Commons component.

--
Martin Cooper


> 
> Don Seiler wrote:
> 
> >Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
> >packages that may do CSV parsing and writing.  I'm aware of the jcsv
> >package but thought I would try and utilize commons as much as possible.
> >I looked at jakarta-oro as well but don't seem to see anything CSV
> >related.
> >
> >Thanks in advance.
> >
> >



Re: CSV parsing/writing?

Posted by Catalin Grigoroscuta <c....@moodmedia.ro>.
No need to re-invent the wheel; try the Ostermiller CSV parser (see 
ostermiller.org), which is open source under the GPL licence.
It works fine for me.

Don Seiler wrote:

>Afternoon.  Just writing to ask if anyone knows of any commons/jakarta
>packages that may do CSV parsing and writing.  I'm aware of the jcsv
>package but thought I would try and utilize commons as much as possible.
>I looked at jakarta-oro as well but don't seem to see anything CSV
>related.
>
>Thanks in advance.
>  
>

