Posted to user@commons.apache.org by Daryl Stultz <da...@opentempo.com> on 2019/06/13 20:01:12 UTC

Quoted content

Hello,


I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.


I'm using Commons CSV 1.7 and OpenCSV 4.6.


Here is a code snippet with OpenCSV and Commons CSV:


import java.io.StringReader;
import java.util.Arrays;

import com.opencsv.CSVReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// OpenCSV: parse the line, then trim each value before printing
System.out.println("OpenCSV");
CSVReader reader = new CSVReader(new StringReader("1, 2, \"A, B\", 4"));
reader.iterator().forEachRemaining((sa) -> {
  Arrays.stream(sa).forEach((s) -> System.out.println(s.trim()));
});

// Commons CSV: same input, DEFAULT format with withTrim()
System.out.println("Commons CSV");
CSVParser parser = CSVFormat.DEFAULT.withTrim().parse(new StringReader("1, 2, \"A, B\", 4"));
parser.iterator().next().iterator().forEachRemaining(System.out::println);

The output is:


OpenCSV
1
2
A, B
4

Commons CSV
1
2
"A
B"
4


My expectation is that OpenCSV and Commons CSV would yield the same results (which would also agree with the library I'm yanking out).


I've tried fiddling with settings and with different CSVFormat instances with no change in behavior.


Any help appreciated.


--

Daryl Stultz
Principal Software Developer
_____________________________________
OpenTempo, Inc
http://www.opentempo.com
mailto:daryl.stultz@opentempo.com


Re: Quoted content

Posted by Remko Popma <re...@gmail.com>.
Also check out https://github.com/osiegmar/FastCSV

(Shameless plug) Every java main() method deserves http://picocli.info

> On Jun 14, 2019, at 23:53, Gary Gregory <ga...@gmail.com> wrote:
> 
> I've never liked mashing formatting and parsing options together. Should we
> have CSVFormat subclasses called CSVPrintingFormat and CSVParsingFormat?
> 
> Gary
> 
> On Fri, Jun 14, 2019 at 10:24 AM Daryl Stultz <da...@opentempo.com>
> wrote:
> 
>> 
>> 
>>> withIgnoreSurroundingSpaces() affects parsing
>>> withTrim() affects printing.
>> 
>> Ah, that is exactly what I needed, withIgnoreSurroundingSpaces() solves my
>> problem. (Definitely hard to understand that which applies to parsing and
>> that which applies to printing!)
>> 
>> Thank you so much.
>> 
>> /Daryl
>> 
>> 
>> 

Re: Quoted content

Posted by Gary Gregory <ga...@gmail.com>.
I've never liked mashing formatting and parsing options together. Should we
have CSVFormat subclasses called CSVPrintingFormat and CSVParsingFormat?

Gary

On Fri, Jun 14, 2019 at 10:24 AM Daryl Stultz <da...@opentempo.com>
wrote:

>
>
> > withIgnoreSurroundingSpaces() affects parsing
> > withTrim() affects printing.
>
> Ah, that is exactly what I needed, withIgnoreSurroundingSpaces() solves my
> problem. (Definitely hard to understand that which applies to parsing and
> that which applies to printing!)
>
> Thank you so much.
>
> /Daryl
>
>
>

Re: Quoted content

Posted by Daryl Stultz <da...@opentempo.com>.

> withIgnoreSurroundingSpaces() affects parsing
> withTrim() affects printing.

Ah, that is exactly what I needed, withIgnoreSurroundingSpaces() solves my problem. (Definitely hard to understand that which applies to parsing and that which applies to printing!)

Thank you so much.

/Daryl



Re: Quoted content

Posted by sebb <se...@gmail.com>.
I should have added:

withIgnoreSurroundingSpaces() affects parsing
withTrim() affects printing.

On Fri, 14 Jun 2019 at 15:05, sebb <se...@gmail.com> wrote:
>
> Try using:
>
> withIgnoreSurroundingSpaces()
>
> On Fri, 14 Jun 2019 at 14:03, sebb <se...@gmail.com> wrote:
> >
> > On Fri, 14 Jun 2019 at 13:34, Daryl Stultz <da...@opentempo.com> wrote:
> > >
> > > I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.
> > >
> > > I've discovered this bug here:
> > > https://issues.apache.org/jira/browse/CSV-228
> > >
> > > The issue refers to the parsing of the header, but it doesn't seem to matter what row the comma-quoting is on.
> >
> > According to my reading of RFC4180[1], the fields between delimiters
> > are either escaped or non-escaped.
> > Non-escaped fields can include spaces, but not commas.
> > Escaped fields must start with the double-quote; leading spaces are
> > not permitted.
> >
> > [1] https://tools.ietf.org/html/rfc4180
> >
> > > There's no way I can use this product with this defect. That's unfortunate, I like the API and OpenCSV quotes every bit of content when printing which I don't like.
> >
> > Now that is the case for DEFAULT and RFC4180.
> >
> > I've not looked into the withTrim() option.
> > If that is supposed to trim before handling quoted fields, then I
> > agree that there seems to be a bug here.
> > But if the trim is only supposed to apply to the un-quoted field, then
> > the current behaviour seems OK, even if it's not what you expect.
> >
> > > --
> > >
> > > Daryl Stultz
> > > Principal Software Developer
> > > _____________________________________
> > > OpenTempo, Inc
> > > http://www.opentempo.com
> > > mailto:daryl.stultz@opentempo.com
> > >

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: Quoted content

Posted by sebb <se...@gmail.com>.
Try using:

withIgnoreSurroundingSpaces()

On Fri, 14 Jun 2019 at 14:03, sebb <se...@gmail.com> wrote:
>
> On Fri, 14 Jun 2019 at 13:34, Daryl Stultz <da...@opentempo.com> wrote:
> >
> > I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.
> >
> > I've discovered this bug here:
> > https://issues.apache.org/jira/browse/CSV-228
> >
> > The issue refers to the parsing of the header, but it doesn't seem to matter what row the comma-quoting is on.
>
> According to my reading of RFC4180[1], the fields between delimiters
> are either escaped or non-escaped.
> Non-escaped fields can include spaces, but not commas.
> Escaped fields must start with the double-quote; leading spaces are
> not permitted.
>
> [1] https://tools.ietf.org/html/rfc4180
>
> > There's no way I can use this product with this defect. That's unfortunate, I like the API and OpenCSV quotes every bit of content when printing which I don't like.
>
> Now that is the case for DEFAULT and RFC4180.
>
> I've not looked into the withTrim() option.
> If that is supposed to trim before handling quoted fields, then I
> agree that there seems to be a bug here.
> But if the trim is only supposed to apply to the un-quoted field, then
> the current behaviour seems OK, even if it's not what you expect.
>
> > --
> >
> > Daryl Stultz
> > Principal Software Developer
> > _____________________________________
> > OpenTempo, Inc
> > http://www.opentempo.com
> > mailto:daryl.stultz@opentempo.com
> >



Re: Quoted content

Posted by sebb <se...@gmail.com>.
On Fri, 14 Jun 2019 at 13:34, Daryl Stultz <da...@opentempo.com> wrote:
>
> I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.
>
> I've discovered this bug here:
> https://issues.apache.org/jira/browse/CSV-228
>
> The issue refers to the parsing of the header, but it doesn't seem to matter what row the comma-quoting is on.

According to my reading of RFC4180[1], the fields between delimiters
are either escaped or non-escaped.
Non-escaped fields can include spaces, but not commas.
Escaped fields must start with the double-quote; leading spaces are
not permitted.

[1] https://tools.ietf.org/html/rfc4180
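
To make the distinction above concrete, here is a minimal sketch in plain Java of a toy single-line splitter. It is not the Commons CSV implementation: the class QuotedFieldSketch, the method split, and the skipSpaces flag are hypothetical names introduced only for illustration. With skipSpaces set to false, a field counts as quoted only when its very first character is a double quote, which is the strict reading of RFC 4180 described above. With skipSpaces set to true, spaces around a field are skipped before that check, which is roughly the effect this thread attributes to withIgnoreSurroundingSpaces().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-line field splitter for illustration only; NOT Commons CSV code. */
final class QuotedFieldSketch {

    /**
     * Splits one line on commas. A field is treated as quoted only if its first
     * character is a double quote. When skipSpaces is true (hypothetical flag),
     * spaces before each field are skipped before that check is made.
     */
    static List<String> split(String line, boolean skipSpaces) {
        List<String> fields = new ArrayList<>();
        final int n = line.length();
        int i = 0;
        while (true) {
            if (skipSpaces) {
                while (i < n && line.charAt(i) == ' ') {
                    i++; // skip leading spaces before deciding quoted vs. unquoted
                }
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // consume the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing quote
                        break;
                    } else {
                        field.append(c); // commas inside quotes are plain content
                        i++;
                    }
                }
                // Toy simplification: ignore anything between the closing quote and the next comma.
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
                fields.add(field.toString());
            } else {
                // Unquoted field: everything up to the next comma, quotes included.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
                String value = field.toString();
                fields.add(skipSpaces ? value.trim() : value);
            }
            if (i >= n) {
                break; // end of line
            }
            i++; // consume the comma and continue with the next field
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1, 2, \"A, B\", 4";
        System.out.println("strict:  " + split(line, false));
        System.out.println("lenient: " + split(line, true));
    }
}
```

For the input from the original post, the strict mode yields five fields because the space before the opening quote makes that field unquoted, so it is split at the inner comma. The lenient mode yields four fields, with A, B kept together as one value.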

> There's no way I can use this product with this defect. That's unfortunate, I like the API and OpenCSV quotes every bit of content when printing which I don't like.

Now that is the case for DEFAULT and RFC4180.

I've not looked into the withTrim() option.
If that is supposed to trim before handling quoted fields, then I
agree that there seems to be a bug here.
But if the trim is only supposed to apply to the un-quoted field, then
the current behaviour seems OK, even if it's not what you expect.

> --
>
> Daryl Stultz
> Principal Software Developer
> _____________________________________
> OpenTempo, Inc
> http://www.opentempo.com
> mailto:daryl.stultz@opentempo.com
>



Re: Quoted content

Posted by Daryl Stultz <da...@opentempo.com>.
I'm trying to replace an old CSV library with commons-csv. I seem to be having trouble with the most basic idea of the parser recognizing content that is quoted.

I've discovered this bug here:
https://issues.apache.org/jira/browse/CSV-228

The issue refers to the parsing of the header, but it doesn't seem to matter what row the comma-quoting is on.

There's no way I can use this product with this defect. That's unfortunate, I like the API and OpenCSV quotes every bit of content when printing which I don't like.

--

Daryl Stultz
Principal Software Developer
_____________________________________
OpenTempo, Inc
http://www.opentempo.com
mailto:daryl.stultz@opentempo.com