Posted to dev@commons.apache.org by Gary Gregory <ga...@gmail.com> on 2012/10/16 14:41:43 UTC

[csv] CSVFormat API names

Hi All:

The format object can configure various aspects of input and output
formatting.

With my recent addition of the Quote enum for [CSV-53], there are now two
aspects of quoting to configure: the quote character and the quote policy
(minimal, all, non-numeric, and none.) FYI, 'none' is currently not
implemented.

First, I changed (without consulting this list, and please accept my
apologies for this) the - IMO - cryptic and burdensome terminology of
"encapsulator" to "quote char", and added "quote policy":

- withQuoteChar(char)
- withQuotePolicy(Quote)

My intention here is that all Quote APIs start with "withQuote" followed by
what aspect of quoting is being configured.
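
To make the naming concrete, here is a minimal sketch of how the two calls
would read on an immutable format object. The class QuoteNamingSketch, its
create() factory, the getters and the assumed starting values are made up
for illustration and are not the actual CSVFormat code; only withQuoteChar,
withQuotePolicy and the four policy names come from the description above.

```java
// Hypothetical sketch for illustration; not the actual CSVFormat source.
final class QuoteNamingSketch {

    /** Quote policies named in this thread; NONE is listed but not implemented yet. */
    enum Quote { MINIMAL, ALL, NON_NUMERIC, NONE }

    private final char quoteChar;
    private final Quote quotePolicy;

    private QuoteNamingSketch(char quoteChar, Quote quotePolicy) {
        this.quoteChar = quoteChar;
        this.quotePolicy = quotePolicy;
    }

    /** Assumed starting point: double quote as quote char, minimal quoting. */
    static QuoteNamingSketch create() {
        return new QuoteNamingSketch('"', Quote.MINIMAL);
    }

    /** Returns a copy with a different quote character; this instance is unchanged. */
    QuoteNamingSketch withQuoteChar(char newQuoteChar) {
        return new QuoteNamingSketch(newQuoteChar, quotePolicy);
    }

    /** Returns a copy with a different quote policy; this instance is unchanged. */
    QuoteNamingSketch withQuotePolicy(Quote newQuotePolicy) {
        return new QuoteNamingSketch(quoteChar, newQuotePolicy);
    }

    char getQuoteChar() { return quoteChar; }

    Quote getQuotePolicy() { return quotePolicy; }

    public static void main(String[] args) {
        // Both calls start with "withQuote", followed by the aspect being configured.
        QuoteNamingSketch format = QuoteNamingSketch.create()
                .withQuoteChar('\'')
                .withQuotePolicy(Quote.ALL);
        System.out.println(format.getQuoteChar() + " " + format.getQuotePolicy());
    }
}
```

Since each with* call returns a new object, the two aspects can be set in
any order without affecting the instance they started from.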

Alternatively, we could have:

- withQuote(char)
- withQuotePolicy(Quote)

Which makes the API more consistent with the other char/Character based
properties:

- withEscape
- withDelimiter
- withLineSeparator
- withCommentStart

None of the above is post-fixed with a "Char" in the name.

As far as reading goes, for me, the "-r" names are OK because they are nouns
(things): "a delimiter", "a line separator." But I do not talk about "an
escape" because that would be an act (think Alcatraz) as opposed to what we
have here: a character used to /perform/ escapes.

So I propose to change "escape" to "escape char" because "escaper" is not a
word.

The name "comment start" is also not great because it implies (to me) that
there is a "comment end" missing. So plain "comment" or "comment char"
would be better.

Circling back to "quote char": I have it the way it is now for the same
reason as for the "escape" property.

In summary, using *Char names is better IMO.

Discuss! :)

Gary

[CSV-53] https://issues.apache.org/jira/browse/CSV-53
-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible <Jo...@scalaris.com> wrote:

> Hi Gary,
>
> Gary Gregory wrote:
>
> > Hi All:
> >
> > The format object can configure various aspects of input and output
> > formatting.
> >
> > With my recent addition of the Quote enum for [CSV-53], there are now two
> > aspects of quoting to configure: the quote character and the quote policy
> > (minimal, all, non-numeric, and none.) FYI, 'none' is currently not
> > implemented.
> >
> > First, I changed (without consulting this list, and please accept my
> > apologies for this) the - IMO - cryptic and burdensome terminology of
> > "encapsulator" to "quote char", and added "quote policy":
> >
> > - withQuoteChar(char)
> > - withQuotePolicy(Quote)
> >
> > My intention here is that all Quote APIs start with "withQuote" followed
> > by what aspect of quoting is being configured.
> >
> > Alternatively, we could have:
> >
> > - withQuote(char)
> > - withQuotePolicy(Quote)
>
> or
>
> - withQuote(char)
> - withQuote(Quote)
>
> ;-)
>
> > Which makes the API more consistent with the other char/Character based
> > properties:
> >
> > - withEscape
> > - withDelimiter
> > - withLineSeparator
> > - withCommentStart
> >
> > none of the above are post-fixed with a "Char" in the name.
> >
> > As far as reading, for me, the "-r" names are OK because they are
> > nouns (things): "a delimiter", "a line separator." But I do not talk
> > about "an escape" because that would be an act (think Alcatraz) as
> > opposed to what we have here: a character used to /perform/ escapes.
> >
> > So I propose to change "escape" to "escape char" because "escaper" is not
> > a word.
> >
> > The name "comment start" is not great also because it implies (to me)
> > that there is a "comment end" missing. So plain "comment" or "comment
> > char" would be better.
>
> Who said it has to be a single char?
>
> .withEOLComment("//")
>
>
> Same applies to the line separator:
>
> .withLineSeparator("\n\r")
>

My mistake there, I should not have mentioned this API. LineSeparator is
nice because it matches the line.separator system property name.

Gary


>
> > Circling back to "quote char" which I have the way it is now for the same
> > reason as for the "escape" property.
> >
> > In summary, using *Char names is better IMO.
>
> Only if it can be a single char only. If it can either be a single char
> or a String, I normally tend to use overloaded methods:
>
> - withEOLComment(char)
> - withEOLComment(CharSequence)
>
> > Discuss! :)
>
> Can of worms opened :))
>
> - Jörg
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



Re: [csv] CSVFormat API names

Posted by "Honton, Charles" <Ch...@intuit.com>.
Wikipedia has "Delimiter separated text" and "Delimiter-separated values"
(http://en.wikipedia.org/wiki/Delimiter-separated_values)

This suggests that CsvFormat, Rfc4180Format, and TsvFormat could be final
classes extending DsvFormat.
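
A rough sketch of what that hierarchy could look like follows. None of
these classes exist in [csv]; the constructor arguments, the formatRecord
helper and the choice of CRLF for every subclass are assumptions made only
to keep the example small, and the difference between CsvFormat and
Rfc4180Format is deliberately left out.

```java
// Hypothetical sketch of the suggested hierarchy; none of these classes exist in [csv].
abstract class DsvFormat {

    private final char delimiter;
    private final String recordSeparator;

    protected DsvFormat(char delimiter, String recordSeparator) {
        this.delimiter = delimiter;
        this.recordSeparator = recordSeparator;
    }

    char getDelimiter() { return delimiter; }

    String getRecordSeparator() { return recordSeparator; }

    /** Joins values into one record; no quoting or escaping, to keep the sketch short. */
    String formatRecord(String... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(values[i]);
        }
        return sb.append(recordSeparator).toString();
    }

    public static void main(String[] args) {
        System.out.print(new CsvFormat().formatRecord("a", "b", "c"));
        System.out.print(new TsvFormat().formatRecord("a", "b", "c"));
    }
}

/** Comma-delimited; CRLF is assumed here. */
final class CsvFormat extends DsvFormat {
    CsvFormat() { super(',', "\r\n"); }
}

/** Comma-delimited with CRLF, the record separator RFC 4180 defines. */
final class Rfc4180Format extends DsvFormat {
    Rfc4180Format() { super(',', "\r\n"); }
}

/** Tab-delimited; the record separator is an assumption. */
final class TsvFormat extends DsvFormat {
    TsvFormat() { super('\t', "\r\n"); }
}
```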

chas

On 10/16/12 12:50 PM, "Gary Gregory" <ga...@gmail.com> wrote:

>On Tue, Oct 16, 2012 at 3:38 PM, James Carman
><ja...@carmanconsulting.com>wrote:
>
>> On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <ga...@gmail.com>
>> wrote:
>> >
>> > I did not do this one as it seems RFC4180 defines CR+LF as the record
>> > separator, as noted in the Javadoc for
>> > org.apache.commons.csv.CSVFormat.DEFAULT.
>> >
>>
>> That's where the name of this component gets confusing to me.  Since
>> it's called "CSV", it would make sense that we follow RFC 4180, which
>> defines the standard for comma-separated value files and thus the
>> default record separator would be CRLF.  However, we are allowing
>> users to define whatever format they want using properties of the
>> CSVFormat class (of course, if you use delimiter != ',', then it's not
>> really CSV).  So, what's the intent?  This is more of a
>> delimited-record format parser/writer component which supports CSV.
>> Thus, it is not really very well-named.
>>
>
>Right! Tab delimited is common too. So... what's a better name?
>
>Commons DSV (Delimiter-Separated Values)?
>Commons Text Records?
>Commons Text Table?
>
>Gary
>


Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 3:38 PM, James Carman <ja...@carmanconsulting.com>wrote:

> On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <ga...@gmail.com>
> wrote:
> >
> > I did not do this one as is it seems RFC4180 defines CR+LF as the record
> > separator as noted in the Javadoc for
> > org.apache.commons.csv.CSVFormat.DEFAULT.
> >
>
> That's where the name of this component gets confusing to me.  Since
> it's called "CSV", it would make sense that we follow RFC 4180, which
> defines the standard for comma-separated value files and thus the
> default record separator would be CRLF.  However, we are allowing
> users to define whatever format they want using properties of the
> CSVFormat class (of course, if you use delimiter != ',', then it's not
> really CSV).  So, what's the intent?  This is more of a
> delimited-record format parser/writer component which supports CSV.
> Thus, it is not really very well-named.
>

Right! Tab delimited is common too. So... what's a better name?

Commons DSV (Delimiter-Separated Values)?
Commons Text Records?
Commons Text Table?

Gary



Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 7:14 PM, sebb <se...@gmail.com> wrote:

> On 16 October 2012 21:00, Gary Gregory <ga...@gmail.com> wrote:
> > On Tue, Oct 16, 2012 at 3:38 PM, James Carman <james@carmanconsulting.com> wrote:
> >
> >> On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <ga...@gmail.com>
> >> wrote:
> >> >
> >> > I did not do this one as it seems RFC4180 defines CR+LF as the record
> >> > separator, as noted in the Javadoc for
> >> > org.apache.commons.csv.CSVFormat.DEFAULT.
> >> >
> >>
> >> That's where the name of this component gets confusing to me.  Since
> >> it's called "CSV", it would make sense that we follow RFC 4180, which
> >> defines the standard for comma-separated value files and thus the
> >> default record separator would be CRLF.  However, we are allowing
> >> users to define whatever format they want using properties of the
> >> CSVFormat class (of course, if you use delimiter != ',', then it's not
> >> really CSV).  So, what's the intent?  This is more of a
> >> delimited-record format parser/writer component which supports CSV.
> >> Thus, it is not really very well-named.
>
> CSV could stand for Character Separated Variables.
>

Ah... very clever. TLA overloading, I love it!

Gary


>
> Although CSV usually means comma-separated, I think it is treated as a
> generic name sufficiently often that it is not likely to be a big
> problem.
>
> The difficulty with all the other names is that they are not at all well
> known.
>
> >>
> >
> > Why not rename DEFAULT to RFC4180?
>
> I agree that DEFAULT is a poor name, but could not get agreement to
> change it previously.
>
> There is already an RFC4180; DEFAULT is RFC4180 + allow blank lines.
>
> > Gary

Re: [csv] CSVFormat API names

Posted by sebb <se...@gmail.com>.
On 16 October 2012 21:00, Gary Gregory <ga...@gmail.com> wrote:
> On Tue, Oct 16, 2012 at 3:38 PM, James Carman <ja...@carmanconsulting.com>wrote:
>
>> On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <ga...@gmail.com>
>> wrote:
>> >
>> > I did not do this one as is it seems RFC4180 defines CR+LF as the record
>> > separator as noted in the Javadoc for
>> > org.apache.commons.csv.CSVFormat.DEFAULT.
>> >
>>
>> That's where the name of this component gets confusing to me.  Since
>> it's called "CSV", it would make sense that we follow RFC 4180, which
>> defines the standard for comma-separated value files and thus the
>> default record separator would be CRLF.  However, we are allowing
>> users to define whatever format they want using properties of the
>> CSVFormat class (of course, if you use delimiter != ',', then it's not
>> really CSV).  So, what's the intent?  This is more of a
>> delimited-record format parser/writer component which supports CSV.
>> Thus, it is not really very well-named.

CSV could stand for Character Separated Variables.

Although CSV usually means comma-separated, I think it is treated as a
generic name sufficiently often that it is not likely to be a big
problem.

The difficulty with all the other names is that they are not at all well known.

>>
>
> Why not rename DEFAULT to RFC4180?

I agree that DEFAULT is a poor name, but could not get agreement to
change it previously.

There is already an RFC4180; DEFAULT is RFC4180 + allow blank lines.
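
As a rough illustration of that relationship (not the real code): the
class name PredefinedFormatsSketch, the withBlankLinesAllowed method and
the splitRecords helper below are all made up, and the sketch assumes that
"allow blank lines" means blank lines are skipped rather than reported as
empty records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and behaviour are assumptions, not the CSVFormat source.
final class PredefinedFormatsSketch {

    /** Strict variant: comma delimiter, CRLF record separator, blank lines kept. */
    static final PredefinedFormatsSketch RFC4180 =
            new PredefinedFormatsSketch(',', "\r\n", false);

    /** Same settings as RFC4180, plus blank lines allowed (skipped). */
    static final PredefinedFormatsSketch DEFAULT = RFC4180.withBlankLinesAllowed(true);

    private final char delimiter;
    private final String recordSeparator;
    private final boolean blankLinesAllowed;

    private PredefinedFormatsSketch(char delimiter, String recordSeparator,
                                    boolean blankLinesAllowed) {
        this.delimiter = delimiter;
        this.recordSeparator = recordSeparator;
        this.blankLinesAllowed = blankLinesAllowed;
    }

    /** Returns a copy that differs only in how blank lines are treated. */
    PredefinedFormatsSketch withBlankLinesAllowed(boolean allowed) {
        return new PredefinedFormatsSketch(delimiter, recordSeparator, allowed);
    }

    /** Splits input into raw records; blank lines are dropped only when allowed. */
    List<String> splitRecords(String input) {
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf(recordSeparator, start);
            if (end < 0) {
                end = input.length(); // last record has no trailing separator
            }
            String line = input.substring(start, end);
            if (!(line.isEmpty() && blankLinesAllowed)) {
                records.add(line);
            }
            start = end + recordSeparator.length();
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "a,b\r\n\r\nc,d\r\n";
        System.out.println(RFC4180.splitRecords(input).size());
        System.out.println(DEFAULT.splitRecords(input).size());
    }
}
```

Deriving DEFAULT from RFC4180 in one line makes the single difference
between the two visible in the source.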

> Gary


Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 3:38 PM, James Carman <ja...@carmanconsulting.com>wrote:

> On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <ga...@gmail.com>
> wrote:
> >
> > I did not do this one as is it seems RFC4180 defines CR+LF as the record
> > separator as noted in the Javadoc for
> > org.apache.commons.csv.CSVFormat.DEFAULT.
> >
>
> That's where the name of this component gets confusing to me.  Since
> it's called "CSV", it would make sense that we follow RFC 4180, which
> defines the standard for comma-separated value files and thus the
> default record separator would be CRLF.  However, we are allowing
> users to define whatever format they want using properties of the
> CSVFormat class (of course, if you use delimiter != ',', then it's not
> really CSV).  So, what's the intent?  This is more of a
> delimited-record format parser/writer component which supports CSV.
> Thus, it is not really very well-named.
>

Why not rename DEFAULT to RFC4180?

Gary



Re: [csv] CSVFormat API names

Posted by James Carman <ja...@carmanconsulting.com>.
On Tue, Oct 16, 2012 at 2:25 PM, Gary Gregory <ga...@gmail.com> wrote:
>
> I did not do this one as is it seems RFC4180 defines CR+LF as the record
> separator as noted in the Javadoc for
> org.apache.commons.csv.CSVFormat.DEFAULT.
>

That's where the name of this component gets confusing to me.  Since
it's called "CSV", it would make sense that we follow RFC 4180, which
defines the standard for comma-separated value files and thus the
default record separator would be CRLF.  However, we are allowing
users to define whatever format they want using properties of the
CSVFormat class (of course, if you use delimiter != ',', then it's not
really CSV).  So, what's the intent?  This is more of a
delimited-record format parser/writer component which supports CSV.
Thus, it is not really very well-named.



Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 11:43 AM, sebb <se...@gmail.com> wrote:

> On 16 October 2012 16:34, Gary Gregory <ga...@gmail.com> wrote:
> > On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <garydgregory@gmail.com> wrote:
> >
> >> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <gudnabrsam@gmail.com> wrote:
> >>
> >>> Random thoughts--no real context here, so no way to inline:
> >>>
> >>> - "line separator" concept, while harmonizing with the line.separator
> >>> system property, might be better represented as "row separator" so as
> >>> not to imply that the parameter should be in any way limited to \r or
> >>> \n .  I would think the default for this would be the line.separator
> >>> property, however, and thus should take a String or CharSequence
> >>> (perhaps it already does, but there's been so much talk about char
> >>> parameters...).
> >>>
> >>
> >> Now that you mention it, this should have been obvious as soon as we
> >> wrote the test cases where a record is split over more than one line.
> >>
> >> There is a difference between line number and record number which the
> >> API tracks.
> >>
> >> I propose to change "line separator" to "record separator".


Folks seem to like this one, it is now in SVN.


> >> The default can be line.separator.
>

I did not do this one, as it seems RFC4180 defines CR+LF as the record
separator, as noted in the Javadoc for
org.apache.commons.csv.CSVFormat.DEFAULT.
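
To show why "record" and "line" are not the same thing, here is a small
sketch with CR+LF as the record separator and a quoted value that contains
a line break. RecordVsLineSketch and its two methods are hypothetical
helpers written for this mail, not part of [csv], and the quote handling
is simplified.

```java
// Hypothetical helper for illustration only; not part of [csv].
final class RecordVsLineSketch {

    /** Counts records; the separator ends a record only outside quotes. */
    static int countRecords(String input, char quoteChar, String recordSeparator) {
        if (recordSeparator.isEmpty()) {
            throw new IllegalArgumentException("record separator must not be empty");
        }
        int records = 0;
        boolean inQuotes = false;
        boolean pending = false; // true while the current record has content
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == quoteChar) {
                inQuotes = !inQuotes; // a doubled quote toggles twice, so the state is kept
                pending = true;
                i++;
            } else if (!inQuotes && input.startsWith(recordSeparator, i)) {
                records++;
                pending = false;
                i += recordSeparator.length();
            } else {
                pending = true;
                i++;
            }
        }
        return pending ? records + 1 : records;
    }

    /** Counts physical lines by counting line feeds. */
    static int countLines(String input) {
        int lines = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                lines++;
            }
        }
        boolean unterminated = !input.isEmpty()
                && input.charAt(input.length() - 1) != '\n';
        return unterminated ? lines + 1 : lines;
    }

    public static void main(String[] args) {
        // Two records: a header, then one record whose quoted value spans two lines.
        String input = "id,note\r\n1,\"first line\r\nsecond line\"\r\n";
        System.out.println("records=" + countRecords(input, '"', "\r\n"));
        System.out.println("lines=" + countLines(input));
    }
}
```

For the input in main, the record count is 2 while the line count is 3.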

Gary


> >>
> >> Gary
> >>
> >>
> >>>
> >>> - with* methods:  just something to think about here, but while we're
> >>> creating a fluent API, would e.g. #delimitedBy('\t') read more
> >>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
> >>> #withEscape('\\') ?
> >>>
> >>
> > I find the combination of the fluent API style AND immutability of the
> > format class ugly because of the PRISTINE & DISABLED internal crud.
> >
> > Why not just have DEFAULT and dump PRISTINE? Other formats should be
> > based on DEFAULT.
>
> No, because DEFAULT includes several settings that may not be required.
>
> > With PRISTINE, the door is open for a future format to not override
> > DISABLED and create a bug, as unlikely as it is.
>
> With DEFAULT, the door is *already* open for bugs due to failure to
> reset the unwanted settings.
>
> It's not possible currently to create an instance without overriding
> the DISABLED delimiter.
>
> > Gary
> >
> >
> >
> >
> >>> $0.02,
> >>> Matt
> >>>

Re: [csv] CSVFormat API names

Posted by sebb <se...@gmail.com>.
On 16 October 2012 16:34, Gary Gregory <ga...@gmail.com> wrote:
> On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <ga...@gmail.com>wrote:
>
>> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <gu...@gmail.com>wrote:
>>
>>> Random thoughts--no real context here, so no way to inline:
>>>
>>> - "line separator" concept, while harmonizing with the line.separator
>>> system property, might be better represented as "row separator" so as
>>> not to imply that the parameter should be in any way limited to \r or
>>> \n .  I would think the default for this would be the line.separator
>>> property, however, and thus should take a String or CharSequence
>>> (perhaps it already does, but there's been so much talk about char
>>> parameters...).
>>>
>>
>> Now that you mention it, this should have been obvious as soon as we wrote
>> the test cases where a record is split over more than one line.
>>
>> There is a difference between line number and record number which the API
>> tracks.
>>
>> I propose to change "line separator" to "record separator". The default
>> can be line.separator.
>>
>> Gary
>>
>>
>>>
>>> - with* methods:  just something to think about here, but while we're
>>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>>> #withEscape('\\') ?
>>>
>>
> I find the combination of the fluent API style AND immutability of the
> format class ugly because of the PRISTINE & DISABLED internal crud.
>
> Why not just have DEFAULT and dump PRISTINE? Other formats should be based
> on DEFAULT.

No, because DEFAULT includes several settings that may not be required.

> With PRISTINE, the door is open for a future format to not override
> DISABLED and create a bug, as unlikely as it is.

With DEFAULT, the door is *already* open for bugs due to failure to
reset the unwanted settings.

It's not possible currently to create an instance without overriding
the DISABLED delimiter.

> Gary
>
>
>
>
>>> $0.02,
>>> Matt
>>>
>>> On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
>>> <Jo...@scalaris.com> wrote:
>>> > Gary Gregory wrote:
>>> >
>>> >> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
>>> >> <Jo...@scalaris.com>wrote:
>>> >>
>>> >>> Hi Gary,
>>> >>>
>>> >>> Gary Gregory wrote:
>>> >>>
>>> >>> > Hi All:
>>> >>> >
>>> >>> > The format object can configure various aspects of input and output
>>> >>> > formatting.
>>> >>> >
>>> >>> > With my recent addition of the Quote enum for [CSV-53], there are
>>> >>> > now two aspects of quoting to configure: the quote character and
>>> >>> > the quote policy (minimal, all, non-numeric, and none.) FYI,
>>> >>> > 'none' is currently not implemented.
>>> >>> >
>>> >>> > First, I changed (without consulting this list, and please accept my
>>> >>> > apologies for this) the - IMO - cryptic and burdensome terminology
>>> >>> > of "encapsulator" to "quote char", and added "quote policy":
>>> >>> >
>>> >>> > - withQuoteChar(char)
>>> >>> > - withQuotePolicy(Quote)
>>> >>> >
>>> >>> > My intention here is that all Quote APIs start with "withQuote"
>>> >>> > followed by what aspect of quoting is being configured.
>>> >>> >
>>> >>> > Alternatively, we could have:
>>> >>> >
>>> >>> > - withQuote(char)
>>> >>> > - withQuotePolicy(Quote)
>>> >>>
>>> >>> or
>>> >>>
>>> >>> - withQuote(char)
>>> >>> - withQuote(Quote)
>>> >>>
>>> >>> ;-)
>>> >>>
>>> >>
>>> >> Darn, I wish I knew you better to know if you were joking! :)
>>> >>
>>> >> This would not be good IMO because you are configuring two different
>>> >> aspects of the behavior. When I see the same API name with different
>>> >> parameters, I think that they are the same and that the API just does
>>> >> conversions.
>>> >>
>>> >> We could consider making Quote a class instead of an enum and have it
>>> >> carry a char and an enum, such that one object defines all quoting
>>> >> aspects. This might be too normalized a design for something so simple
>>> >> though.
>>> >
>>> > Actually I did not have a closer look at the API. You're definitely
>>> > right to use different names for different aspects. It does not make
>>> > sense to overload just for fun.
>>> >
>>> >>
>>> >>
>>> >>>
>>> >>> > Which makes the API more consistent with the other char/Character
>>> >>> > based properties:
>>> >>> >
>>> >>> > - withEscape
>>> >>> > - withDelimiter
>>> >>> > - withLineSeparator
>>> >>> > - withCommentStart
>>> >>> >
>>> >>> > none of the above are post-fixed with a "Char" in the name.
>>> >>> >
>>> >>> > As far as reading, for me, the "-r" names are OK because they are
>>> >>> > nouns (things): "a delimiter", "a line separator." But I do not
>>> >>> > talk about "an escape" because that would be an act (think
>>> >>> > Alcatraz) as opposed to what we have here: a character used to
>>> >>> > /perform/ escapes.
>>> >>> >
>>> >>> > So I propose to change "escape" to "escape char" because "escaper"
>>> >>> > is not a word.
>>> >>> >
>>> >>> > The name "comment start" is not great also because it implies (to
>>> >>> > me) that there is a "comment end" missing. So plain "comment" or
>>> >>> > "comment char" would be better.
>>> >>>
>>> >>> Who said it has to be a single char?
>>> >>>
>>> >>
>>> >> The current implementation does. ;)
>>> >>
>>> >> Are comments even in any RFC?
>>> >
>>> > Not that I am aware of.
>>> >
>>> >>> .withEOLComment("//")
>>> >>>
>>> >>>
>>> >>> Same applies to the line separator:
>>> >>>
>>> >>> .withLineSeparator("\n\r")
>>> >>>
>>> >>> > Circling back to "quote char" which I have the way it is now for the
>>> >>> > same reason as for the "escape" property.
>>> >>> >
>>> >>> > In summary, using *Char names is better IMO.
>>> >>>
>>> >>> Only if it can be a single char only. If it can either be a single
>>> >>> char or a String, I normally tend to use overloaded methods:
>>> >>>
>>> >>> - withEOLComment(char)
>>> >>> - withEOLComment(CharSequence)
>>> >>>
>>> >>
>>> >> If you want to add // to the mix, please start a different thread. I'm
>>> not
>>> >> sure this is really needed. Do you have a real life use case?
>>> >
>>> > People come up with all kinds of "solutions" they are used to. CSV is brittle
>>> > anyway, just because there is no "real" standard.
>>> >
>>> > Cheers,
>>> > Jörg


Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 11:29 AM, Gary Gregory <ga...@gmail.com> wrote:

> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <gu...@gmail.com>wrote:
>
>> Random thoughts--no real context here, so no way to inline:
>>
>> - "line separator" concept, while harmonizing with the line.separator
>> system property, might be better represented as "row separator" so as
>> not to imply that the parameter should be in any way limited to \r or
>> \n .  I would think the default for this would be the line.separator
>> property, however, and thus should take a String or CharSequence
>> (perhaps it already does, but there's been so much talk about char
>> parameters...).
>>
>
> Now that you mention it, this should have been obvious as soon as we wrote
> the test cases where a record is split over more than one line.
>
> There is a difference between line number and record number which the API
> tracks.
>
> I propose to change "line separator" to "record separator". The default
> can be line.separator.
>
> Gary
>
>
>>
>> - with* methods:  just something to think about here, but while we're
>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>> #withEscape('\\') ?
>>
>
I find the combination of the fluent API style AND immutability of the
format class ugly because of the PRISTINE & DISABLED internal crud.

Why not just have DEFAULT and dump PRISTINE? Other formats should be based
on DEFAULT.

With PRISTINE, the door is open for a future format to not override
DISABLED and create a bug, as unlikely as it is.

Gary




>> $0.02,
>> Matt
>>
>> On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
>> <Jo...@scalaris.com> wrote:
>> > Gary Gregory wrote:
>> >
>> >> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
>> >> <Jo...@scalaris.com>wrote:
>> >>
>> >>> Hi Gary,
>> >>>
>> >>> Gary Gregory wrote:
>> >>>
>> >>> > Hi All:
>> >>> >
>> >>> > The format object can configure various aspects of input and output
>> >>> > formatting.
>> >>> >
>> >>> > With my recent addition of the Quote enum for [CSV-53], there are
>> now
>> >>> > two aspects of quoting to configure: the quote character and the
>> quote
>> >>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is
>> currently
>> >>> > not implemented.
>> >>> >
>> >>> > First, I changed (without consulting this list, and please accept my
>> >>> > apologies for this) the - IMO - cryptic and burdensome terminology
>> of
>> >>> > "encapsulator" to "quote char", and added "quote policy":
>> >>> >
>> >>> > - withQuoteChar(char)
>> >>> > - withQuotePolicy(Quote)
>> >>> >
>> >>> > My intention here is that all Quote APIs start with "withQuote"
>> >>> > followed by what aspect of quoting is being configured.
>> >>> >
>> >>> > Alternatively, we could have:
>> >>> >
>> >>> > - withQuote(char)
>> >>> > - withQuotePolicy(Quote)
>> >>>
>> >>> or
>> >>>
>> >>> - withQuote(char)
>> >>> - withQuote(Quote)
>> >>>
>> >>> ;-)
>> >>>
>> >>
>> >> Darn, I wish I knew you better to know if you were joking! :)
>> >>
>> >> This would not be good IMO because you are configuring two different
>> >> aspects of the behavior. When I see the same API name with different
>> >> parameters, I think that they are the same and that the API just does
>> >> conversions.
>> >>
>> >> We could consider making Quote a class instead of an enum and have it
>> >> carry a char and an enum, such that one object defines all quoting
>> >> aspects. This might be too normalized a design for something so simple
>> >> though.
>> >
>> > Actually I did not have a closer look at the API. You're definitely right to
>> > use different names for different aspects. It does not make sense to
>> > overload just for fun.
>> >
>> >>
>> >>
>> >>>
>> >>> > Which makes the API more consistent with the other char/Character
>> based
>> >>> > properties:
>> >>> >
>> >>> > - withEscape
>> >>> > - withDelimiter
>> >>> > - withLineSeparator
>> >>> > - withCommentStart
>> >>> >
>> >>> > none of the above are post-fixed with a "Char" in the name.
>> >>> >
>> >>> > As far as reading, for me, the "-r" names are OK because they are
>> >>> > nouns (things): "a delimiter", "a line separator." But I do not talk
>> >>> about
>> >>> > "an escape" because that would be an act (think Alcatraz) as
>> opposed to
>> >>> > what we have here: a character used to /perform/ escapes.
>> >>> >
>> >>> > So I propose to change "escape" to "escape char" because "escaper"
>> is
>> >>> > not a word.
>> >>> >
>> >>> > The name "comment start" is not great also because it implies (to
>> me)
>> >>> that
>> >>> > there is a "comment end" missing. So plain "comment" or "comment
>> char"
>> >>> > would be better.
>> >>>
>> >>> Who said it has to be a single char?
>> >>>
>> >>
>> >> The current implementation does. ;)
>> >>
>> >> Are comments even in any RFC?
>> >
>> > Not that I am aware of.
>> >
>> >>> .withEOLComment("//")
>> >>>
>> >>>
>> >>> Same applies to the line separator:
>> >>>
>> >>> .withLineSeparator("\n\r")
>> >>>
>> >>> > Circling back to "quote char" which I have the way it is now for the
>> >>> > same reason as for the "escape" property.
>> >>> >
>> >>> > In summary, using *Char names is better IMO.
>> >>>
>> >>> Only if it can be a single char only. If it can either be a single
>> char
>> >>> or a
>> >>> String, I normally tend to use overloaded methods:
>> >>>
>> >>> - withEOLComment(char)
>> >>> - withEOLComment(CharSequence)
>> >>>
>> >>
>> >> If you want to add // to the mix, please start a different thread. I'm
>> not
>> >> sure this is really needed. Do you have a real life use case?
>> >
>> > People come up with all kinds of "solutions" they are used to. CSV is brittle
>> > anyway, just because there is no "real" standard.
>> >
>> > Cheers,
>> > Jörg

Re: [csv] CSVFormat API names

Posted by sebb <se...@gmail.com>.
On 16 October 2012 16:29, Gary Gregory <ga...@gmail.com> wrote:
> On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <gu...@gmail.com> wrote:
>
>> Random thoughts--no real context here, so no way to inline:
>>
>> - "line separator" concept, while harmonizing with the line.separator
>> system property, might be better represented as "row separator" so as
>> not to imply that the parameter should be in any way limited to \r or
>> \n .  I would think the default for this would be the line.separator
>> property, however, and thus should take a String or CharSequence
>> (perhaps it already does, but there's been so much talk about char
>> parameters...).
>>
>
> Now that you mention it, this should have been obvious as soon as we wrote
> the test cases where a record is split over more than one line.
>
> There is a difference between line number and record number which the API
> tracks.
>
> I propose to change "line separator" to "record separator". The default can
> be line.separator.

OK.
I prefer record to row.

> Gary
>
>
>>
>> - with* methods:  just something to think about here, but while we're
>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>> #withEscape('\\') ?
>>
>> $0.02,
>> Matt
>>
>> On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
>> <Jo...@scalaris.com> wrote:
>> > Gary Gregory wrote:
>> >
>> >> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
>> >> <Jo...@scalaris.com>wrote:
>> >>
>> >>> Hi Gary,
>> >>>
>> >>> Gary Gregory wrote:
>> >>>
>> >>> > Hi All:
>> >>> >
>> >>> > The format object can configure various aspects of input and output
>> >>> > formatting.
>> >>> >
>> >>> > With my recent addition of the Quote enum for [CSV-53], there are now
>> >>> > two aspects of quoting to configure: the quote character and the
>> quote
>> >>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is
>> currently
>> >>> > not implemented.
>> >>> >
>> >>> > First, I changed (without consulting this list, and please accept my
>> >>> > apologies for this) the - IMO - cryptic and burdensome terminology of
>> >>> > "encapsulator" to "quote char", and added "quote policy":
>> >>> >
>> >>> > - withQuoteChar(char)
>> >>> > - withQuotePolicy(Quote)
>> >>> >
>> >>> > My intention here is that all Quote APIs start with "withQuote"
>> >>> > followed by what aspect of quoting is being configured.
>> >>> >
>> >>> > Alternatively, we could have:
>> >>> >
>> >>> > - withQuote(char)
>> >>> > - withQuotePolicy(Quote)
>> >>>
>> >>> or
>> >>>
>> >>> - withQuote(char)
>> >>> - withQuote(Quote)
>> >>>
>> >>> ;-)
>> >>>
>> >>
>> >> Darn, I wish I knew you better to know if you were joking! :)
>> >>
>> >> This would not be good IMO because you are configuring two different
>> >> aspects of the behavior. When I see the same API name with different
>> >> parameters, I think that they are the same and that the API just does
>> >> conversions.
>> >>
>> >> We could consider making Quote a class instead of an enum and have it
>> >> carry a char and an enum, such that one object defines all quoting
>> >> aspects. This might be too normalized a design for something so simple
>> >> though.
>> >
>> > Actually I did not have a closer look at the API. You're definitely right to
>> > use different names for different aspects. It does not make sense to
>> > overload just for fun.
>> >
>> >>
>> >>
>> >>>
>> >>> > Which makes the API more consistent with the other char/Character
>> based
>> >>> > properties:
>> >>> >
>> >>> > - withEscape
>> >>> > - withDelimiter
>> >>> > - withLineSeparator
>> >>> > - withCommentStart
>> >>> >
>> >>> > none of the above are post-fixed with a "Char" in the name.
>> >>> >
>> >>> > As far as reading, for me, the "-r" names are OK because they are
>> >>> > nouns (things): "a delimiter", "a line separator." But I do not talk
>> >>> about
>> >>> > "an escape" because that would be an act (think Alcatraz) as opposed
>> to
>> >>> > what we have here: a character used to /perform/ escapes.
>> >>> >
>> >>> > So I propose to change "escape" to "escape char" because "escaper" is
>> >>> > not a word.
>> >>> >
>> >>> > The name "comment start" is not great also because it implies (to me)
>> >>> that
>> >>> > there is a "comment end" missing. So plain "comment" or "comment
>> char"
>> >>> > would be better.
>> >>>
>> >>> Who said it has to be a single char?
>> >>>
>> >>
>> >> The current implementation does. ;)
>> >>
>> >> Are comments even in any RFC?
>> >
>> > Not that I am aware of.
>> >
>> >>> .withEOLComment("//")
>> >>>
>> >>>
>> >>> Same applies to the line separator:
>> >>>
>> >>> .withLineSeparator("\n\r")
>> >>>
>> >>> > Circling back to "quote char" which I have the way it is now for the
>> >>> > same reason as for the "escape" property.
>> >>> >
>> >>> > In summary, using *Char names is better IMO.
>> >>>
>> >>> Only if it can be a single char only. If it can either be a single char
>> >>> or a
>> >>> String, I normally tend to use overloaded methods:
>> >>>
>> >>> - withEOLComment(char)
>> >>> - withEOLComment(CharSequence)
>> >>>
>> >>
>> >> If you want to add // to the mix, please start a different thread. I'm
>> not
>> >> sure this is really needed. Do you have a real life use case?
>> >
>> > People come up with all kinds of "solutions" they are used to. CSV is brittle
>> > anyway, just because there is no "real" standard.
>> >
>> > Cheers,
>> > Jörg


Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 11:04 AM, Matt Benson <gu...@gmail.com> wrote:

> Random thoughts--no real context here, so no way to inline:
>
> - "line separator" concept, while harmonizing with the line.separator
> system property, might be better represented as "row separator" so as
> not to imply that the parameter should be in any way limited to \r or
> \n .  I would think the default for this would be the line.separator
> property, however, and thus should take a String or CharSequence
> (perhaps it already does, but there's been so much talk about char
> parameters...).
>

Now that you mention it, this should have been obvious as soon as we wrote
the test cases where a record is split over more than one line.

There is a difference between line number and record number which the API
tracks.

I propose to change "line separator" to "record separator". The default can
be line.separator.
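
To illustrate the line/record distinction, here is a minimal Java sketch. It
is not the Commons CSV parser: the names RecordVsLineSketch, splitRecords and
countLines are made up for this example, and it assumes '\n' as the separator
and '"' as the quote character. A quoted field that contains a line break
makes the line count and the record count diverge:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and behavior are assumptions, not Commons CSV code. */
final class RecordVsLineSketch {

    /** Splits text into records on '\n', except when the '\n' is inside double quotes. */
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A doubled quote ("") toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
            }
            if (c == '\n' && !inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record without a trailing separator
        }
        return records;
    }

    /** Counts physical lines: every '\n', plus a final line that lacks a terminator. */
    static int countLines(String text) {
        int lines = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                lines++;
            }
        }
        if (!text.isEmpty() && text.charAt(text.length() - 1) != '\n') {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // The first record spans two physical lines because of the quoted line break.
        String input = "a,\"first\nsecond\",c\nd,e,f\n";
        System.out.println("lines=" + countLines(input)
                + " records=" + splitRecords(input).size());
    }
}
```

With this input the sketch sees three physical lines but only two records,
which is why "record separator" describes the setting better than "line
separator".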

Gary


>
> - with* methods:  just something to think about here, but while we're
> creating a fluent API, would e.g. #delimitedBy('\t') read more
> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
> #withEscape('\\') ?
>
> $0.02,
> Matt
>
> On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
> <Jo...@scalaris.com> wrote:
> > Gary Gregory wrote:
> >
> >> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
> >> <Jo...@scalaris.com>wrote:
> >>
> >>> Hi Gary,
> >>>
> >>> Gary Gregory wrote:
> >>>
> >>> > Hi All:
> >>> >
> >>> > The format object can configure various aspects of input and output
> >>> > formatting.
> >>> >
> >>> > With my recent addition of the Quote enum for [CSV-53], there are now
> >>> > two aspects of quoting to configure: the quote character and the
> quote
> >>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is
> currently
> >>> > not implemented.
> >>> >
> >>> > First, I changed (without consulting this list, and please accept my
> >>> > apologies for this) the - IMO - cryptic and burdensome terminology of
> >>> > "encapsulator" to "quote char", and added "quote policy":
> >>> >
> >>> > - withQuoteChar(char)
> >>> > - withQuotePolicy(Quote)
> >>> >
> >>> > My intention here is that all Quote APIs start with "withQuote"
> >>> > followed by what aspect of quoting is being configured.
> >>> >
> >>> > Alternatively, we could have:
> >>> >
> >>> > - withQuote(char)
> >>> > - withQuotePolicy(Quote)
> >>>
> >>> or
> >>>
> >>> - withQuote(char)
> >>> - withQuote(Quote)
> >>>
> >>> ;-)
> >>>
> >>
> >> Darn, I wish I knew you better to know if you were joking! :)
> >>
> >> This would not be good IMO because you are configuring two different
> >> aspects of the behavior. When I see the same API name with different
> >> parameters, I think that they are the same and that the API just does
> >> conversions.
> >>
> >> We could consider making Quote a class instead of an enum and have it
> >> carry a char and an enum, such that one object defines all quoting
> >> aspects. This might be too normalized a design for something so simple
> >> though.
> >
> > Actually I did not have a closer look at the API. You're definitely right to
> > use different names for different aspects. It does not make sense to
> > overload just for fun.
> >
> >>
> >>
> >>>
> >>> > Which makes the API more consistent with the other char/Character
> based
> >>> > properties:
> >>> >
> >>> > - withEscape
> >>> > - withDelimiter
> >>> > - withLineSeparator
> >>> > - withCommentStart
> >>> >
> >>> > none of the above are post-fixed with a "Char" in the name.
> >>> >
> >>> > As far as reading, for me, the "-r" names are OK because they are
> >>> > nouns (things): "a delimiter", "a line separator." But I do not talk
> >>> about
> >>> > "an escape" because that would be an act (think Alcatraz) as opposed
> to
> >>> > what we have here: a character used to /perform/ escapes.
> >>> >
> >>> > So I propose to change "escape" to "escape char" because "escaper" is
> >>> > not a word.
> >>> >
> >>> > The name "comment start" is not great also because it implies (to me)
> >>> that
> >>> > there is a "comment end" missing. So plain "comment" or "comment
> char"
> >>> > would be better.
> >>>
> >>> Who said it has to be a single char?
> >>>
> >>
> >> The current implementation does. ;)
> >>
> >> Are comments even in any RFC?
> >
> > Not that I am aware of.
> >
> >>> .withEOLComment("//")
> >>>
> >>>
> >>> Same applies to the line separator:
> >>>
> >>> .withLineSeparator("\n\r")
> >>>
> >>> > Circling back to "quote char" which I have the way it is now for the
> >>> > same reason as for the "escape" property.
> >>> >
> >>> > In summary, using *Char names is better IMO.
> >>>
> >>> Only if it can be a single char only. If it can either be a single char
> >>> or a
> >>> String, I normally tend to use overloaded methods:
> >>>
> >>> - withEOLComment(char)
> >>> - withEOLComment(CharSequence)
> >>>
> >>
> >> If you want to add // to the mix, please start a different thread. I'm
> not
> >> sure this is really needed. Do you have a real life use case?
> >
> > People come up with all kinds of "solutions" they are used to. CSV is brittle
> > anyway, just because there is no "real" standard.
> >
> > Cheers,
> > Jörg

Re: [csv] CSVFormat API names

Posted by Benedikt Ritter <be...@gmail.com>.
2012/10/17 sebb <se...@gmail.com>:
> On 16 October 2012 21:56, Benedikt Ritter <be...@gmail.com> wrote:
>> 2012/10/16 Gary Gregory <ga...@gmail.com>:
>>> On Tue, Oct 16, 2012 at 1:00 PM, Stephen Colebourne <sc...@joda.org>wrote:
>>>
>>>> On 16 October 2012 17:44, Matt Benson <gu...@gmail.com> wrote:
>>>> > On Tue, Oct 16, 2012 at 11:42 AM, James Carman
>>>> > <ja...@carmanconsulting.com> wrote:
>>>> >> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>> Are these specific examples not the words you would actually use were
>>>> >>> you having a discussion on the subject in English?  :P
>>>> >>>
>>>> >>
>>>> >> Why not just support both?  The "with*" methods would just be aliases
>>>> >> for the more "natural language" method names.
>>>>
>>>> I would categorise first in two
>>>> - mutable builders producing immutable objects
>>>> - immutable objects
>>>>
>>>> The former should generally have short methods without prefixes, the
>>>> latter is more complex.
>>>>
>>>> For the latter, as a general rule, I use
>>>> withXxx()/plusXxx()/minusXxx() for items that affect the state and
>>>> past participle for other methods that manipulate the object in other
>>>> ways:
>>>>
>>>> // affects state (year/month/day)
>>>>  date = date.withYear(2012)
>>>>  date = date.plusYears(6)
>>>> // affects multiple pieces of state, so past participle
>>>>  period = period.multipliedBy(6)
>>>>  period = period.negated()
>>>>
>>>> This is simply an extension of when you might use setXxx() on a bean,
>>>> and when you might use a named method.
>>>>
>>>
>>> I like the idea of two classes: CSVFormat and CSVFormatBuilder but...
>>>
>>> My next question is: Does CSVFormat have any public constructors? If not,
>>> the builder can throw exceptions when one of its methods is called and
>>> validation fails. This is nice in the sense that the format object feels
>>> more lightweight and has a simpler/shorter protocol. It also leaves room
>>> for other builders to be added (to configure formats from config files for
>>> example) without growing the format class itself.
>>>
>>> If CSVFormat does have public constructors, then the format class still has
>>> to do its own validation. What I gain is the choice of using a kitchen sink
>>> constructor or the fluent builder API, I can choose my style.
>>>
>>> If there are two classes, and I cannot build a format without a format
>>> builder, then why not collapse the two classes into one?
>>>
>>
>> Hi Gary,
>>
>> I agree. I'd favor having no public constructors and a builder that
>> is an internal class of CSVFormat. Users create CSVFormatBuilders by
>> calling a static method on CSVFormat:
>>
>> CSVFormat format =
>> CSVFormat.defaults().withDelimiter('#').withCommentStart('/').build();
>>
>> Where defaults() returns a builder that is initialized with (surprise)
>> the values of the default format. No need to call a validate method.
>
> If you mean that the build method does the validation, then I agree.

Yep, the build method should take care of the validation.

> I think validation is necessary to check that the defined
> meta-characters are distinct.
>
> We could ignore validation and let the user define
> escape=delimiter=quote, but I suspect that would generate a lot of
> unnecessary user queries when things then go wrong in odd ways.
>
>> Benedikt
>>
>>> Gary
>>>
>>>
>>>> Stephen


Re: [csv] CSVFormat API names

Posted by sebb <se...@gmail.com>.
On 16 October 2012 21:56, Benedikt Ritter <be...@gmail.com> wrote:
> 2012/10/16 Gary Gregory <ga...@gmail.com>:
>> On Tue, Oct 16, 2012 at 1:00 PM, Stephen Colebourne <sc...@joda.org>wrote:
>>
>>> On 16 October 2012 17:44, Matt Benson <gu...@gmail.com> wrote:
>>> > On Tue, Oct 16, 2012 at 11:42 AM, James Carman
>>> > <ja...@carmanconsulting.com> wrote:
>>> >> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> Are these specific examples not the words you would actually use were
>>> >>> you having a discussion on the subject in English?  :P
>>> >>>
>>> >>
>>> >> Why not just support both?  The "with*" methods would just be aliases
>>> >> for the more "natural language" method names.
>>>
>>> I would categorise first in two
>>> - mutable builders producing immutable objects
>>> - immutable objects
>>>
>>> The former should generally have short methods without prefixes, the
>>> latter is more complex.
>>>
>>> For the latter, as a general rule, I use
>>> withXxx()/plusXxx()/minusXxx() for items that affect the state and
>>> past participle for other methods that manipulate the object in other
>>> ways:
>>>
>>> // affects state (year/month/day)
>>>  date = date.withYear(2012)
>>>  date = date.plusYears(6)
>>> // affects multiple pieces of state, so past participle
>>>  period = period.multipliedBy(6)
>>>  period = period.negated()
>>>
>>> This is simply an extension of when you might use setXxx() on a bean,
>>> and when you might use a named method.
>>>
>>
>> I like the idea of two classes: CSVFormat and CSVFormatBuilder but...
>>
>> My next question is: Does CSVFormat have any public constructors? If not,
>> the builder can throw exceptions when one of its methods is called and
>> validation fails. This is nice in the sense that the format object feels
>> more lightweight and has a simpler/shorter protocol. It also leaves room
>> for other builders to be added (to configure formats from config files for
>> example) without growing the format class itself.
>>
>> If CSVFormat does have public constructors, then the format class still has
>> to do its own validation. What I gain is the choice of using a kitchen sink
>> constructor or the fluent builder API, I can choose my style.
>>
>> If there are two classes, and I cannot build a format without a format
>> builder, then why not collapse the two classes into one?
>>
>
> Hi Gary,
>
> I agree. I'd favor having no public constructors and a builder that
> is an internal class of CSVFormat. Users create CSVFormatBuilders by
> calling a static method on CSVFormat:
>
> CSVFormat format =
> CSVFormat.defaults().withDelimiter('#').withCommentStart('/').build();
>
> Where defaults() returns a builder that is initialized with (surprise)
> the values of the default format. No need to call a validate method.

If you mean that the build method does the validation, then I agree.
I think validation is necessary to check that the defined
meta-characters are distinct.

We could ignore validation and let the user define
escape=delimiter=quote, but I suspect that would generate a lot of
unnecessary user queries when things then go wrong in odd ways.
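
As a rough illustration of that check, here is a minimal Java sketch of a
pairwise "all meta-characters are distinct" validation that a build method
could call. The class and method names are hypothetical, a null Character is
assumed to mean "feature disabled", and the sketch rejects every collision;
whether some pairs (for example quote and escape) should be allowed to
coincide is a policy question it does not try to settle:

```java
/** Hypothetical helper; names are illustrative and not taken from Commons CSV. */
final class MetaCharValidationSketch {

    /**
     * Throws IllegalStateException when two configured meta-characters coincide.
     * A null Character is treated as "not configured" and is skipped.
     */
    static void validate(char delimiter, Character quote, Character escape,
            Character commentStart) {
        Character delim = Character.valueOf(delimiter);
        requireDistinct("delimiter", delim, "quote", quote);
        requireDistinct("delimiter", delim, "escape", escape);
        requireDistinct("delimiter", delim, "comment start", commentStart);
        requireDistinct("quote", quote, "escape", escape);
        requireDistinct("quote", quote, "comment start", commentStart);
        requireDistinct("escape", escape, "comment start", commentStart);
    }

    /** Compares one pair; only fails when both are configured and equal. */
    private static void requireDistinct(String nameA, Character a, String nameB, Character b) {
        if (a != null && b != null && a.charValue() == b.charValue()) {
            throw new IllegalStateException(
                    nameA + " and " + nameB + " must differ, both are '" + a + "'");
        }
    }

    public static void main(String[] args) {
        validate(',', '"', '\\', '#'); // distinct characters: passes silently
        try {
            validate(',', ',', null, null); // quote collides with delimiter
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing early with a message that names the two colliding settings is the
point: the user gets told about the misconfiguration at build time instead of
seeing odd parse results later.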

> Benedikt
>
>> Gary
>>
>>
>>> Stephen


Re: [csv] CSVFormat API names

Posted by Benedikt Ritter <be...@gmail.com>.
2012/10/16 Gary Gregory <ga...@gmail.com>:
> On Tue, Oct 16, 2012 at 1:00 PM, Stephen Colebourne <sc...@joda.org>wrote:
>
>> On 16 October 2012 17:44, Matt Benson <gu...@gmail.com> wrote:
>> > On Tue, Oct 16, 2012 at 11:42 AM, James Carman
>> > <ja...@carmanconsulting.com> wrote:
>> >> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com>
>> wrote:
>> >>>
>> >>> Are these specific examples not the words you would actually use were
>> >>> you having a discussion on the subject in English?  :P
>> >>>
>> >>
>> >> Why not just support both?  The "with*" methods would just be aliases
>> >> for the more "natural language" method names.
>>
>> I would categorise first in two
>> - mutable builders producing immutable objects
>> - immutable objects
>>
>> The former should generally have short methods without prefixes, the
>> latter is more complex.
>>
>> For the latter, as a general rule, I use
>> withXxx()/plusXxx()/minusXxx() for items that affect the state and
>> past participle for other methods that manipulate the object in other
>> ways:
>>
>> // affects state (year/month/day)
>>  date = date.withYear(2012)
>>  date = date.plusYears(6)
>> // affects multiple pieces of state, so past participle
>>  period = period.multipliedBy(6)
>>  period = period.negated()
>>
>> This is simply an extension of when you might use setXxx() on a bean,
>> and when you might use a named method.
>>
>
> I like the idea of two classes: CSVFormat and CSVFormatBuilder but...
>
> My next question is: Does CSVFormat have any public constructors? If not,
> the builder can throw exceptions when one of its methods is called and
> validation fails. This is nice in the sense that the format object feels
> more lightweight and has a simpler/shorter protocol. It also leaves room
> for other builders to be added (to configure formats from config files for
> example) without growing the format class itself.
>
> If CSVFormat does have public constructors, then the format class still has
> to do its own validation. What I gain is the choice of using a kitchen sink
> constructor or the fluent builder API, I can choose my style.
>
> If there are two classes, and I cannot build a format without a format
> builder, then why not collapse the two classes into one?
>

Hi Gary,

I agree. I'd favor having no public constructors and a builder that
is an internal class of CSVFormat. Users create CSVFormatBuilders by
calling a static method on CSVFormat:

CSVFormat format =
CSVFormat.defaults().withDelimiter('#').withCommentStart('/').build();

Where defaults() returns a builder that is initialized with (surprise)
the values of the default format. No need to call a validate method.
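
The arrangement described above could look roughly like the sketch below.
It is an illustration under assumptions, not the Commons CSV code: the
class name SketchFormat, the comma default and the single validation rule
are hypothetical; only the method names defaults(), withDelimiter(),
withCommentStart() and build() come from the one-liner above.

```java
/** Hypothetical stand-in for CSVFormat; not the real Commons CSV class. */
final class SketchFormat {
    private final char delimiter;
    private final Character commentStart; // null means comments are disabled

    // Private constructor: instances can only come out of Builder.build().
    private SketchFormat(char delimiter, Character commentStart) {
        this.delimiter = delimiter;
        this.commentStart = commentStart;
    }

    /** Returns a builder preloaded with the (assumed) default values. */
    static Builder defaults() {
        return new Builder();
    }

    char getDelimiter() {
        return delimiter;
    }

    Character getCommentStart() {
        return commentStart;
    }

    /** Mutable nested builder; validation happens once, in build(). */
    static final class Builder {
        private char delimiter = ',';   // assumed default
        private Character commentStart; // assumed default: no comments

        private Builder() {
        }

        Builder withDelimiter(char delimiter) {
            this.delimiter = delimiter;
            return this;
        }

        Builder withCommentStart(char commentStart) {
            this.commentStart = commentStart;
            return this;
        }

        SketchFormat build() {
            // Example rule only: the two characters must differ.
            if (commentStart != null && commentStart.charValue() == delimiter) {
                throw new IllegalStateException("comment start and delimiter must differ");
            }
            return new SketchFormat(delimiter, commentStart);
        }
    }
}
```

Because the only way to obtain a SketchFormat is through build(), callers
never hold an unvalidated instance and no public validate method is needed.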

Benedikt

> Gary
>
>
>> Stephen
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
>
> --
> E-Mail: garydgregory@gmail.com | ggregory@apache.org
> JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
> Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 1:00 PM, Stephen Colebourne <sc...@joda.org> wrote:

> On 16 October 2012 17:44, Matt Benson <gu...@gmail.com> wrote:
> > On Tue, Oct 16, 2012 at 11:42 AM, James Carman
> > <ja...@carmanconsulting.com> wrote:
> >> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com>
> wrote:
> >>>
> >>> Are these specific examples not the words you would actually use were
> >>> you having a discussion on the subject in English?  :P
> >>>
> >>
> >> Why not just support both?  The "with*" methods would just be aliases
> >> for the more "natural language" method names.
>
> I would categorise first in two
> - mutable builders producing immutable objects
> - immutable objects
>
> The former should generally have short methods without prefixes, the
> latter is more complex.
>
> For the latter, as a general rule, I use
> withXxx()/plusXxx()/minusXxx() for items that affect the state and
> past participle for other methods that manipulate the object in other
> ways:
>
> // affects state (year/month/day)
>  date = date.withYear(2012)
>  date = date.plusYears(6)
> // affects multiple pieces of state, so past participle
>  period = period.multipliedBy(6)
>  period = period.negated()
>
> This is simply an extension of when you might use setXxx() on a bean,
> and when you might use a named method.
>

I like the idea of two classes: CSVFormat and CSVFormatBuilder but...

My next question is: Does CSVFormat have any public constructors? If not,
the builder can throw exceptions when one of its methods is called and
validation fails. This is nice in the sense that the format object feels
more lightweight and has a simpler/shorter protocol. It also leaves room
for other builders to be added (to configure formats from config files, for
example) without growing the format class itself.

If CSVFormat does have public constructors, then the format class still has
to do its own validation. What I gain is the choice of using a kitchen sink
constructor or the fluent builder API; I can choose my style.

If there are two classes, and I cannot build a format without a format
builder, then why not collapse the two classes into one?

Gary


> Stephen
>
>
>



Re: [csv] CSVFormat API names

Posted by Benedikt Ritter <be...@gmail.com>.
2012/10/16 Stephen Colebourne <sc...@joda.org>:
> On 16 October 2012 17:44, Matt Benson <gu...@gmail.com> wrote:
>> On Tue, Oct 16, 2012 at 11:42 AM, James Carman
>> <ja...@carmanconsulting.com> wrote:
>>> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com> wrote:
>>>>
>>>> Are these specific examples not the words you would actually use were
>>>> you having a discussion on the subject in English?  :P
>>>>
>>>
>>> Why not just support both?  The "with*" methods would just be aliases
>>> for the more "natural language" method names.
>
> I would categorise first in two
> - mutable builders producing immutable objects
> - immutable objects
>

Implementing a builder for CSVFormat was discussed a while ago [1]. I
think it's the best solution, because the validate method can then be
made private and no code outside the format has to worry about whether
a format is valid or not (right now CSV code calls validate on newly
created CSVFormat instances to make sure they are valid).
Anyway, there were voices against a builder because it would complicate
the API, so we never implemented something like that...

Benedikt

[1] http://markmail.org/thread/mmeoymd3cpq5jxfr

> The former should generally have short methods without prefixes, the
> latter is more complex.
>
> For the latter, as a general rule, I use
> withXxx()/plusXxx()/minusXxx() for items that affect the state and
> past participle for other methods that manipulate the object in other
> ways:
>
> // affects state (year/month/day)
>  date = date.withYear(2012)
>  date = date.plusYears(6)
> // affects multiple pieces of state, so past participle
>  period = period.multipliedBy(6)
>  period = period.negated()
>
> This is simply an extension of when you might use setXxx() on a bean,
> and when you might use a named method.
>
> Stephen
>
>



Re: [csv] CSVFormat API names

Posted by Stephen Colebourne <sc...@joda.org>.
On 16 October 2012 17:44, Matt Benson <gu...@gmail.com> wrote:
> On Tue, Oct 16, 2012 at 11:42 AM, James Carman
> <ja...@carmanconsulting.com> wrote:
>> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com> wrote:
>>>
>>> Are these specific examples not the words you would actually use were
>>> you having a discussion on the subject in English?  :P
>>>
>>
>> Why not just support both?  The "with*" methods would just be aliases
>> for the more "natural language" method names.

I would first categorise into two cases:
- mutable builders producing immutable objects
- immutable objects

The former should generally have short methods without prefixes, the
latter is more complex.

For the latter, as a general rule, I use
withXxx()/plusXxx()/minusXxx() for items that affect the state and
past participle for other methods that manipulate the object in other
ways:

// affects state (year/month/day)
 date = date.withYear(2012)
 date = date.plusYears(6)
// affects multiple pieces of state, so past participle
 period = period.multipliedBy(6)
 period = period.negated()

This is simply an extension of when you might use setXxx() on a bean,
and when you might use a named method.
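
As a runnable illustration of this convention, here is a small sketch
using java.time.LocalDate and java.time.Period. That API postdates this
thread and is used here only because it exposes the same method names; the
class name NamingDemo and the chosen dates are assumptions made for the
example.

```java
import java.time.LocalDate;
import java.time.Period;

/** Demonstrates withXxx/plusXxx versus past-participle method names. */
final class NamingDemo {
    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2006, 10, 16);

        // withXxx/plusXxx: each call targets one named piece of state
        // and returns a new object; 'date' itself never changes.
        LocalDate inYear2012 = date.withYear(2012);
        LocalDate sixYearsOn = date.plusYears(6);

        // Past participle: the operation touches several fields at once.
        Period period = Period.of(1, 2, 3);
        Period scaled = period.multipliedBy(6);
        Period flipped = period.negated();

        System.out.println(inYear2012 + " " + sixYearsOn);
        System.out.println(scaled + " " + flipped);
    }
}
```

Both calls on the date return new instances, which is why the quoted
snippets reassign the result (date = date.withYear(2012)).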

Stephen



Re: [csv] CSVFormat API names

Posted by Matt Benson <gu...@gmail.com>.
On Tue, Oct 16, 2012 at 11:42 AM, James Carman
<ja...@carmanconsulting.com> wrote:
> On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com> wrote:
>>
>> Are these specific examples not the words you would actually use were
>> you having a discussion on the subject in English?  :P
>>
>
> Why not just support both?  The "with*" methods would just be aliases
> for the more "natural language" method names.

Or vice versa, sounds reasonable.  :)

Matt



Re: [csv] CSVFormat API names

Posted by James Carman <ja...@carmanconsulting.com>.
On Tue, Oct 16, 2012 at 12:38 PM, Matt Benson <gu...@gmail.com> wrote:
>
> Are these specific examples not the words you would actually use were
> you having a discussion on the subject in English?  :P
>

Why not just support both?  The "with*" methods would just be aliases
for the more "natural language" method names.



Re: [csv] CSVFormat API names

Posted by Matt Benson <gu...@gmail.com>.
On Tue, Oct 16, 2012 at 11:27 AM, sebb <se...@gmail.com> wrote:
> On 16 October 2012 17:08, Jörg Schaible <jo...@gmx.de> wrote:
>> Matt Benson wrote:
>>
>>> Random thoughts--no real context here, so no way to inline:
>>>
>>> - "line separator" concept, while harmonizing with the line.separator
>>> system property, might be better represented as "row separator" so as
>>> not to imply that the parameter should be in any way limited to \r or
>>> \n .  I would think the default for this would be the line.separator
>>> property, however, and thus should take a String or CharSequence
>>> (perhaps it already does, but there's been so much talk about char
>>> parameters...).
>>>
>>> - with* methods:  just something to think about here, but while we're
>>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>>> #withEscape('\\') ?
>>
>> +1, good idea!
>
> Not sure I agree.
> The advantage of a common prefix is that they work well with IDEs.

I can appreciate that if you began to type "with"... the IDE could
show ten different things you could be trying to use, but I don't know
that I'd go so far as to call this "working well with" the IDE.

>
> Also I think it's confusing to have xxxBy and yyyWith.

Are these specific examples not the words you would actually use were
you having a discussion on the subject in English?  :P

Matt

>
>> - Jörg
>>
>>
>



Re: [csv] CSVFormat API names

Posted by sebb <se...@gmail.com>.
On 16 October 2012 17:08, Jörg Schaible <jo...@gmx.de> wrote:
> Matt Benson wrote:
>
>> Random thoughts--no real context here, so no way to inline:
>>
>> - "line separator" concept, while harmonizing with the line.separator
>> system property, might be better represented as "row separator" so as
>> not to imply that the parameter should be in any way limited to \r or
>> \n .  I would think the default for this would be the line.separator
>> property, however, and thus should take a String or CharSequence
>> (perhaps it already does, but there's been so much talk about char
>> parameters...).
>>
>> - with* methods:  just something to think about here, but while we're
>> creating a fluent API, would e.g. #delimitedBy('\t') read more
>> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
>> #withEscape('\\') ?
>
> +1, good idea!

Not sure I agree.
The advantage of a common prefix is that they work well with IDEs.

Also I think it's confusing to have xxxBy and yyyWith.

> - Jörg
>
>
>



Re: [csv] CSVFormat API names

Posted by Jörg Schaible <jo...@gmx.de>.
Matt Benson wrote:

> Random thoughts--no real context here, so no way to inline:
> 
> - "line separator" concept, while harmonizing with the line.separator
> system property, might be better represented as "row separator" so as
> not to imply that the parameter should be in any way limited to \r or
> \n .  I would think the default for this would be the line.separator
> property, however, and thus should take a String or CharSequence
> (perhaps it already does, but there's been so much talk about char
> parameters...).
> 
> - with* methods:  just something to think about here, but while we're
> creating a fluent API, would e.g. #delimitedBy('\t') read more
> fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
> #withEscape('\\') ?

+1, good idea!

- Jörg




Re: [csv] CSVFormat API names

Posted by Matt Benson <gu...@gmail.com>.
Random thoughts--no real context here, so no way to inline:

- "line separator" concept, while harmonizing with the line.separator
system property, might be better represented as "row separator" so as
not to imply that the parameter should be in any way limited to \r or
\n .  I would think the default for this would be the line.separator
property, however, and thus should take a String or CharSequence
(perhaps it already does, but there's been so much talk about char
parameters...).

- with* methods:  just something to think about here, but while we're
creating a fluent API, would e.g. #delimitedBy('\t') read more
fluently than #withDelimiter('\t') ?  #escapingWith('\\') vs.
#withEscape('\\') ?
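
To make the two points above concrete, here is a minimal sketch. The class
FluentFormat, its defaults and the join() helper are hypothetical and are
not part of Commons CSV; the row separator is modelled as a String that
defaults to the platform line separator, and delimitedBy() is shown
delegating to withDelimiter() so the two spellings can be compared side by
side.

```java
import java.util.Objects;

/** Hypothetical immutable format used only to compare method names. */
final class FluentFormat {
    private final char delimiter;
    private final String rowSeparator;

    private FluentFormat(char delimiter, String rowSeparator) {
        this.delimiter = delimiter;
        this.rowSeparator = rowSeparator;
    }

    /** Assumed defaults: comma delimiter, platform line separator. */
    static FluentFormat defaults() {
        return new FluentFormat(',', System.lineSeparator());
    }

    /** Prefix style: every configuration method starts with "with". */
    FluentFormat withDelimiter(char delimiter) {
        return new FluentFormat(delimiter, rowSeparator);
    }

    /** Natural-language style; delegates to the prefixed method. */
    FluentFormat delimitedBy(char delimiter) {
        return withDelimiter(delimiter);
    }

    /** A String, so the separator is not limited to a single \r or \n. */
    FluentFormat withRowSeparator(String rowSeparator) {
        Objects.requireNonNull(rowSeparator, "rowSeparator");
        return new FluentFormat(delimiter, rowSeparator);
    }

    /** Joins cells into one row; no quoting or escaping is attempted. */
    String join(String... cells) {
        return String.join(String.valueOf(delimiter), cells) + rowSeparator;
    }
}
```

With this sketch, FluentFormat.defaults().delimitedBy('\t') and
FluentFormat.defaults().withDelimiter('\t') configure the same thing; the
difference is purely in how the call chain reads.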

$0.02,
Matt

On Tue, Oct 16, 2012 at 8:53 AM, Jörg Schaible
<Jo...@scalaris.com> wrote:
> Gary Gregory wrote:
>
>> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
>> <Jo...@scalaris.com> wrote:
>>
>>> Hi Gary,
>>>
>>> Gary Gregory wrote:
>>>
>>> > Hi All:
>>> >
>>> > The format object can configure various aspects of input and output
>>> > formatting.
>>> >
>>> > With my recent addition of the Quote enum for [CSV-53], there are now
>>> > two aspects of quoting to configure: the quote character and the quote
>>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is currently
>>> > not implemented.
>>> >
>>> > First, I changed (without consulting this list, and please accept my
>>> > apologies for this) the - IMO - cryptic and burdensome terminology of
>>> > "encapsulator" to "quote char", and added "quote policy":
>>> >
>>> > - withQuoteChar(char)
>>> > - withQuotePolicy(Quote)
>>> >
>>> > My intention here is that all Quote APIs start with "withQuote"
>>> > followed by what aspect of quoting is being configured.
>>> >
>>> > Alternatively, we could have:
>>> >
>>> > - withQuote(char)
>>> > - withQuotePolicy(Quote)
>>>
>>> or
>>>
>>> - withQuote(char)
>>> - withQuote(Quote)
>>>
>>> ;-)
>>>
>>
>> Darn, I wish I knew you better to know if you were joking! :)
>>
>> This would not be good IMO because you are configuring two different
>> aspects of the behavior. When I see the same API name with different
>> parameters, I think that they are the same and that the API just does
>> conversions.
>>
>> We could consider making Quote a class instead of an enum and have it
>> carry a char and an enum, such that one object defines all quoting
>> aspects. This might be too normalized a design for something so simple
>> though.
>
> Actually I had not taken a closer look at the API. You're definitely right to
> use different names for different aspects. It does not make sense to
> overload just for fun.
>
>>
>>
>>>
>>> > Which makes the API more consistent with the other char/Character based
>>> > properties:
>>> >
>>> > - withEscape
>>> > - withDelimiter
>>> > - withLineSeparator
>>> > - withCommentStart
>>> >
>>> > none of the above are post-fixed with a "Char" in the name.
>>> >
>>> > As far as reading, for me, the "-r" names are OK because they are
>>> > nouns (things): "a delimiter", "a line separator." But I do not talk
>>> > about "an escape" because that would be an act (think Alcatraz) as
>>> > opposed to what we have here: a character used to /perform/ escapes.
>>> >
>>> > So I propose to change "escape" to "escape char" because "escaper" is
>>> > not a word.
>>> >
>>> > The name "comment start" is not great also because it implies (to me)
>>> > that there is a "comment end" missing. So plain "comment" or "comment
>>> > char" would be better.
>>>
>>> Who said it has to be a single char?
>>>
>>
>> The current implementation does. ;)
>>
>> Are comments even in any RFC?
>
> Not that I am aware of.
>
>>> .withEOLComment("//")
>>>
>>>
>>> Same applies to the line separator:
>>>
>>> .withLineSeparator("\n\r")
>>>
>>> > Circling back to "quote char" which I have the way it is now for the
>>> > same reason as for the "escape" property.
>>> >
>>> > In summary, using *Char names is better IMO.
>>>
>>> Only if it can be a single char only. If it can either be a single char
>>> or a String, I normally tend to use overloaded methods:
>>>
>>> - withEOLComment(char)
>>> - withEOLComment(CharSequence)
>>>
>>
>> If you want to add // to the mix, please start a different thread. I'm not
>> sure this is really needed. Do you have a real life use case?
>
> People come up with all kinds of "solutions" they are used to. CSV is brittle
> anyway, just because there is no "real" standard.
>
> Cheers,
> Jörg
>
>
>



Re: [csv] CSVFormat API names

Posted by Jörg Schaible <Jo...@scalaris.com>.
Gary Gregory wrote:

> On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
> <Jo...@scalaris.com> wrote:
> 
>> Hi Gary,
>>
>> Gary Gregory wrote:
>>
>> > Hi All:
>> >
>> > The format object can configure various aspects of input and output
>> > formatting.
>> >
>> > With my recent addition of the Quote enum for [CSV-53], there are now
>> > two aspects of quoting to configure: the quote character and the quote
>> > policy (minimal, all, non-numeric, and none.) FYI, 'none' is currently
>> > not implemented.
>> >
>> > First, I changed (without consulting this list, and please accept my
>> > apologies for this) the - IMO - cryptic and burdensome terminology of
>> > "encapsulator" to "quote char", and added "quote policy":
>> >
>> > - withQuoteChar(char)
>> > - withQuotePolicy(Quote)
>> >
>> > My intention here is that all Quote APIs start with "withQuote"
>> > followed by what aspect of quoting is being configured.
>> >
>> > Alternatively, we could have:
>> >
>> > - withQuote(char)
>> > - withQuotePolicy(Quote)
>>
>> or
>>
>> - withQuote(char)
>> - withQuote(Quote)
>>
>> ;-)
>>
> 
> Darn, I wish I knew you better to know if you were joking! :)
> 
> This would not be good IMO because you are configuring two different
> aspects of the behavior. When I see the same API name with different
> parameters, I think that they are the same and that the API just does
> conversions.
> 
> We could consider making Quote a class instead of an enum and have it
> carry a char and an enum, such that one object defines all quoting
> aspects. This might be too normalized a design for something so simple
> though.

Actually I had not taken a closer look at the API. You're definitely right to
use different names for different aspects. It does not make sense to 
overload just for fun.

> 
> 
>>
>> > Which makes the API more consistent with the other char/Character based
>> > properties:
>> >
>> > - withEscape
>> > - withDelimiter
>> > - withLineSeparator
>> > - withCommentStart
>> >
>> > none of the above are post-fixed with a "Char" in the name.
>> >
>> > As far as reading, for me, the "-r" names are OK because they are
>> > nouns (things): "a delimiter", "a line separator." But I do not talk
>> > about "an escape" because that would be an act (think Alcatraz) as
>> > opposed to what we have here: a character used to /perform/ escapes.
>> >
>> > So I propose to change "escape" to "escape char" because "escaper" is
>> > not a word.
>> >
>> > The name "comment start" is not great also because it implies (to me)
>> > that there is a "comment end" missing. So plain "comment" or "comment
>> > char" would be better.
>>
>> Who said it has to be a single char?
>>
> 
> The current implementation does. ;)
> 
> Are comments even in any RFC?

Not that I am aware of.

>> .withEOLComment("//")
>>
>>
>> Same applies to the line separator:
>>
>> .withLineSeparator("\n\r")
>>
>> > Circling back to "quote char" which I have the way it is now for the
>> > same reason as for the "escape" property.
>> >
>> > In summary, using *Char names is better IMO.
>>
>> Only if it can be a single char only. If it can either be a single char
>> or a String, I normally tend to use overloaded methods:
>>
>> - withEOLComment(char)
>> - withEOLComment(CharSequence)
>>
> 
> If you want to add // to the mix, please start a different thread. I'm not
> sure this is really needed. Do you have a real life use case?

People come up with all kinds of "solutions" they are used to. CSV is brittle
anyway, just because there is no "real" standard.

Cheers,
Jörg




Re: [csv] CSVFormat API names

Posted by Gary Gregory <ga...@gmail.com>.
On Tue, Oct 16, 2012 at 9:14 AM, Jörg Schaible
<Jo...@scalaris.com> wrote:

> Hi Gary,
>
> Gary Gregory wrote:
>
> > Hi All:
> >
> > The format object can configure various aspects of input and output
> > formatting.
> >
> > With my recent addition of the Quote enum for [CSV-53], there are now two
> > aspects of quoting to configure: the quote character and the quote policy
> > (minimal, all, non-numeric, and none.) FYI, 'none' is currently not
> > implemented.
> >
> > First, I changed (without consulting this list, and please accept my
> > apologies for this) the - IMO - cryptic and burdensome terminology of
> > "encapsulator" to "quote char", and added "quote policy":
> >
> > - withQuoteChar(char)
> > - withQuotePolicy(Quote)
> >
> > My intention here is that all Quote APIs start with "withQuote" followed
> > by what aspect of quoting is being configured.
> >
> > Alternatively, we could have:
> >
> > - withQuote(char)
> > - withQuotePolicy(Quote)
>
> or
>
> - withQuote(char)
> - withQuote(Quote)
>
> ;-)
>

Darn, I wish I knew you better to know if you were joking! :)

This would not be good IMO because you are configuring two different
aspects of the behavior. When I see the same API name with different
parameters, I think that they are the same and that the API just does
conversions.

We could consider making Quote a class instead of an enum and have it carry
a char and an enum, such that one object defines all quoting aspects. This
might be too normalized a design for something so simple though.
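
For illustration only, such a combined object might look like the sketch
below. The names Quoting and QuotePolicy are hypothetical (the thread's
enum is called Quote); the four constants mirror the policies listed
earlier, and nothing here reflects the actual Commons CSV source.

```java
import java.util.Objects;

/** The four policies named in this thread; the type name is hypothetical. */
enum QuotePolicy { MINIMAL, ALL, NON_NUMERIC, NONE }

/** Hypothetical value object bundling both quoting aspects. */
final class Quoting {
    private final char quoteChar;
    private final QuotePolicy policy;

    Quoting(char quoteChar, QuotePolicy policy) {
        this.quoteChar = quoteChar;
        this.policy = Objects.requireNonNull(policy, "policy");
    }

    char getQuoteChar() {
        return quoteChar;
    }

    QuotePolicy getPolicy() {
        return policy;
    }

    /** Returns a copy with a different quote character. */
    Quoting withQuoteChar(char quoteChar) {
        return new Quoting(quoteChar, policy);
    }

    /** Returns a copy with a different quote policy. */
    Quoting withPolicy(QuotePolicy policy) {
        return new Quoting(quoteChar, policy);
    }
}
```

A format would then need a single quoting property, at the cost of one
more type for users to learn, which is the trade-off raised above.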


>
> > Which makes the API more consistent with the other char/Character based
> > properties:
> >
> > - withEscape
> > - withDelimiter
> > - withLineSeparator
> > - withCommentStart
> >
> > none of the above are post-fixed with a "Char" in the name.
> >
> > As far as reading, for me, the "-r" names are OK because they are
> > nouns (things): "a delimiter", "a line separator." But I do not talk
> > about "an escape" because that would be an act (think Alcatraz) as
> > opposed to what we have here: a character used to /perform/ escapes.
> >
> > So I propose to change "escape" to "escape char" because "escaper" is not
> > a word.
> >
> > The name "comment start" is not great also because it implies (to me)
> > that there is a "comment end" missing. So plain "comment" or "comment
> > char" would be better.
>
> Who said it has to be a single char?
>

The current implementation does. ;)

Are comments even in any RFC?


>
> .withEOLComment("//")
>
>
> Same applies to the line separator:
>
> .withLineSeparator("\n\r")
>
> > Circling back to "quote char" which I have the way it is now for the same
> > reason as for the "escape" property.
> >
> > In summary, using *Char names is better IMO.
>
> Only if it can be a single char only. If it can either be a single char or
> a String, I normally tend to use overloaded methods:
>
> - withEOLComment(char)
> - withEOLComment(CharSequence)
>

If you want to add // to the mix, please start a different thread. I'm not
sure this is really needed. Do you have a real life use case?

Merci!
Gary


>
> > Discuss! :)
>
> Can of worms opened :))
>
> - Jörg
>
>
>
>



Re: [csv] CSVFormat API names

Posted by Simone Tripodi <si...@apache.org>.
+1 to Jörg, that would be my recommendation as well!

my 0.02 cents,
-Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


On Tue, Oct 16, 2012 at 3:14 PM, Jörg Schaible
<Jo...@scalaris.com> wrote:
> Hi Gary,
>
> Gary Gregory wrote:
>
>> Hi All:
>>
>> The format object can configure various aspects of input and output
>> formatting.
>>
>> With my recent addition of the Quote enum for [CSV-53], there are now two
>> aspects of quoting to configure: the quote character and the quote policy
>> (minimal, all, non-numeric, and none.) FYI, 'none' is currently not
>> implemented.
>>
>> First, I changed (without consulting this list, and please accept my
>> apologies for this) the - IMO - cryptic and burdensome terminology of
>> "encapsulator" to "quote char", and added "quote policy":
>>
>> - withQuoteChar(char)
>> - withQuotePolicy(Quote)
>>
>> My intention here is that all Quote APIs start with "withQuote" followed
>> by what aspect of quoting is being configured.
>>
>> Alternatively, we could have:
>>
>> - withQuote(char)
>> - withQuotePolicy(Quote)
>
> or
>
> - withQuote(char)
> - withQuote(Quote)
>
> ;-)
>
>> Which makes the API more consistent with the other char/Character based
>> properties:
>>
>> - withEscape
>> - withDelimiter
>> - withLineSeparator
>> - withCommentStart
>>
>> none of the above are post-fixed with a "Char" in the name.
>>
>> As far as reading, for me, the "-r" names are OK because they are
>> nouns (things): "a delimiter", "a line separator." But I do not talk about
>> "an escape" because that would be an act (think Alcatraz) as opposed to
>> what we have here: a character used to /perform/ escapes.
>>
>> So I propose to change "escape" to "escape char" because "escaper" is not
>> a word.
>>
>> The name "comment start" is not great also because it implies (to me) that
>> there is a "comment end" missing. So plain "comment" or "comment char"
>> would be better.
>
> Who said it has to be a single char?
>
> .withEOLComment("//")
>
>
> Same applies to the line separator:
>
> .withLineSeparator("\n\r")
>
>> Circling back to "quote char" which I have the way it is now for the same
>> reason as for the "escape" property.
>>
>> In summary, using *Char names is better IMO.
>
> Only if it can be a single char only. If it can either be a single char or a
> String, I normally tend to use overloaded methods:
>
> - withEOLComment(char)
> - withEOLComment(CharSequence)
>
>> Discuss! :)
>
> Can of worms opened :))
>
> - Jörg
>
>
>



Re: [csv] CSVFormat API names

Posted by Jörg Schaible <Jo...@scalaris.com>.
Hi Gary,

Gary Gregory wrote:

> Hi All:
> 
> The format object can configure various aspects of input and output
> formatting.
> 
> With my recent addition of the Quote enum for [CSV-53], there are now two
> aspects of quoting to configure: the quote character and the quote policy
> (minimal, all, non-numeric, and none.) FYI, 'none' is currently not
> implemented.
> 
> First, I changed (without consulting this list, and please accept my
> apologies for this) the - IMO - cryptic and burdensome terminology of
> "encapsulator" to "quote char", and added "quote policy":
> 
> - withQuoteChar(char)
> - withQuotePolicy(Quote)
> 
> My intention here is that all Quote APIs start with "withQuote" followed
> by what aspect of quoting is being configured.
> 
> Alternatively, we could have:
> 
> - withQuote(char)
> - withQuotePolicy(Quote)

or

- withQuote(char)
- withQuote(Quote)

;-)

> Which makes the API more consistent with the other char/Character based
> properties:
> 
> - withEscape
> - withDelimiter
> - withLineSeparator
> - withCommentStart
> 
> none of the above are post-fixed with a "Char" in the name.
> 
> As far as reading, for me, the "-r" names are OK because they are
> nouns (things): "a delimiter", "a line separator." But I do not talk about
> "an escape" because that would be an act (think Alcatraz) as opposed to
> what we have here: a character used to /perform/ escapes.
> 
> So I propose to change "escape" to "escape char" because "escaper" is not
> a word.
> 
> The name "comment start" is not great also because it implies (to me) that
> there is a "comment end" missing. So plain "comment" or "comment char"
> would be better.

Who said it has to be a single char?

.withEOLComment("//")


Same applies to the line separator:

.withLineSeparator("\n\r")

> Circling back to "quote char" which I have the way it is now for the same
> reason as for the "escape" property.
> 
> In summary, using *Char names is better IMO.

Only if it can be a single char only. If it can either be a single char or a 
String, I normally tend to use overloaded methods:

- withEOLComment(char)
- withEOLComment(CharSequence)
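
A minimal sketch of that overload pair, assuming a hypothetical
CommentAwareFormat class (not the Commons CSV API): the char variant
simply delegates to the CharSequence one, so both configure the same
aspect. The stripComment() helper is also an assumption and ignores
quoting entirely.

```java
/** Hypothetical format that accepts a one-char or multi-char comment marker. */
final class CommentAwareFormat {
    private final String commentMarker; // null means comments are disabled

    private CommentAwareFormat(String commentMarker) {
        this.commentMarker = commentMarker;
    }

    /** Assumed default: comments disabled. */
    static CommentAwareFormat defaults() {
        return new CommentAwareFormat(null);
    }

    /** Convenience overload for the common single-character case. */
    CommentAwareFormat withEOLComment(char marker) {
        return withEOLComment(String.valueOf(marker));
    }

    /** General overload; accepts markers such as "//". */
    CommentAwareFormat withEOLComment(CharSequence marker) {
        if (marker == null || marker.length() == 0) {
            throw new IllegalArgumentException("comment marker must not be empty");
        }
        return new CommentAwareFormat(marker.toString());
    }

    /** Drops everything from the first marker to the end of the line. */
    String stripComment(String line) {
        if (commentMarker == null) {
            return line;
        }
        int start = line.indexOf(commentMarker);
        return start < 0 ? line : line.substring(0, start);
    }
}
```

Here the overloads are safe because both set the same property and differ
only in the argument's type, unlike the withQuote(char)/withQuote(Quote)
pair joked about earlier, which would configure two different aspects.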

> Discuss! :)

Can of worms opened :))

- Jörg

