Posted to dev@commons.apache.org by Chen Guoping1 <ch...@163.com> on 2020/05/12 11:42:34 UTC
[CSV] The Feature Multiple-Character Delimiter
Hi all,
In CSV parsing there are many scenarios where multiple characters are used as a separator.
To support this, we would have to change the type of the delimiter from char to String. This is an
API change, and existing client code may need to be modified to keep working.
When parsing, we can read the next n characters in advance through a lookAhead(int n) method in
ExtendedBufferedReader to determine whether they form the delimiter:
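    // Peeks at the next n characters without consuming them.
    char[] lookAhead(int n) throws IOException {
        char[] buf = new char[n];
        super.mark(n);
        // Note: read may fill fewer than n characters near the end of input.
        super.read(buf, 0, n);
        super.reset();
        return buf;
    }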
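A minimal sketch of how such a look-ahead could be used to test for a multi-character delimiter. This is not Commons CSV code: the class `DelimiterPeek` and the method `isDelimiterAhead` are hypothetical names, and a plain `java.io.BufferedReader` stands in for ExtendedBufferedReader. The sketch loops on `read` because a single call may return fewer characters than requested.

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical helper for illustration only; not part of Commons CSV. */
final class DelimiterPeek {

    /**
     * Returns true if the next characters of the reader equal the delimiter.
     * The reader position is restored, so nothing is consumed.
     */
    static boolean isDelimiterAhead(BufferedReader reader, String delimiter) throws IOException {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int n = delimiter.length();
        char[] buf = new char[n];
        reader.mark(n);
        int total = 0;
        // Keep reading until n characters are available or the input ends.
        while (total < n) {
            int read = reader.read(buf, total, n - total);
            if (read < 0) {
                break; // end of input before n characters were seen
            }
            total += read;
        }
        reader.reset();
        return total == n && new String(buf).equals(delimiter);
    }
}
```

A parser could call this check at each position and, on a match, skip `delimiter.length()` characters before starting the next field.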
I have one question to confirm. The escape character is '\'. When the delimiter is the single
char ',', printWithEscape prints '\,'. So when the delimiter is multiple characters, such as "[|]",
should printWithEscape print "\[\|\]" or "\[|]"? I'd prefer "\[\|\]". Are there any other
suggestions about this feature?
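To make the two options concrete, here is a small sketch of both behaviours. It is not the Commons CSV printWithEscape implementation; `DelimiterEscaping` and its methods are hypothetical names, and for brevity the sketch does not escape the escape character itself when it occurs in the value.

```java
/** Hypothetical illustration of the two escaping options; not Commons CSV code. */
final class DelimiterEscaping {

    /** Option 1: escape every character of each delimiter occurrence, e.g. \[\|\] */
    static String escapeEveryChar(String value, String delimiter, char escape) {
        return escape(value, delimiter, escape, true);
    }

    /** Option 2: escape only the first character of each delimiter occurrence, e.g. \[|] */
    static String escapeFirstChar(String value, String delimiter, char escape) {
        return escape(value, delimiter, escape, false);
    }

    private static String escape(String value, String delimiter, char escape, boolean everyChar) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            if (value.startsWith(delimiter, i)) {
                // A delimiter occurs inside the value: emit it in escaped form.
                for (int k = 0; k < delimiter.length(); k++) {
                    if (everyChar || k == 0) {
                        out.append(escape);
                    }
                    out.append(delimiter.charAt(k));
                }
                i += delimiter.length();
            } else {
                out.append(value.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

For the value `a[|]b` with delimiter `[|]`, the first option yields `a\[\|\]b` and the second yields `a\[|]b`.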
——
Chen Guoping
Re:Re: [CSV] The Feature Multiple-Character Delimiter
Posted by Chen Guoping1 <ch...@163.com>.
Hi,
MySQL supports multi-character separators for import and export. [Figure not included in the plain-text archive.]
As for [CSV-206](https://issues.apache.org/jira/projects/CSV/issues/CSV-206?filter=allopenissues) and this StackOverflow question
(https://stackoverflow.com/questions/8653797/java-csv-parser-with-string-separator-multi-character), there are people
looking for support for multi-character separators.
According to [CSV-206], CsvHelper (https://github.com/JoshClose/CsvHelper) supports multi-character separators,
and Miller (https://github.com/johnkerl/miller/blob/master/c/input/line_readers.h) also supports them. But it seems that
no Java library supports them yet.
Would Commons CSV consider supporting this?
Chen
At 2020-05-13 07:27:26, "Gary Gregory" <ga...@gmail.com> wrote:
>Hi,
>
>May you give an example where more than one character is used as a
>separator? Is there a database or known tool out there that uses such a
>format?
>
>WRT escaping I would think that \ escapes the one character that follows
>only. It is up to the reader to decide what to do with an escape sequence.
>Anyone else?
>
>Gary
>
Re:[CSV] The Feature Multiple-Character Delimiter
Posted by Chen Guoping1 <ch...@163.com>.
At 2020-05-13 22:29:20, "Gary Gregory" <ga...@gmail.com> wrote:
>On Wed, May 13, 2020 at 6:48 AM sebb <se...@gmail.com> wrote:
>
>
>Chen,
>
>Are you talking about record separators, field separators, or both?
>
>Gary
>
Hi all,
Sorry, I mean field separators.
This is the problem described in [CSV-206](https://issues.apache.org/jira/projects/CSV/issues/CSV-206).
Chen
Re: [CSV] The Feature Multiple-Character Delimiter
Posted by Gary Gregory <ga...@gmail.com>.
On Wed, May 13, 2020 at 6:48 AM sebb <se...@gmail.com> wrote:
> On Wed, 13 May 2020 at 00:27, Gary Gregory <ga...@gmail.com> wrote:
> >
> > Hi,
> >
> > May you give an example where more than one character is used as a
> > separator? Is there a database or known tool out there that uses such a
> > format?
>
> The IBAN Registry (TXT) located at:
> https://www.swift.com/standards/data-standards/iban
> uses \r\n as EOL.
>
> Some of the fields include \n within quoted values.
>
Chen,
Are you talking about record separators, field separators, or both?
Gary
Re: [CSV] The Feature Multiple-Character Delimiter
Posted by sebb <se...@gmail.com>.
On Wed, 13 May 2020 at 00:27, Gary Gregory <ga...@gmail.com> wrote:
>
> Hi,
>
> May you give an example where more than one character is used as a
> separator? Is there a database or known tool out there that uses such a
> format?
The IBAN Registry (TXT) located at:
https://www.swift.com/standards/data-standards/iban
uses \r\n as EOL.
Some of the fields include \n within quoted values.
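As an illustration of the format described here, the following sketch splits text into records on \r\n while leaving a bare \n inside a quoted value untouched. It is a simplified, hypothetical example (`CrlfRecordSplitter` is not a Commons CSV class) and assumes double quotes as the quote character, with embedded quotes written as a doubled quote.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not Commons CSV code. */
final class CrlfRecordSplitter {

    /** Splits on "\r\n" outside quoted values; a "\n" inside quotes stays in the record. */
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A doubled quote toggles twice, so the quoted state is unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (!inQuotes && c == '\r'
                    && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                records.add(current.toString());
                current.setLength(0);
                i++; // skip the '\n' of the two-character record separator
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }
}
```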
> WRT escaping I would think that \ escapes the one character that follows
> only. It is up to the reader to decide what to do with an escape sequence.
> Anyone else?
>
> Gary
>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org
Re: [CSV] The Feature Multiple-Character Delimiter
Posted by Gary Gregory <ga...@gmail.com>.
Hi,
May you give an example where more than one character is used as a
separator? Is there a database or known tool out there that uses such a
format?
WRT escaping I would think that \ escapes the one character that follows
only. It is up to the reader to decide what to do with an escape sequence.
Anyone else?
Gary
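A reader-side sketch of the rule described above, in which the escape character makes exactly one following character literal. The names (`EscapeAwareSplitter`, `split`) are hypothetical and this is not how the Commons CSV lexer is written. Under this rule both `\[|]` and `\[\|\]` are read back as the literal text `[|]`, because escaping the first character already prevents a delimiter match.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not Commons CSV code. */
final class EscapeAwareSplitter {

    /** Splits a line on a multi-character delimiter; the escape applies to one character only. */
    static List<String> split(String line, String delimiter, char escape) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                // The single character after the escape is taken literally.
                field.append(line.charAt(i + 1));
                i += 2;
            } else if (line.startsWith(delimiter, i)) {
                fields.add(field.toString());
                field.setLength(0);
                i += delimiter.length();
            } else {
                field.append(c);
                i++;
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```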