Posted to dev@commons.apache.org by Chen Guoping1 <ch...@163.com> on 2020/05/12 11:42:34 UTC

[CSV] The Feature Multiple-Character Delimiter

Hi all,

In CSV parsing, there are many scenarios where multiple characters are used as separators.
To support this feature, we would have to change the type of the delimiter from char to String.
This will lead to API changes, and existing user code may need to be modified to compile.

When parsing, we can read the next characters in advance through a lookAhead(int n) method in
ExtendedBufferedReader to determine whether they form a delimiter:




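    char[] lookAhead(int n) throws IOException {
        char[] buf = new char[n];
        super.mark(n);          // remember the current position
        super.read(buf, 0, n);  // may fill fewer than n chars near the end of the stream
        super.reset();          // rewind so the parser sees the same input again
        return buf;
    }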
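To make the idea concrete, here is a minimal standalone sketch of how such a look-ahead could drive delimiter detection. It is not the Commons CSV implementation: the class LookAheadSketch and the methods isDelimiterAhead and splitLine are hypothetical names, it uses a plain java.io.BufferedReader instead of ExtendedBufferedReader, and it ignores quoting and escaping. Unlike the snippet above, it loops on read() so that a short read is not mistaken for a full buffer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class LookAheadSketch {

    /** Returns true if the next characters equal the delimiter; the reader position is left unchanged. */
    static boolean isDelimiterAhead(BufferedReader reader, String delimiter) throws IOException {
        final int n = delimiter.length();
        final char[] buf = new char[n];
        reader.mark(n);
        int filled = 0;
        // read() may return fewer chars than requested, so loop until full or end of stream.
        while (filled < n) {
            final int read = reader.read(buf, filled, n - filled);
            if (read < 0) {
                break;
            }
            filled += read;
        }
        reader.reset();
        return filled == n && delimiter.equals(new String(buf));
    }

    /** Splits one line on a multi-character delimiter (no quoting or escaping handled). */
    static List<String> splitLine(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        final List<String> fields = new ArrayList<>();
        final StringBuilder field = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(line))) {
            while (true) {
                if (isDelimiterAhead(reader, delimiter)) {
                    reader.skip(delimiter.length()); // consume the delimiter
                    fields.add(field.toString());
                    field.setLength(0);
                    continue;
                }
                final int c = reader.read();
                if (c < 0) {
                    break; // end of input
                }
                field.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // A lone '|' inside a field is not treated as a delimiter.
        System.out.println(splitLine("a[|]b|c[|]d", "[|]")); // prints [a, b|c, d]
    }
}
```

Calling mark/reset before every character is simple but not cheap; a real parser would probably only look ahead when the current character equals the first character of the delimiter.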




I have one question to confirm. The escape character is '\'. When the delimiter is the single
char ',', printWithEscape prints '\,'. So when the delimiter is multiple characters, such as "[|]",
should printWithEscape print "\[\|\]" or "\[|]"? I would prefer "\[\|\]". Are there any other
suggestions about this feature?
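The two candidate outputs can be compared with a small sketch. The class DelimiterEscapeSketch and both method names are hypothetical and only illustrate the question; this is not how printWithEscape is implemented, and the sketch does not escape the escape character itself or line breaks.

```java
// Hypothetical illustration of the two escaping options; not Commons CSV code.
final class DelimiterEscapeSketch {

    /** Option 1: put the escape character before every character of the delimiter. */
    static String escapeEveryChar(String value, String delimiter, char escape) {
        final StringBuilder escaped = new StringBuilder();
        for (final char c : delimiter.toCharArray()) {
            escaped.append(escape).append(c);
        }
        return value.replace(delimiter, escaped);
    }

    /** Option 2: put the escape character only before the first character of the delimiter. */
    static String escapeFirstChar(String value, String delimiter, char escape) {
        return value.replace(delimiter, escape + delimiter);
    }

    public static void main(String[] args) {
        System.out.println(escapeEveryChar("a[|]b", "[|]", '\\')); // prints a\[\|\]b
        System.out.println(escapeFirstChar("a[|]b", "[|]", '\\')); // prints a\[|]b
    }
}
```

Option 1 is longer, but every character of the delimiter is protected individually. Option 2 is shorter and relies on the reader no longer recognizing the delimiter once its first character has been consumed as part of an escape sequence.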


——
Chen Guoping








Re:Re: [CSV] The Feature Multiple-Character Delimiter

Posted by Chen Guoping1 <ch...@163.com>.
Hi,


As shown in the following figure, MySQL supports multi-character separators for import and export.
Regarding [CSV-206](https://issues.apache.org/jira/projects/CSV/issues/CSV-206?filter=allopenissues) and this StackOverflow question
(https://stackoverflow.com/questions/8653797/java-csv-parser-with-string-separator-multi-character): there are people
looking for support for multi-character separators.
[CSV-206] mentions that CsvHelper (https://github.com/JoshClose/CsvHelper) supports multi-character separators,
and Miller (https://github.com/johnkerl/miller/blob/master/c/input/line_readers.h) also supports them. But it seems that
no Java library supports them yet.
Would Commons CSV consider supporting this?


Chen

Re:[CSV] The Feature Multiple-Character Delimiter

Posted by Chen Guoping1 <ch...@163.com>.
At 2020-05-13 22:29:20, "Gary Gregory" <ga...@gmail.com> wrote:
>On Wed, May 13, 2020 at 6:48 AM sebb <se...@gmail.com> wrote:
>
>
>Chen,
>
>Are you talking about record separators, field separators, or both?
>
>Gary
>


Hi all,


Sorry, I meant field separators.
This is the problem described in [CSV-206](https://issues.apache.org/jira/projects/CSV/issues/CSV-206).


Chen

Re: [CSV] The Feature Multiple-Character Delimiter

Posted by Gary Gregory <ga...@gmail.com>.
On Wed, May 13, 2020 at 6:48 AM sebb <se...@gmail.com> wrote:

> On Wed, 13 May 2020 at 00:27, Gary Gregory <ga...@gmail.com> wrote:
> >
> > Hi,
> >
> > May you give an example where more than one character is used as a
> > separator? Is there a database or known tool out there that uses such a
> > format?
>
> The IBAN Registry (TXT) located at:
> https://www.swift.com/standards/data-standards/iban
> uses \r\n as EOL.
>
> Some of the fields include \n within quoted values.
>

Chen,

Are you talking about record separators, field separators, or both?

Gary

Re: [CSV] The Feature Multiple-Character Delimiter

Posted by sebb <se...@gmail.com>.
On Wed, 13 May 2020 at 00:27, Gary Gregory <ga...@gmail.com> wrote:
>
> Hi,
>
> May you give an example where more than one character is used as a
> separator? Is there a database or known tool out there that uses such a
> format?

The IBAN Registry (TXT) located at:
https://www.swift.com/standards/data-standards/iban
uses \r\n as EOL.

Some of the fields include \n within quoted values.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [CSV] The Feature Multiple-Character Delimiter

Posted by Gary Gregory <ga...@gmail.com>.
Hi,

May you give an example where more than one character is used as a
separator? Is there a database or known tool out there that uses such a
format?

WRT escaping I would think that \ escapes the one character that follows
only. It is up to the reader to decide what to do with an escape sequence.
Anyone else?
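
One possible reading of that rule, sketched in Java. This is an illustration under assumptions, not the Commons CSV reader: the class SingleCharEscapeSketch and the method unescape are hypothetical, and every escaped character is taken literally (no special handling of sequences such as \n).

```java
// Hypothetical sketch: the escape character protects exactly one following character.
final class SingleCharEscapeSketch {

    static String unescape(String text, char escape) {
        final StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                out.append(text.charAt(++i)); // take the escaped character literally
            } else {
                out.append(c); // ordinary character, or a lone trailing escape
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("a\\[\\|\\]b", '\\')); // prints a[|]b
        System.out.println(unescape("a\\[|]b", '\\'));     // prints a[|]b
    }
}
```

Under this reading, both forms from the original question decode to the same field value, so the choice between them mainly affects how the delimiter is recognized while scanning.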

Gary
