Posted to user@commons.apache.org by Jacek Furmankiewicz <ja...@gmail.com> on 2009/04/15 14:39:57 UTC

StrTokenizer not handling quotes correctly?

I am trying to use StrTokenizer for some parsing and I am probably not using
it correctly.

Let's say I have this string:

11"a,b"11,22"c,d"22

I would like to split it on the comma ",", ignoring any commas embedded
in quotes. I tried this:

        String test = "11\"a,b\"11,22\"c,d\"22";
        StrTokenizer str = new StrTokenizer(test,',','"');
        String[] tokens = str.getTokenArray();

        for(String t: tokens) {
            System.out.println(t);
        }

and expected two strings to be printed:

11"a,b"11
22"c,d"22

but instead I get four:

11"a
b"11
22"c
d"22

It seems the tokenizer is splitting on the comma, even if it is embedded in
quotes.

I tried different options on the StrTokenizer, but have not been able to
get it to work correctly.

Any idea what I am doing wrong? I am using the latest version, 2.4.

Thanks, Jacek

Re: StrTokenizer not handling quotes correctly?

Posted by sebb <se...@gmail.com>.
On 15/04/2009, Jacek Furmankiewicz <ja...@gmail.com> wrote:
> I am trying to use StrTokenizer for some parsing and I am probably not using
>  it correctly.
>
>  Let's say I have this string:
>
>  11"a,b"11,22"c,d"22
>
>  I would like to split it by the comma ",", but ignoring any commas embedded
>  in quotes. I try this:
>
>         String test = "11\"a,b\"11,22\"c,d\"22";
>         StrTokenizer str = new StrTokenizer(test,',','"');
>         String[] tokens = str.getTokenArray();
>
>         for(String t: tokens) {
>             System.out.println(t);
>         }
>
>  and expect to have two strings print out:
>
>  11"a,b"11
>  22"c,d"22
>
>  but instead I get 4 :
>
>  11"a
>  b"11
>  22"c
>  d"22
>
>  It seems the tokenizer is splitting on the comma, even if it is embedded in
>  quotes.

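Quotes are only recognised when they surround a whole token. A quote in
the middle of an unquoted token is treated as an ordinary character, so
the commas are still seen as delimiters. From the Javadoc: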

"Each token may be surrounded by quotes. The quote matcher specifies
the quote character(s). A quote may be escaped within a quoted section
by duplicating itself. "
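
If the input cannot be changed, a small hand-written splitter is one
alternative. The following is only a minimal sketch in plain Java, not
how StrTokenizer works internally; the class name QuoteAwareSplit and the
method split are made up for illustration. It assumes quotes are always
balanced and never escaped, and it keeps the quote characters in the
tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Lang.
class QuoteAwareSplit {

    // Splits on the delimiter unless it sits between two quote characters.
    // The quote characters themselves are kept in the returned tokens.
    static List<String> split(String input, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;      // toggle state, keep the quote
                current.append(c);
            } else if (c == delimiter && !inQuotes) {
                tokens.add(current.toString());
                current.setLength(0);      // start the next token
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());    // last token has no trailing delimiter
        return tokens;
    }

    public static void main(String[] args) {
        // Prints:
        // 11"a,b"11
        // 22"c,d"22
        for (String t : split("11\"a,b\"11,22\"c,d\"22", ',', '"')) {
            System.out.println(t);
        }
    }
}
```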

>  I tried different options on the StrTokenizer, but have not been able to
>  get it to work correctly.
>
>  Any idea what I am doing wrong? I am using the latest version, 2.4.

The input needs to look like this, with each token wrapped in quotes and
each embedded quote doubled:

"11""a,b""11","22""c,d""22"

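If you control how the input is produced, the quoting can be applied
mechanically. This is a minimal sketch in plain Java, assuming the raw
values are already available as separate strings; the class QuoteFields
and its methods quoteField and join are hypothetical names, not part of
Commons Lang.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Commons Lang.
class QuoteFields {

    // Wraps a value in quotes and doubles any quote already inside it.
    static String quoteField(String value, char quote) {
        String q = String.valueOf(quote);
        return q + value.replace(q, q + q) + q;
    }

    // Quotes every value and joins the results with the delimiter.
    static String join(List<String> values, char delimiter, char quote) {
        List<String> quoted = new ArrayList<>();
        for (String v : values) {
            quoted.add(quoteField(v, quote));
        }
        return String.join(String.valueOf(delimiter), quoted);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("11\"a,b\"11", "22\"c,d\"22");
        // Prints: "11""a,b""11","22""c,d""22"
        System.out.println(join(raw, ',', '"'));
    }
}
```

A string built this way can then be passed to
new StrTokenizer(input, ',', '"') as in the original code.
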
>  Thanks, Jacek
>
