Posted to user@commons.apache.org by "Haswell, Joe" <jo...@hp.com> on 2010/10/07 00:07:42 UTC

[Commons-Lang] StrTokenizer behavior on CSVs

Hello,

StrTokenizer produces an unexpected result on certain CSV lines, and I'm wondering whether this is a defect or intended behavior.
If it is intended, I'm hoping someone can point me to a workaround.


Consider the following CSV record, where {x} is a placeholder and the quoted field spans two lines:

Field1, field2,{x}"this should
All, be escaped"

If x is empty, the line is tokenized as expected; that is, it produces the fields:
Field1,field2,"this should{newline}All, be escaped"

If x is whitespace, the line is tokenized unexpectedly:
Field1, field2,["this should All],[be escaped"]
(brackets mark each complete field; the quoted field is split in two, with one quotation mark left in each half)

Any clarification provided would be greatly appreciated.  The tokenizer is configured to use the appropriate delimiters and quote characters.
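
For anyone who wants to try this, here is a minimal sketch of how the two cases might be reproduced. It assumes Commons Lang 2.x (package org.apache.commons.lang.text) and a tokenizer built with the (input, delimiter, quote) constructor. The class name CsvRepro and the method tokenize are hypothetical, and the configuration in the real application may differ from this one.

```java
import org.apache.commons.lang.text.StrTokenizer;

// Hypothetical reproduction harness; not the exact configuration from the report above.
public class CsvRepro {

    // Assumption: comma delimiter, double-quote quote character,
    // and no trimmer or ignored matcher configured.
    static String[] tokenize(String line) {
        StrTokenizer tokenizer = new StrTokenizer(line, ',', '"');
        return tokenizer.getTokenArray();
    }

    public static void main(String[] args) {
        // {x} is empty: the quote directly follows the delimiter.
        String noSpace = "Field1, field2,\"this should\nAll, be escaped\"";
        // {x} is a single space: whitespace sits between the delimiter and the quote.
        String withSpace = "Field1, field2, \"this should\nAll, be escaped\"";

        for (String line : new String[] { noSpace, withSpace }) {
            String[] tokens = tokenize(line);
            System.out.println(tokens.length + " token(s):");
            for (String token : tokens) {
                // Brackets make leading and trailing whitespace visible.
                System.out.println("  [" + token + "]");
            }
        }
    }
}
```

Replacing the constructor call with StrTokenizer.getCSVInstance(line) would be one variation worth comparing, since that factory applies its own trimming defaults.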

Thanks!


Joe Haswell | HP Software