Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2014/01/14 16:07:37 UTC
Splitting strings in Java - how to escape delimiter characters?
I have a Java question, for a custom update processor I'm developing. It
takes an input field of the following format:
field:value;mvfield:value1;mvfield:value2
With an inner delimiter set to a colon and an outer delimiter set to a
semicolon, this results in two new fields going into the document. The
field named 'field' has one value and the field named mvfield has two.
This code uses the String#split method, so it can't deal with the
delimiter characters being escaped with a backslash.
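To make the limitation concrete, here is a minimal sketch (not the actual update processor code; the class and method names are made up for illustration) of what a plain String#split on the outer delimiter does when a value contains a backslash-escaped semicolon:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Solr or of the update processor.
class NaiveSplitDemo {
    // Splits on a literal delimiter with String#split; escapes are not understood.
    static List<String> naiveSplit(String input, String delimiter) {
        return Arrays.asList(input.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        // Unescaped input splits into the three expected name:value pairs.
        System.out.println(naiveSplit("field:value;mvfield:value1;mvfield:value2", ";"));
        // prints [field:value, mvfield:value1, mvfield:value2]

        // The value "a\;b" is meant to contain a literal semicolon,
        // but split cuts it in two and leaves the backslash behind.
        System.out.println(naiveSplit("field:a\\;b;mvfield:value1", ";"));
        // prints [field:a\, b, mvfield:value1]
    }
}
```

The second call shows the problem: the escaped semicolon is treated as a real delimiter.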
How can I make the code deal with an escape character (backslash) on the
two delimiters and the escape character itself? Unless it's absolutely
necessary or super easy, I do not plan to deal with the full set of regex
escaped characters.
I can move this discussion to the development list, but I thought I would
start here.
Thanks,
Shawn
Re: Splitting strings in Java - how to escape delimiter characters?
Posted by Yonik Seeley <yo...@heliosearch.com>.
Look at the StrUtils.splitSmart methods. The first variant treats quotes
specially; the second variant doesn't (that's the one you probably want).
-Yonik
http://heliosearch.org -- off-heap filters for solr
Re: Splitting strings in Java - how to escape delimiter characters?
Posted by Shawn Heisey <so...@elyograg.org>.
On 1/14/2014 8:20 AM, Steve Rowe wrote:
> SolrJ’s StrUtils.splitSmart() should do exactly what you want: in the first pass, split on semicolons without decoding backslash escapes; then, in the inner loop, use the same method to split on colons, this time decoding the escapes. I think :).
Thank you, Yonik and Steve! This seems to work perfectly. Here's what I
did. Naturally the whole thing is in a try/catch:
http://apaste.info/4beg
Shawn
Re: Splitting strings in Java - how to escape delimiter characters?
Posted by Steve Rowe <sa...@gmail.com>.
Hi Shawn,
SolrJ’s StrUtils.splitSmart() should do exactly what you want: in the first pass, split on semicolons without decoding backslash escapes; then, in the inner loop, use the same method to split on colons, this time decoding the escapes. I think :).
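The two-pass idea can be sketched without any Solr dependency. The code below is only an illustration under stated assumptions: splitEscaped is a hypothetical stand-in written for this example, not the SolrJ implementation of StrUtils.splitSmart, and it may differ from it in edge cases. The class name, the parse method, and the error handling are likewise made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-pass split; not Solr code.
class EscapedSplitSketch {
    // Splits s on sep, treating backslash as an escape for the next character.
    // decode == true:  drop the backslash and keep the escaped character.
    // decode == false: copy the backslash and the escaped character unchanged.
    static List<String> splitEscaped(String s, char sep, boolean decode) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                if (!decode) {
                    cur.append(c);          // keep the escape for a later pass
                }
                cur.append(s.charAt(++i));  // escaped char never acts as a delimiter
            } else if (c == sep) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Outer pass on ';' keeps escapes; inner pass on ':' decodes them.
    static Map<String, List<String>> parse(String input) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String pair : splitEscaped(input, ';', false)) {
            List<String> kv = splitEscaped(pair, ':', true);
            if (kv.size() != 2) {
                throw new IllegalArgumentException("Expected name:value but got: " + pair);
            }
            fields.computeIfAbsent(kv.get(0), k -> new ArrayList<>()).add(kv.get(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The raw input is: field:a\;b\:c;mvfield:v1;mvfield:v2
        System.out.println(parse("field:a\\;b\\:c;mvfield:v1;mvfield:v2"));
        // prints {field=[a;b:c], mvfield=[v1, v2]}
    }
}
```

Decoding is deferred to the inner pass on purpose: if the outer pass removed the backslashes, an escaped colon would reach the inner pass as a bare colon and be treated as a delimiter.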
Steve