Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2014/01/14 16:07:37 UTC

Splitting strings in Java - how to escape delimiter characters?

I have a Java question, for a custom update processor I'm developing. It
takes an input field of the following format:

field:value;mvfield:value1;mvfield:value2

With an inner delimiter set to a colon and an outer delimiter set to a
semicolon, this results in two new fields going into the document. The
field named 'field' has one value and the field named mvfield has two.

This code uses the String#split method, so it can't deal with the
delimiter characters being escaped with a backslash.
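
To illustrate the limitation, here is a minimal sketch. The class and
method names are made up for this example and are not taken from the
update processor itself. String#split breaks on every semicolon, even
one preceded by a backslash:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the actual update processor.
class NaiveSplitDemo {

    // Splits on every semicolon. String#split has no notion of an
    // escape character, so "\;" is still treated as a delimiter.
    static List<String> naiveSplit(String input) {
        return Arrays.asList(input.split(";"));
    }

    public static void main(String[] args) {
        // The Java literal "a:x\\;y;b:z" is the text a:x\;y;b:z
        // The intent is two pairs, but three pieces come back.
        System.out.println(naiveSplit("a:x\\;y;b:z")); // prints [a:x\, y, b:z]
    }
}
```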

How can I make the code deal with an escape character (backslash) on the
two delimiters and the escape character itself? Unless it's absolutely
necessary or super easy, I do not plan to deal with the full set of regex
escaped characters.

I can move this discussion to the development list, but I thought I would
start here.

Thanks,
Shawn



Re: Splitting strings in Java - how to escape delimiter characters?

Posted by Yonik Seeley <yo...@heliosearch.com>.
Look at the StrUtils.splitSmart methods... the first variant treats
quotes specially, the second variant doesn't (that's the one you
probably want).

-Yonik
http://heliosearch.org -- off-heap filters for solr



Re: Splitting strings in Java - how to escape delimiter characters?

Posted by Shawn Heisey <so...@elyograg.org>.
On 1/14/2014 8:20 AM, Steve Rowe wrote:
> Solrj’s StrUtils.splitSmart() should do exactly what you want - in the first pass, split on semicolon and don’t decode backslash escaping, and then in the inner loop, use the same method to split on colons and decode backslash escaping.  I think :).

Thank you, Yonik and Steve! This seems to work perfectly. Here's what I 
did.  Naturally the whole thing is in a try/catch:

http://apaste.info/4beg

Shawn


Re: Splitting strings in Java - how to escape delimiter characters?

Posted by Steve Rowe <sa...@gmail.com>.
Hi Shawn,

Solrj’s StrUtils.splitSmart() should do exactly what you want - in the first pass, split on semicolon and don’t decode backslash escaping, and then in the inner loop, use the same method to split on colons and decode backslash escaping.  I think :).
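
The two-pass idea can be sketched in plain Java roughly as follows. This
is only an illustration under stated assumptions: the split() helper
below is a hand-written stand-in, not the SolrJ implementation, and the
class name, method names, and error handling are hypothetical. The real
StrUtils.splitSmart may differ in details, so check its Javadoc before
relying on any particular behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; split() is a stand-in for an escape-aware splitter.
class EscapedSplitSketch {

    // Splits s on sep. A backslash protects the following character from
    // being treated as a separator. With decode=true the backslash is
    // dropped; with decode=false it is kept so a later pass still sees it.
    static List<String> split(String s, char sep, boolean decode) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                if (!decode) {
                    cur.append(c);          // keep the escape for the next pass
                }
                cur.append(s.charAt(++i));  // escaped character is always literal
            } else if (c == sep) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Outer pass: split on ';' without decoding.
    // Inner pass: split each pair on ':' and decode the escapes.
    static Map<String, List<String>> parse(String input) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String pair : split(input, ';', false)) {
            List<String> kv = split(pair, ':', true);
            if (kv.size() != 2) {
                throw new IllegalArgumentException("Expected name:value but got: " + pair);
            }
            fields.computeIfAbsent(kv.get(0), k -> new ArrayList<>()).add(kv.get(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("field:value;mvfield:value1;mvfield:value2"));
        // The Java literal below is the text f:a\;b\:c
        System.out.println(parse("f:a\\;b\\:c"));
    }
}
```

Keeping the backslashes in the outer pass matters: if they were decoded
there, an escaped colon would look like a real delimiter by the time the
inner pass runs.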

Steve
 