Posted to solr-user@lucene.apache.org by Carrie Coy <cc...@ssww.com> on 2015/05/26 21:24:15 UTC

Different behavior (bug?) for RegExTransformer in Solr5

I'm experimenting with Solr 5 (5.1.0 1672403 - timpotter - 2015-04-09
10:37:54).  In my custom DIH config, I use a RegExTransformer to load
several columns, each of which may or may not be present in the source
data.  If a column is present, the regexp matches and the data loads
correctly in both Solr 4 and Solr 5.  If it is not present and the
regexp fails to match, the column is empty in Solr 4, but in Solr 5 it
contains the original string that was to be matched.

In other words, in Solr 5.1.0, if the regexp does not match and
'replaceWith' therefore has nothing to substitute, the column appears
to revert to the original string.

Example:

Column 'data' contains:   column1:xxx,column3:yyy

DIH regexp:
<field column="column1" regex="^.*column1:(.*?),.*$" replaceWith="$1" sourceColName="data" />
<field column="column2" regex="^.*column2:(.*?),.*$" replaceWith="$1" sourceColName="data" />
<field column="column3" regex="^.*column3:(.*?),.*$" replaceWith="$1" sourceColName="data" />

Solr 4:
column1: xxx
column2:
column3: yyy

Solr 5:
column1: xxx
column2: column1:xxx,column3:yyy
column3: yyy
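
For what it's worth, the Solr 5 value for column2 is what plain
java.util.regex produces: Matcher.replaceAll() returns the input
unchanged when the pattern never matches.  The sketch below only
demonstrates that JDK behavior.  It is not the RegExTransformer source,
and the class and method names (RegexFallbackDemo, replace,
replaceOrEmpty) are made up for illustration.  replaceOrEmpty shows one
possible way to get the Solr 4 style empty result, assuming that an
empty value on "no match" is what is wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Solr or DIH.
class RegexFallbackDemo {

    // Plain replaceAll: if the regex never matches, the JDK returns
    // the original input string unchanged.
    static String replace(String regex, String replaceWith, String data) {
        return Pattern.compile(regex).matcher(data).replaceAll(replaceWith);
    }

    // Assumed alternative: return an empty string when nothing matches,
    // which mirrors the empty column reported for Solr 4.
    static String replaceOrEmpty(String regex, String replaceWith, String data) {
        Matcher m = Pattern.compile(regex).matcher(data);
        // replaceAll() resets the matcher, so calling find() first is safe.
        return m.find() ? m.replaceAll(replaceWith) : "";
    }

    public static void main(String[] args) {
        String data = "column1:xxx,column3:yyy";
        String col1 = "^.*column1:(.*?),.*$";
        String col2 = "^.*column2:(.*?),.*$";

        System.out.println("column1: " + replace(col1, "$1", data));
        System.out.println("column2: " + replace(col2, "$1", data));
        System.out.println("column2 (or empty): " + replaceOrEmpty(col2, "$1", data));
    }
}
```

If the transformer simply passes the replaceAll() result through, then
the non-matching column2 regex would yield the whole 'data' string, as
seen above for Solr 5.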