Posted to solr-user@lucene.apache.org by Dishanker Raj <Di...@adm.uib.no> on 2013/11/29 11:04:21 UTC

Why is 'solr.RegexReplaceProcessorFactory' not changing fields that are being indexed?

Hello!

The following entries in ‘solrconfig.xml’ are supposed to match an obsolete URL string and replace it with a newer one. But after documents are indexed through the ‘/update’ handler, I still see the old URL string in the index, even though it should have been replaced during indexing.

<!-- BEGIN rewrite of URL -->	
<updateRequestProcessorChain name="newDomainURL">
	<processor class="solr.RegexReplaceProcessorFactory">
	   <str name="fieldName">id</str>
	   <str name="fieldName">url</str>
	   <str name="pattern">old\.domain\.com</str>
	   <str name="replacement">new.domain.net</str>
	 </processor>
	<processor class="solr.LogUpdateProcessorFactory" />
	<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<!-- END rewrite of URL -->

<requestHandler name="/update" class="solr.UpdateRequestHandler">
	<lst name="defaults">
		<!-- BEGIN rewrite of URL -->
		<str name="update.chain">newDomainURL</str>
		<!-- END rewrite of URL -->
	</lst>	
</requestHandler>
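
To rule out the pattern itself as the culprit, the substitution can be checked outside Solr. The following is only a rough sketch in Python, not what Solr runs internally: Solr applies the pattern with Java's regex engine, and for a simple pattern like this one Python's `re` module should behave the same way. The function name `rewrite_fields` and the sample document are made up for illustration.

```python
import re

# Pattern and replacement copied from the solrconfig.xml entries above.
PATTERN = re.compile(r"old\.domain\.com")
REPLACEMENT = "new.domain.net"

# Field names listed as "fieldName" in the processor configuration.
FIELDS = ("id", "url")


def rewrite_fields(doc, fields=FIELDS):
    """Return a copy of doc with the pattern replaced in the selected string fields.

    Hypothetical helper: it only imitates the intended effect of the
    update processor on a single document represented as a dict.
    """
    out = dict(doc)
    for name in fields:
        value = out.get(name)
        if isinstance(value, str):
            out[name] = PATTERN.sub(REPLACEMENT, value)
    return out


if __name__ == "__main__":
    # Made-up sample document; only "id" and "url" should change.
    sample = {
        "id": "http://old.domain.com/page/1",
        "url": "http://old.domain.com/page/1",
        "title": "Hosted on old.domain.com",
    }
    print(rewrite_fields(sample))
```

If this prints the new domain in ‘id’ and ‘url’ while leaving ‘title’ alone, the pattern and replacement are fine and the problem lies elsewhere (chain wiring, stale documents, or how the ‘id’ field is handled).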

Anyone got any helpful pointers as to why this is not succeeding? Thanks.

Sincerely,
Dishanker Raj

PGP Public Key: http://goo.gl/TulvBO


Re: Why is 'solr.RegexReplaceProcessorFactory' not changing fields that are being indexed?

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, need more data. This works just fine for me when I
copy/paste your example.

Best,
Erick


Re: Why is 'solr.RegexReplaceProcessorFactory' not changing fields that are being indexed?

Posted by Dishanker Raj <di...@adm.uib.no>.
Prior to reindexing (after the recent changes to ‘solrconfig.xml’), I had removed all documents matching the undesired ‘url’ field value pattern (wildcard deletions using ‘curl’).

What finally worked was to remove the substitution of values in the ‘id’ field, restart the SolrCloud cluster, and reindex the specific documents containing the undesired ‘url’ field values. Now only the ‘url’ field is manipulated and, strangely enough, that did the trick.

<!-- BEGIN rewrite of URL -->	
<updateRequestProcessorChain name="newDomainURL">
	<processor class="solr.RegexReplaceProcessorFactory">

	  <!-- REMOVED: <str name="fieldName">id</str> -->

	   <str name="fieldName">url</str>
	   <str name="pattern">old\.domain\.com</str>
	   <str name="replacement">new.domain.net</str>
	 </processor>
	<processor class="solr.LogUpdateProcessorFactory" />
	<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<!-- END rewrite of URL -->
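
One way to confirm that no stale values remain after reindexing is to count the documents whose ‘url’ field still contains the old domain. The sketch below only builds the query URL and reads the count from a JSON response. The base URL and collection name are assumptions (substitute your own), the helper names are hypothetical, and the leading-wildcard query is assumed to be acceptable for the ‘url’ field type in use.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; substitute your own host and collection name.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def leftover_query_url(field="url", needle="old.domain.com", base=SOLR_BASE):
    """Build a /select URL that counts documents still containing the old domain."""
    params = {
        "q": f"{field}:*{needle}*",  # wildcard match anywhere in the field value
        "rows": 0,                   # only the hit count is needed, not the documents
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


def num_found(response_body):
    """Extract the hit count from a Solr JSON response body."""
    return json.loads(response_body)["response"]["numFound"]


if __name__ == "__main__":
    print(leftover_query_url())
    # Response shape as returned by Solr's JSON response writer (sample data).
    print(num_found('{"response": {"numFound": 0, "start": 0, "docs": []}}'))
```

Fetching the printed URL (for example with ‘curl’) and passing the body to `num_found` should yield 0 once every affected document has been reindexed through the chain.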

Sincerely,
Dishanker Raj

PGP Public Key: http://goo.gl/TulvBO

On 29 Nov 2013, at 14:59, Jack Krupansky <ja...@basetechnology.com> wrote:

> Any chance that you had already indexed some data before you finalized these configuration settings? That pre-existing data would need to be manually reindexed for the update processor to be effective.
> 
> -- Jack Krupansky


Re: Why is 'solr.RegexReplaceProcessorFactory' not changing fields that are being indexed?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Any chance that you had already indexed some data before you finalized
these configuration settings? That pre-existing data would need to be
manually reindexed for the update processor to be effective.

-- Jack Krupansky
