Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2013/08/22 17:35:55 UTC

UpdateProcessor not working with DIH, but works with SolrJ

I have an updateProcessor defined.  It seems to work perfectly when I 
index with SolrJ, but when I use DIH (which I do for a full index 
rebuild), it doesn't work.  This is the case with both Solr 4.4 and Solr 
4.5-SNAPSHOT, svn revision 1516342.

Here's a solrconfig.xml excerpt:

<updateRequestProcessorChain name="nohtml">
   <!-- First pass converts entities and strips html. -->
   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
     <str name="fieldName">ft_text</str>
     <str name="fieldName">ft_subject</str>
     <str name="fieldName">keywords</str>
     <str name="fieldName">text_preview</str>
   </processor>
   <!-- Second pass fixes dually-encoded stuff. -->
   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
     <str name="fieldName">ft_text</str>
     <str name="fieldName">ft_subject</str>
     <str name="fieldName">keywords</str>
     <str name="fieldName">text_preview</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

   <requestHandler name="/update" class="solr.UpdateRequestHandler">
     <lst name="defaults">
       <str name="update.chain">nohtml</str>
     </lst>
   </requestHandler>

If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I 
see "replace value" debugs, but the contents of the index are only 
changed if the update happens with SolrJ, not with DIH.

A side issue.  FieldMutatingUpdateProcessorFactory has the following 
line in it, at about line 72:

         if (destVal != srcVal) {

Shouldn't this be the following?

         if (destVal.equals(srcVal)) {

Thanks,
Shawn
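
On the side issue above: in Java, != on two objects compares references, while equals() compares contents, so a value-based version of that test would need a leading "!" (that is, !destVal.equals(srcVal)). The sketch below only illustrates the difference. CompareDemo and its mutate() method are hypothetical stand-ins, not Solr code, and whether Solr relies on reference comparison at that line is left open here.

```java
public class CompareDemo {
    // Hypothetical stand-in for a value mutator (not Solr code): it returns the
    // very same object when there is nothing to strip, and a new String otherwise.
    static Object mutate(Object srcVal) {
        if (!(srcVal instanceof String)) {
            return srcVal;
        }
        String s = (String) srcVal;
        String stripped = s.replace("<b>", "").replace("</b>", "");
        return stripped.equals(s) ? srcVal : stripped;
    }

    // Reference comparison: true when the two variables point at different objects.
    static boolean differsByReference(Object srcVal, Object destVal) {
        return destVal != srcVal;
    }

    // Value comparison: true when the contents differ. Note the leading "!".
    static boolean differsByValue(Object srcVal, Object destVal) {
        return !destVal.equals(srcVal);
    }

    public static void main(String[] args) {
        String plain = "plain";
        String copy = new String("plain"); // equal contents, different object
        System.out.println(differsByReference(plain, copy));          // true
        System.out.println(differsByValue(plain, copy));              // false
        System.out.println(differsByReference(plain, mutate(plain))); // false
        System.out.println(differsByValue("<b>bold</b>", mutate("<b>bold</b>"))); // true
    }
}
```

If a mutator hands back its input object whenever it changes nothing, the reference test is a cheap way to detect "was replaced", which may be why such a check would be written with != rather than equals().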

Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.
On 8/22/2013 10:02 AM, Steve Rowe wrote:
> You could declare your update chain as the default by adding 'default="true"' to its declaring element:
>
>     <updateRequestProcessorChain name="nohtml" default="true">
>
> and then you wouldn't need to declare it as the default update.chain in either of your request handlers.

If I did this, would it apply the HTML processor only to the fields 
that I have specified in those XML sections?  I haven't thought through 
the implications, but I think it might be OK.

Thanks,
Shawn


Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Steve Rowe <sa...@gmail.com>.
You could declare your update chain as the default by adding 'default="true"' to its declaring element:

   <updateRequestProcessorChain name="nohtml" default="true">

and then you wouldn't need to declare it as the default update.chain in either of your request handlers.



Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Andrea Gazzarini <an...@gmail.com>.
OK, found it:

     <requestHandler name="/dataimport" 
class="org.apache.solr.handler.dataimport.DataImportHandler">
         <lst name="defaults">
             <str name="config">dih-config.xml</str>
             <str name="update.chain">nohtml</str>
         </lst>
     </requestHandler>

Of course, my mistake: when I changed the name of the chain I deleted 
the "<" char of the closing tag. Sorry.
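
The kind of failure a missing "<" causes can be reproduced outside Solr. This is a minimal sketch that uses the JDK's built-in XML parser, not Solr's own config loader, so the exact error message Solr reports may differ. WellFormedCheck, BROKEN and FIXED are names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // The "defaults" list as first posted: the closing tag of update.chain lacks its "<".
    static final String BROKEN =
        "<lst name=\"defaults\">"
        + "<str name=\"config\">dih-config.xml</str>"
        + "<str name=\"update.chain\">nohtml/str>"
        + "</lst>";

    // The same list with the closing tag restored.
    static final String FIXED =
        "<lst name=\"defaults\">"
        + "<str name=\"config\">dih-config.xml</str>"
        + "<str name=\"update.chain\">nohtml</str>"
        + "</lst>";

    // Returns true if the snippet parses as well-formed XML, false on a parse error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // Parser setup or I/O problems are not what this check is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(BROKEN)); // false
        System.out.println(isWellFormed(FIXED));  // true
    }
}
```

In the broken snippet the parser treats "nohtml/str>" as text inside the still-open str element and then fails when it meets the closing lst tag.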

On 08/22/2013 06:15 PM, Shawn Heisey wrote:
> of "update.chain" so this shouldn't be the problem. 


Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.
On 8/22/2013 10:06 AM, Andrea Gazzarini wrote:
> Yes, of course, you should use your already declared request
> handler. That was just a copied and pasted example :)
>
> I'm curious about what kind of error you got. I copied the snippet
> above from a working core (just replaced the name of the chain).
>
> BTW: AFAIK it is "update.processor" that has been deprecated in favor
> of "update.chain", so this shouldn't be the problem.

Here's the full exception.  I use xinclude heavily in my solrconfig.xml.  
The xinclude directives are actually almost the only thing that's in 
solrconfig.xml.

http://apaste.info/7PB0

I'm going to try setting my update processor to default as recommended 
by Steve Rowe.

Thanks,
Shawn


Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Andrea Gazzarini <an...@gmail.com>.
Yes, of course, you should use your already declared request 
handler. That was just a copied and pasted example :)

I'm curious about what kind of error you got. I copied the snippet 
above from a working core (just replaced the name of the chain).

BTW: AFAIK it is "update.processor" that has been deprecated in favor 
of "update.chain", so this shouldn't be the problem.

Best,
Gazza



Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.
On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
> You should declare this
>
> <str name="update.chain">nohtml</str>
>
> in the "defaults" section of the RequestHandler that corresponds to your
> dataimporthandler. You should have something like this:
>
>      <requestHandler name="/dataimport"
> class="org.apache.solr.handler.dataimport.DataImportHandler">
>          <lst name="defaults">
>              <str name="config">dih-config.xml</str>
>              <str name="update.chain">nohtml/str>
>          </lst>
>      </requestHandler>
>
> Otherwise the default update chain will be called (and your URPs are not
> part of that). SolrJ, behind the scenes, is a client of the /update
> request handler, which is why your URPs work when you index through it.

This results in an error parsing the config, so my cores won't start up.  
I saw another message via Google that talked about using 
update.processor instead of update.chain, so I tried that as well, with 
no luck.

Can I ask DIH to use the /update handler that I have declared already?

Thanks,
Shawn


Re: UpdateProcessor not working with DIH, but works with SolrJ

Posted by Andrea Gazzarini <an...@gmail.com>.
You should declare this

<str name="update.chain">nohtml</str>

in the "defaults" section of the RequestHandler that corresponds to your 
dataimporthandler. You should have something like this:

     <requestHandler name="/dataimport" 
class="org.apache.solr.handler.dataimport.DataImportHandler">
         <lst name="defaults">
             <str name="config">dih-config.xml</str>
             <str name="update.chain">nohtml/str>
         </lst>
     </requestHandler>

Otherwise the default update chain will be called (and your URPs are not 
part of that). SolrJ, behind the scenes, is a client of the /update 
request handler, which is why your URPs work when you index through it.

Best,
Gazza

