Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2013/08/22 17:35:55 UTC
UpdateProcessor not working with DIH, but works with SolrJ
I have an updateProcessor defined. It seems to work perfectly when I
index with SolrJ, but when I use DIH (which I do for a full index
rebuild), it doesn't work. This is the case with both Solr 4.4 and Solr
4.5-SNAPSHOT, svn revision 1516342.
Here's a solrconfig.xml excerpt:
<updateRequestProcessorChain name="nohtml">
  <!-- First pass converts entities and strips html. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <!-- Second pass fixes dually-encoded stuff. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>
If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I
see "replace value" debug messages, but the contents of the index are
only changed if the update happens with SolrJ, not with DIH.
A side issue. FieldMutatingUpdateProcessorFactory has the following
line in it, at about line 72:
if (destVal != srcVal) {
Shouldn't this be the following?
if (!destVal.equals(srcVal)) {
Thanks,
Shawn
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Shawn Heisey <so...@elyograg.org>.
On 8/22/2013 10:02 AM, Steve Rowe wrote:
> You could declare your update chain as the default by adding 'default="true"' to its declaring element:
>
> <updateRequestProcessorChain name="nohtml" default="true">
>
> and then you wouldn't need to declare it as the default update.chain in either of your request handlers.
If I did this, would the HTML processor still be applied only to the
fields that I have specified in those XML sections? I haven't thought
through the implications, but I think it might be OK.
Thanks,
Shawn
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Steve Rowe <sa...@gmail.com>.
You could declare your update chain as the default by adding 'default="true"' to its declaring element:
<updateRequestProcessorChain name="nohtml" default="true">
and then you wouldn't need to declare it as the default update.chain in either of your request handlers.
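For reference, here is a sketch of how the chain from the original post might look with that attribute applied. The processor list and field names are copied unchanged from Shawn's excerpt; the only difference is default="true". It is an illustration of the suggestion, not a configuration tested in this thread.

```xml
<!-- Sketch only: the "nohtml" chain from the original post, marked as the default chain. -->
<updateRequestProcessorChain name="nohtml" default="true">
  <!-- First pass converts entities and strips html. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <!-- Second pass fixes dually-encoded stuff. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```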
On Aug 22, 2013, at 11:57 AM, Shawn Heisey <so...@elyograg.org> wrote:
> On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
>> You should declare this
>>
>> <str name="update.chain">nohtml</str>
>>
>> in the "defaults" section of the RequestHandler that corresponds to your
>> dataimporthandler. You should have something like this:
>>
>> <requestHandler name="/dataimport"
>> class="org.apache.solr.handler.dataimport.DataImportHandler">
>> <lst name="defaults">
>> <str name="config">dih-config.xml</str>
>> <str name="update.chain">nohtml/str>
>> </lst>
>> </requestHandler>
>>
>> Otherwise the default update chain will be called (and your URPs are not
>> part of that). SolrJ, behind the scenes, is a client of the /update
>> request handler; that is why you can see your URPs working when you
>> use it.
>
> This results in an error parsing the config, so my cores won't start up. I saw another message via google that talked about using update.processor instead of update.chain, so I tried that as well, with no luck.
>
> Can I ask DIH to use the /update handler that I have declared already?
>
> Thanks,
> Shawn
>
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Andrea Gazzarini <an...@gmail.com>.
OK, found it:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>
Of course, my mistake: when I changed the name of the chain, I deleted
the "<" character.
Sorry
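Putting the pieces of this thread together, the two handlers might be declared side by side as sketched below. The /update handler is taken from the original post and the /dataimport handler from the snippet above; dih-config.xml is just the file name used in that example and may differ in a real setup.

```xml
<!-- Used by SolrJ, which posts to /update. -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>

<!-- Used by DIH. Per the explanation earlier in the thread, it does not
     pick up the chain set on /update, so it names the chain itself. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>
```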
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Shawn Heisey <so...@elyograg.org>.
On 8/22/2013 10:06 AM, Andrea Gazzarini wrote:
> yes, yes of course, you should use your already declared request
> handler...that was just a copied and pasted example :)
>
> I'm curious about what kind of error you got....I copied the snippet
> above from a working core (just replaced the name of the chain)
>
> BTW: AFAIK it is "update.processor" that has been deprecated in favor
> of "update.chain", so this shouldn't be the problem.
Here's the full exception. I use xinclude heavily in my solrconfig.xml.
The xinclude directives are actually almost the only thing that's in
solrconfig.xml.
http://apaste.info/7PB0
I'm going to try setting my update processor to default as recommended
by Steve Rowe.
Thanks,
Shawn
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Andrea Gazzarini <an...@gmail.com>.
yes, yes of course, you should use your already declared request
handler...that was just a copied and pasted example :)
I'm curious about what kind of error you got....I copied the snippet
above from a working core (just replaced the name of the chain)
BTW: AFAIK it is "update.processor" that has been deprecated in favor
of "update.chain", so this shouldn't be the problem.
Best,
Gazza
On 08/22/2013 05:57 PM, Shawn Heisey wrote:
> On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
>> You should declare this
>>
>> <str name="update.chain">nohtml</str>
>>
>> in the "defaults" section of the RequestHandler that corresponds to your
>> dataimporthandler. You should have something like this:
>>
>> <requestHandler name="/dataimport"
>> class="org.apache.solr.handler.dataimport.DataImportHandler">
>> <lst name="defaults">
>> <str name="config">dih-config.xml</str>
>> <str name="update.chain">nohtml/str>
>> </lst>
>> </requestHandler>
>>
>> Otherwise the default update chain will be called (and your URPs are not
>> part of that). SolrJ, behind the scenes, is a client of the /update
>> request handler; that is why you can see your URPs working when you
>> use it.
>
> This results in an error parsing the config, so my cores won't start
> up. I saw another message via google that talked about using
> update.processor instead of update.chain, so I tried that as well,
> with no luck.
>
> Can I ask DIH to use the /update handler that I have declared already?
>
> Thanks,
> Shawn
>
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Shawn Heisey <so...@elyograg.org>.
On 8/22/2013 9:42 AM, Andrea Gazzarini wrote:
> You should declare this
>
> <str name="update.chain">nohtml</str>
>
> in the "defaults" section of the RequestHandler that corresponds to your
> dataimporthandler. You should have something like this:
>
> <requestHandler name="/dataimport"
> class="org.apache.solr.handler.dataimport.DataImportHandler">
> <lst name="defaults">
> <str name="config">dih-config.xml</str>
> <str name="update.chain">nohtml/str>
> </lst>
> </requestHandler>
>
> Otherwise the default update chain will be called (and your URPs are not
> part of that). SolrJ, behind the scenes, is a client of the /update
> request handler; that is why you can see your URPs working when you
> use it.
This results in an error parsing the config, so my cores won't start up.
I saw another message via google that talked about using
update.processor instead of update.chain, so I tried that as well, with
no luck.
Can I ask DIH to use the /update handler that I have declared already?
Thanks,
Shawn
Re: UpdateProcessor not working with DIH, but works with SolrJ
Posted by Andrea Gazzarini <an...@gmail.com>.
You should declare this
<str name="update.chain">nohtml</str>
in the "defaults" section of the RequestHandler that corresponds to your
dataimporthandler. You should have something like this:
<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">dih-config.xml</str>
<str name="update.chain">nohtml/str>
</lst>
</requestHandler>
Otherwise the default update chain will be called (and your URPs are not
part of that). SolrJ, behind the scenes, is a client of the /update
request handler; that is why you can see your URPs working when you
use it.
Best,
Gazza
On 08/22/2013 05:35 PM, Shawn Heisey wrote:
> I have an updateProcessor defined. It seems to work perfectly when I
> index with SolrJ, but when I use DIH (which I do for a full index
> rebuild), it doesn't work. This is the case with both Solr 4.4 and
> Solr 4.5-SNAPSHOT, svn revision 1516342.
>
> Here's a solrconfig.xml excerpt:
>
> <updateRequestProcessorChain name="nohtml">
> <!-- First pass converts entities and strips html. -->
> <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
> <str name="fieldName">ft_text</str>
> <str name="fieldName">ft_subject</str>
> <str name="fieldName">keywords</str>
> <str name="fieldName">text_preview</str>
> </processor>
> <!-- Second pass fixes dually-encoded stuff. -->
> <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
> <str name="fieldName">ft_text</str>
> <str name="fieldName">ft_subject</str>
> <str name="fieldName">keywords</str>
> <str name="fieldName">text_preview</str>
> </processor>
> <processor class="solr.LogUpdateProcessorFactory" />
> <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
> <lst name="defaults">
> <str name="update.chain">nohtml</str>
> </lst>
> </requestHandler>
>
> If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I
> see "replace value" debugs, but the contents of the index are only
> changed if the update happens with SolrJ, not with DIH.
>
> A side issue. FieldMutatingUpdateProcessorFactory has the following
> line in it, at about line 72:
>
> if (destVal != srcVal) {
>
> Shouldn't this be the following?
>
> if (!destVal.equals(srcVal)) {
>
> Thanks,
> Shawn