Posted to dev@lucene.apache.org by David Smiley <da...@gmail.com> on 2016/03/16 20:28:48 UTC

Solr UpdateLog & UpdateRequestProcessors

For a project I work on, I have an URP that adds a Lucene Field object to
the SolrInputField.  Normally it's the job of a FieldType to produce a
Lucene Field (createFields()) but my use-case requires data from other
fields.  An URP can do this but a FieldType cannot (somewhat related
to SOLR-4329).  Note that Solr's DocumentBuilder will skip invoking the
FieldType's createField() to get the field if the SolrInputField already
has a Lucene Field.  So far so good.
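
To make the flow above concrete, here is a minimal, self-contained sketch.
It does not use Solr or Lucene classes: IndexField, combineNames, build and
fromFieldType are hypothetical stand-ins for a Lucene Field, the URP, Solr's
DocumentBuilder and a FieldType respectively, and the field names are made
up.  It only illustrates the idea that a processor can derive a prebuilt
field from other fields and that the builder then uses it as-is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of "URP supplies a prebuilt field"; not Solr's actual classes. */
final class PrebuiltFieldSketch {

    /** Hypothetical stand-in for an already-built index field. */
    static final class IndexField {
        final String name;
        final String value;
        IndexField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Processor step: derives "full_name" from two other fields of the document. */
    static void combineNames(Map<String, Object> doc) {
        String combined = doc.get("first") + " " + doc.get("last");
        doc.put("full_name", new IndexField("full_name", combined));
    }

    /** Stand-in for per-field conversion, which only ever sees a single raw value. */
    static IndexField fromFieldType(String name, Object raw) {
        return new IndexField(name, String.valueOf(raw).toLowerCase(Locale.ROOT));
    }

    /** Builder step: a prebuilt IndexField is used directly, skipping fromFieldType. */
    static List<IndexField> build(Map<String, Object> doc) {
        List<IndexField> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof IndexField) {
                out.add((IndexField) v);
            } else {
                out.add(fromFieldType(e.getKey(), v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("first", "Ada");
        doc.put("last", "Lovelace");
        combineNames(doc);
        for (IndexField f : build(doc)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

The prebuilt value keeps its original casing because the per-field
conversion never runs for it, which mirrors the skip described above.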

The problem is that RunUpdateProcessor (the last URP) invokes
DirectUpdateHandler2, which passes the final SolrInputDocument to the
UpdateLog to get serialized.  It's the final document, of course, since
that's the last URP to pass the doc along.  The UpdateLog will in turn
consult JavaBinCodec which
has a fallback for types it doesn't know about to emit the classname
string, colon, then toString of the object.  In my opinion, it should
return an error, or at the very least a warning!  And it doesn't know about
Field (nor could it support that), of course.  Note that SolrCloud PeerSync
consults the UpdateLog of replicas to get a new Leader up to date, and an
error will get triggered (and we probably lose the doc).
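
A small sketch of the difference between the lenient fallback and the
stricter behavior argued for above.  This is not JavaBinCodec itself:
FallbackCodecSketch, OpaqueField, encodeLenient and encodeStrict are
hypothetical names, and the set of "known" types is deliberately tiny.  It
assumes only what is described above, namely that an unknown type is
written as the classname, a colon, then toString.

```java
/** Toy codec with a lossy fallback; a stand-in, not Solr's JavaBinCodec. */
final class FallbackCodecSketch {

    /** Hypothetical value type the codec has no encoding for. */
    static final class OpaqueField {
        final String name;
        OpaqueField(String name) {
            this.name = name;
        }
        @Override
        public String toString() {
            return "OpaqueField<" + name + ">";
        }
    }

    /** The few types this toy codec can round-trip. */
    static boolean isKnown(Object value) {
        return value == null || value instanceof String
                || value instanceof Number || value instanceof Boolean;
    }

    /** Lenient mode: unknown types silently degrade to "classname:toString". */
    static Object encodeLenient(Object value) {
        if (isKnown(value)) {
            return value;
        }
        return value.getClass().getName() + ":" + value;
    }

    /** Strict mode: unknown types fail at write time instead of at replay time. */
    static Object encodeStrict(Object value) {
        if (isKnown(value)) {
            return value;
        }
        throw new IllegalArgumentException(
                "No encoding for type " + value.getClass().getName());
    }

    public static void main(String[] args) {
        Object field = new OpaqueField("geo");
        // The original object cannot be rebuilt from this string on read.
        System.out.println(encodeLenient(field));
        try {
            encodeStrict(field);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the lenient path the write succeeds and the damage only shows up when
something reads the log back; the strict path surfaces it immediately.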

Is it pointless to have an URP produce something that JavaBinCodec
can't serialize (assuming use of the UpdateLog/SolrCloud)? Maybe.  At least
there's the JavaBinCodec.ObjectResolver interface.  And as I mentioned if
there was an early warning/error, an insidious problem wouldn't creep up on
you later.  Before I noticed ObjectResolver I was thinking of filing an
issue related to controlling which URPs apply when, relative to the
UpdateLog.  I wonder if anyone else has any thoughts on all of this.
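
For illustration, a rough sketch of the resolver idea: give the codec a
hook that turns an otherwise unknown object into something it can write,
and fail loudly if the hook cannot help.  Everything here is hypothetical
(ResolverSketch, WeightedText, the "text^weight" encoding), and the hook is
a plain java.util.function.Function rather than the real
JavaBinCodec.ObjectResolver, whose exact signature and contract should be
checked against the Solr source.

```java
import java.util.function.Function;

/** Toy codec with a resolver hook; a stand-in, not Solr's JavaBinCodec. */
final class ResolverSketch {

    /** Hypothetical custom value that a URP might attach to a document. */
    static final class WeightedText {
        final String text;
        final double weight;
        WeightedText(String text, double weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    /** The few types this toy codec can write directly. */
    static boolean isKnown(Object value) {
        return value == null || value instanceof String
                || value instanceof Number || value instanceof Boolean;
    }

    /** Writes known types as-is, asks the resolver otherwise, and never falls back silently. */
    static Object encode(Object value, Function<Object, Object> resolver) {
        if (isKnown(value)) {
            return value;
        }
        Object resolved = resolver.apply(value);
        if (resolved != null && isKnown(resolved)) {
            return resolved;
        }
        throw new IllegalArgumentException(
                "Unresolvable type " + value.getClass().getName());
    }

    /** Example resolver: maps WeightedText to a string; returns null for anything else. */
    static Object resolveWeightedText(Object o) {
        if (o instanceof WeightedText) {
            WeightedText w = (WeightedText) o;
            return w.text + "^" + w.weight;
        }
        return null;
    }

    public static void main(String[] args) {
        Object encoded = encode(new WeightedText("solr", 2.5),
                ResolverSketch::resolveWeightedText);
        System.out.println(encoded);
    }
}
```

Whatever reads the log back would still need to rebuild the original
object from the resolved form, which this sketch does not attempt.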

~ David

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Solr UpdateLog & UpdateRequestProcessors

Posted by David Smiley <da...@gmail.com>.
FYI the issue I filed is https://issues.apache.org/jira/browse/SOLR-8866
(now with a patch)

On Wed, Mar 16, 2016 at 3:35 PM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

> I agree that we should throw an exception if JavaBinCodec's fallback
> serialization is hit, since it won't be deserialized during a log
> replay/peersync.
> Just curious, if the field value was not properly serialized by the
> JavaBinCodec, how was it handled by the DUH2 and written to the index?
>
>
DUH2 ultimately wants a Lucene Document of Lucene Fields.  Solr's
DocumentBuilder converts the SolrInputDocument to this.  And if it sees a
SolrInputField of type Field, it's done for that value.

Re: Solr UpdateLog & UpdateRequestProcessors

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
I agree that we should throw an exception if JavaBinCodec's fallback
serialization is hit, since it won't be deserialized during a log
replay/peersync.
Just curious, if the field value was not properly serialized by the
JavaBinCodec, how was it handled by the DUH2 and written to the index?

On Thu, Mar 17, 2016 at 12:58 AM, David Smiley <da...@gmail.com>
wrote:

> For a project I work on, I have an URP that adds a Lucene Field object to
> the SolrInputField.  Normally it's the job of a FieldType to produce a
> Lucene Field (createFields()) but my use-case requires data from other
> fields.  An URP can do this but a FieldType cannot (somewhat related
> to SOLR-4329).  Note that Solr's DocumentBuilder will skip invoking the
> FieldType's createField() to get the field if the SolrInputField already
> has a Lucene Field.  So far so good.
>
> The problem is that RunUpdateProcessor (the last URP) invokes
> DirectUpdateHandler2, which passes the final SolrInputDocument to the
> UpdateLog to get serialized.  It's the final document, of course, since
> that's the last URP to pass the doc along.  The UpdateLog will in turn
> consult JavaBinCodec which has a fallback for types it doesn't know about
> to emit the classname string, colon, then toString of the object.  In my
> opinion, it should return an error, or at the very least a warning!  And it
> doesn't know about Field (nor could it support that), of course.  Note that
> SolrCloud PeerSync consults the UpdateLog of replicas to get a new Leader
> up to date, and an error will get triggered (and we probably lose the doc).
>
> Is it pointless to have an URP produce something that JavaBinCodec
> can't serialize (assuming use of the UpdateLog/SolrCloud)? Maybe.  At least
> there's the JavaBinCodec.ObjectResolver interface.  And as I mentioned if
> there was an early warning/error, an insidious problem wouldn't creep up on
> you later.  Before I noticed ObjectResolver I was thinking of filing an
> issue related to controlling which URPs apply when, relative to the
> UpdateLog.  I wonder if anyone else has any thoughts on all of this.
>
> ~ David