Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2012/05/21 21:21:43 UTC
[jira] [Commented] (SOLR-3215) We should clone the SolrInputDocument before adding locally and then send that clone to replicas.

    [ https://issues.apache.org/jira/browse/SOLR-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280394#comment-13280394 ] 

Hoss Man commented on SOLR-3215:
--------------------------------


bq. DistributedUpdateProcessor should come right before RunUpdateProcessor (or are you assuming we might support random update processors in-between? Are there use cases for this?)

the main scenario i've seen/heard mentioned is processors that are computationally cheap but increase the size of the document significantly (e.g., clone a big text field and strip the html from the clone), so you want them to run after distrib (on every replica) to minimize the amount of data sent over the wire.

----

bq. Intuitively, you expect that processors that run after the distrib processor will not hit the document sent to replicas before the docs are sent to replicas - but it will.

to clarify (because i kept not understanding what the crux of the issue was here, so if i post this comment i'll remember next time w/o needing to ask mark on IRC _again_): if we do *NOT* clone the doc, there is a race condition where local processors executing after the distrib processor may modify the documents before they are serialized and forwarded to one or more shards.

one way to avoid this would be to stop treating the "local" replica as special, and instead have distrib forward back to localhost (via HTTP, just like every other replica) and abort the current request.
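
for illustration only, a rough sketch of the cloning approach from the issue title. it does not use the real Solr classes: the document is modeled as a plain map of field name to value list (a hypothetical stand-in for SolrInputDocument), and deepCopy / addStrippedField are made-up helpers. it assumes field values are immutable (Strings, boxed primitives), which is the open question raised in the issue description. the copy is taken before any later local processor runs, so the copy destined for replicas cannot be changed by them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CloneBeforeForward {

    /** Builds a mutable value list (illustrative helper, not a Solr API). */
    static List<Object> values(Object... vs) {
        return new ArrayList<>(Arrays.asList(vs));
    }

    /**
     * Copies the map and each per-field value list. The values themselves are
     * shared, which is only safe under the assumption that they are immutable.
     */
    static Map<String, List<Object>> deepCopy(Map<String, List<Object>> doc) {
        Map<String, List<Object>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : doc.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }

    /** Hypothetical post-distrib processor: adds an html-stripped copy of "body". */
    static void addStrippedField(Map<String, List<Object>> doc) {
        List<Object> stripped = new ArrayList<>();
        for (Object v : doc.get("body")) {
            stripped.add(v.toString().replaceAll("<[^>]*>", ""));
        }
        doc.put("body_stripped", stripped);
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("id", values("1"));
        doc.put("body", values("<p>hello</p>"));

        // Take the copy for the replicas BEFORE local processing continues.
        Map<String, List<Object>> forReplicas = deepCopy(doc);

        // A later local processor mutates only the local document.
        addStrippedField(doc);

        System.out.println(doc.keySet());         // [id, body, body_stripped]
        System.out.println(forReplicas.keySet()); // [id, body]
    }
}
```

without the deepCopy call, doc and forReplicas would be the same object, and whether the replicas see body_stripped would depend on whether serialization happened before or after the local processor ran.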
                
> We should clone the SolrInputDocument before adding locally and then send that clone to replicas.
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3215
>                 URL: https://issues.apache.org/jira/browse/SOLR-3215
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.0
>
>         Attachments: SOLR-3215.patch
>
>
> If we don't do this, the behavior is a little unexpected. You cannot avoid having other processors always hit documents twice unless we support using multiple update chains. We have another issue open that should make this better, but I'd like to do this sooner than that. We are going to have to end up cloning anyway when we want to offer the ability to not wait for the local add before sending to replicas.
> Cloning with the current SolrInputDocument, SolrInputField apis is a little scary - there is an Object to contend with - but it seems we can pretty much count on that being a primitive that we don't have to clone?
