Posted to solr-user@lucene.apache.org by John Nielsen <jn...@mcb.dk> on 2012/12/13 15:53:40 UTC

Strange data-loss problem on one of our cores

Hi all,

We are seeing a strange problem on our 2-node solr4 cluster. This problem
has resulted in data loss.

We have two servers, varnish01 and varnish02. Zookeeper is running on
varnish02, but in a separate jvm.

We index directly to varnish02 and we read from varnish01. Data is thus
replicated from varnish02 to varnish01.

I found this in the varnish01 log:

INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=42
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=41
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=33
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=33
Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode:
http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at:
http://varnish02.lynero.net:8000/solr/default1_Norwegian
    at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
    at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
    at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.http.NoHttpResponseException: The target server
failed to respond
    at
org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
    at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
    at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
    at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
    at
org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
    at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
    at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
    at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
    at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 11 more

Dec 13, 2012 12:23:39 PM
org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
INFO: try and ask http://varnish02.lynero.net:8000/solr to recover

It looks like varnish01 is sending updates to varnish02. I am not sure
why, since we only index on varnish02; updates should never be going
from varnish01 to varnish02.

Meanwhile on varnish02:

INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=16
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=15
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=16
Dec 13, 2012 12:23:42 PM org.apache.solr.handler.admin.CoreAdminHandler
handleRequestRecoveryAction
INFO: It has been requested that we recover
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select/
params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=26
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=22
Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
doRecovery
Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=25
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=24
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=20
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=25
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=23
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=21
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=23
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=16
Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=default1_Norwegian
recoveringAfterStartup=false
Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState
INFO: Updating cloud state from ZooKeeper...
Dec 13, 2012 12:23:42 PM
org.apache.solr.update.processor.LogUpdateProcessor finish

And less than a second later:

Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from
http://varnish01.lynero.net:8000/solr/default1_Norwegian/ core=default1_Norwegian
- recoveringAfterStartup=false
Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=default1_Norwegian url=
http://varnish02.lynero.net:8000/solr START replicas=[
http://varnish01.lynero.net:8000/solr/default1_Norwegian/] nUpdates=100
Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
WARNING: PeerSync: core=default1_Norwegian url=
http://varnish02.lynero.net:8000/solr too many updates received since start
- startingUpdates no longer overlaps with our currentUpdates
Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication.
core=default1_Norwegian
Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=default1_Norwegian
Dec 13, 2012 12:23:42 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...

State change on varnish01 at the same time:

Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...

And a few seconds later on varnish02, the recovery finishes:
Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Replication Recovery was successful - registering as Active.
core=default1_Norwegian
Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Finished recovery process. core=default1_Norwegian
Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=8
Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState
INFO: Updating cloud state from ZooKeeper...

Which is picked up on varnish01:

Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...

It looks like it replicated successfully, only it didn't. The
default1_Norwegian core on varnish01 now has 55.071 docs and the same core
on varnish02 has 35.088 docs.

I checked the log files for both JVMs and no stop-the-world GC was
taking place.

There is also nothing in the zookeeper log of interest that I can see.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk

Re: Strange data-loss problem on one of our cores

Posted by John Nielsen <jn...@mcb.dk>.
Awesome!

http://host:port/solr/admin/cores is exactly what I needed!
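For reference, a minimal sketch of the monitoring check described in this thread: fetch numDocs per core from the CoreAdmin STATUS handler on each node and compare. The host names are the hypothetical ones reused from this setup; adjust to your own.

```python
import json
import urllib.request

# Hypothetical hosts mirroring the two-node setup in this thread.
HOSTS = ("http://varnish01.lynero.net:8000", "http://varnish02.lynero.net:8000")


def core_doc_counts(host):
    """Fetch numDocs per core from the CoreAdmin STATUS handler."""
    url = host + "/solr/admin/cores?action=STATUS&wt=json"
    with urllib.request.urlopen(url) as resp:
        status = json.load(resp)["status"]
    return {name: info["index"]["numDocs"] for name, info in status.items()}


def diff_counts(counts_a, counts_b):
    """Return {core: (count_a, count_b)} for every core whose counts disagree."""
    mismatches = {}
    for core in sorted(set(counts_a) | set(counts_b)):
        a, b = counts_a.get(core), counts_b.get(core)
        if a != b:
            mismatches[core] = (a, b)
    return mismatches


# Usage against live servers:
#   a, b = (core_doc_counts(h) for h in HOSTS)
#   print(diff_counts(a, b))
```

If you compare q=*:* counts per core instead, adding distrib=false to the query keeps the receiving core from forwarding the request into the cluster, so each core answers only for itself.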



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 1:21 PM, Markus Jelsma
<ma...@openindex.io> wrote:

> You must use the core's name and not use the collection name so you have
> to know which core is on which server.
> http://host:port/solr/corename/select
>
> You can use the cores handler to find out about the cores on the node:
> http://host:port/solr/admin/cores
>
> You can also use luke for this. It returns the same stats as in the
> interface:
> http://host:port/solr/corename/admin/luke
>
> -----Original message-----
> > From:John Nielsen <jn...@mcb.dk>
> > Sent: Fri 14-Dec-2012 13:16
> > To: solr-user@lucene.apache.org
> > Subject: Re: Strange data-loss problem on one of our cores
> >
> > I'm building a simple tool which will help us monitor the solr cores for
> > this problem. Basically it does a q=*:* on both servers on each core and
> > compares numFound of each result. Problem is that since this is a cloud
> > setup, I can't be sure which server gets me the result. Is there a
> > parameter I can add to the GET requests that will lock the request to a
> > specific node in the cluster, treating the server receiving the request
> > as a standalone server as opposed to a member of a cluster?
> >
> > I tried googling it without luck.
> >
> >
> >
> >
> >
> >
> > On Fri, Dec 14, 2012 at 12:36 PM, Markus Jelsma
> > <ma...@openindex.io> wrote:
> >
> > > We did not solve it but reindexing can remedy the problem.
> > >
> > > -----Original message-----
> > > > From:John Nielsen <jn...@mcb.dk>
> > > > Sent: Fri 14-Dec-2012 12:31
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Strange data-loss problem on one of our cores
> > > >
> > > > How did you solve the problem?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> > > > <ma...@openindex.io> wrote:
> > > >
> > > > > FYI, we observe the same issue: after some time (days, months) a
> > > > > cluster running an older trunk version has at least two shards where
> > > > > the leader and the replica do not contain the same number of
> > > > > records. No recovery is attempted; it seems it thinks everything is
> > > > > alright. Also, one core of one of the unsynced shards waits forever
> > > > > loading /replication?command=detail&wt=json, while other cores load
> > > > > it in a few ms. Both cores of another unsynced shard do not show
> > > > > this problem.
> > > > >
> > > > > -----Original message-----
> > > > > > From:John Nielsen <jn...@mcb.dk>
> > > > > > Sent: Fri 14-Dec-2012 11:50
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Strange data-loss problem on one of our cores
> > > > > >
> > > > > > I did a manual commit, and we are still missing docs, so it
> > > > > > doesn't look like the search race condition you mention.
> > > > > >
> > > > > > My boss wasn't happy when I mentioned that I wanted to try out
> > > > > > unreleased code. I'll win him over though and return with my
> > > > > > findings. It will probably be some time next week.
> > > > > >
> > > > > > Thanks for your help.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <markrmiller@gmail.com> wrote:
> > > > > >
> > > > > > > Couple things to start:
> > > > > > >
> > > > > > > By default SolrCloud distributes updates a doc at a time. So if
> > > > > > > you have 1 shard, whatever node you index to, it will send
> > > > > > > updates to the other. Replication is only used for recovery, not
> > > > > > > distributing data. So for some reason, there is an IOException
> > > > > > > when it tries to forward.
> > > > > > >
> > > > > > > The other issue is not something that I've seen reported. Can/did
> > > > > > > you try and do another hard commit to make sure you had the
> > > > > > > latest search open when checking the # of docs on each node?
> > > > > > > There was previously a race around commit that could cause some
> > > > > > > issues around expected visibility.
> > > > > > >
> > > > > > > If you are able to, you might try out a nightly build - 4.1 will
> > > > > > > be ready very soon and has numerous bug fixes for SolrCloud.
> > > > > > >
> > > > > > > - Mark
> > > > > > >
> > > > > > > On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > We are seeing a strange problem on our 2-node solr4 cluster.
> This
> > > > > problem
> > > > > > > > has resultet in data loss.
> > > > > > > >
> > > > > > > > We have two servers, varnish01 and varnish02. Zookeeper is
> > > running on
> > > > > > > > varnish02, but in a separate jvm.
> > > > > > > >
> > > > > > > > We index directly to varnish02 and we read from varnish01.
> Data
> > > is
> > > > > thus
> > > > > > > > replicated from varnish02 to varnish01.
> > > > > > > >
> > > > > > > > I found this in the varnish01 log:
> > > > > > > >
> > > > > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=42
> > > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=41
> > > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=33
> > > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=33
> > > > > > > > Dec 13, 2012 12:23:39 PM
> org.apache.solr.common.SolrException log
> > > > > > > > SEVERE: shard update error StdNode:
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > > > > > > :
> > > > > > > > IOException occured when talking to server at:
> > > > > > > > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > > > > > > >    at
> > > > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > > > >    at
> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > > > >    at
> > > > > > > >
> > > > >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > > > > >    at
> > > > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > > > >    at
> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > > > > > >    at java.lang.Thread.run(Thread.java:636)
> > > > > > > > Caused by: org.apache.http.NoHttpResponseException: The
> target
> > > server
> > > > > > > > failed to respond
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > > > > > > >    at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > > > > > >    ... 11 more
> > > > > > > >
> > > > > > > > Dec 13, 2012 12:23:39 PM
> > > > > > > > org.apache.solr.update.processor.DistributedUpdateProcessor
> > > doFinish
> > > > > > > > INFO: try and ask http://varnish02.lynero.net:8000/solr to
> > > recover*
> > > > > > > >
> > > > > > > > It looks like it is sending updates from varnish01 to
> varnish02.
> > > I
> > > > > am not
> > > > > > > > sure for what since we only index on varnish02. Updates
> should
> > > never
> > > > > be
> > > > > > > > going from varnish01 to varnish02.
> > > > > > > >
> > > > > > > > Meanwhile on varnish02:
> > > > > > > >
> > > > > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=16
> > > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=15
> > > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=16
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > > > org.apache.solr.handler.admin.CoreAdminHandler
> > > > > > > > handleRequestRecoveryAction
> > > > > > > > INFO: It has been requested that we recover*
> > > > > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > > >
> > > > > > >
> > > > >
> > >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > > > >
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > > status=0 QTime=1
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Danish] webapp=/solr path=/select/
> > > > > > > > params={fq=site_guid:(2810678)&q=win} hits=0 status=0
> QTime=17
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > > >
> > > > > > >
> > > > >
> > >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> > > > > > >
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > > status=0 QTime=1
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > > >
> > > > > > >
> > > > >
> > >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > > > >
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > > status=0 QTime=1
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > > >
> > > > > > >
> > > > >
> > >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> > > > > > >
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > > status=0 QTime=1
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=26
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=22
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.update.DefaultSolrCoreState
> > > > > > > > doRecovery
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.update.DefaultSolrCoreState
> > > > > > > > doRecovery
> > > > > > > > INFO: Running recovery - first canceling any ongoing recovery
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=25
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=24
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=20
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=25
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=23
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=21
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=23
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > > params={distrib.from=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > > }
> > > > > > > > status=0 QTime=16
> > > > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.cloud.RecoveryStrategy
> > > run
> > > > > > > > INFO: Starting recovery process.  core=default1_Norwegian
> > > > > > > > recoveringAfterStartup=false
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.common.cloud.ZkStateReader
> > > > > > > > updateClusterState
> > > > > > > > INFO: Updating cloud state from ZooKeeper...
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > > > > > > org.apache.solr.update.processor.LogUpdateProcessor finish*
> > > > > > > >
> > > > > > > > And less than a second later:
> > > > > > > >
> > > > > > > > *Dec 13, 2012 12:23:42 PM
> org.apache.solr.cloud.RecoveryStrategy
> > > > > > > doRecovery
> > > > > > > > INFO: Attempting to PeerSync from
> > > > > > > >
> > > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > > > > > > > - recoveringAfterStartup=false
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > > > > INFO: PeerSync: core=default1_Norwegian url=
> > > > > > > > http://varnish02.lynero.net:8000/solr START replicas=[
> > > > > > > > http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> > > > > nUpdates=100
> > > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > > > > WARNING: PeerSync: core=default1_Norwegian url=
> > > > > > > > http://varnish02.lynero.net:8000/solr too many updates
> received
> > > > > since
> > > > > > > start
> > > > > > > > - startingUpdates no longer overlaps with our currentUpdates
> > > > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.cloud.RecoveryStrategy
> > > > > > > doRecovery
> > > > > > > > INFO: PeerSync Recovery was not successful - trying
> replication.
> > > > > > > > core=default1_Norwegian
> > > > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.cloud.RecoveryStrategy
> > > > > > > doRecovery
> > > > > > > > INFO: Starting Replication Recovery. core=default1_Norwegian
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > > > org.apache.solr.client.solrj.impl.HttpClientUtil
> > > > > > > > createClient
> > > > > > > > INFO: Creating new http client,
> > > > > > > >
> > > > >
> > >
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > > > > > > Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > > > process
> > > > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > > >
> > > > > > > > State change on varnish01 at the same time:
> > > > > > > >
> > > > > > > > *Dec 13, 2012 12:23:42 PM
> > > > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > > > process
> > > > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > > > *
> > > > > > > > *And a few seconds later on varnish02, the recovery finishes:
> > > > > > > > *
> > > > > > > > Dec 13, 2012 12:23:48 PM
> org.apache.solr.cloud.RecoveryStrategy
> > > > > > > doRecovery
> > > > > > > > INFO: Replication Recovery was successful - registering as
> > > Active.
> > > > > > > > core=default1_Norwegian
> > > > > > > > Dec 13, 2012 12:23:48 PM
> org.apache.solr.cloud.RecoveryStrategy
> > > > > > > doRecovery
> > > > > > > > INFO: Finished recovery process. core=default1_Norwegian
> > > > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore
> execute
> > > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > > >
> > > > > > >
> > > > >
> > >
> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > > > >
> > > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > > > >
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > > status=0 QTime=8
> > > > > > > > Dec 13, 2012 12:23:48 PM
> > > org.apache.solr.common.cloud.ZkStateReader
> > > > > > > > updateClusterState
> > > > > > > > INFO: Updating cloud state from ZooKeeper... *
> > > > > > > >
> > > > > > > > Which is picked up on varnish01:
> > > > > > > >
> > > > > > > > *Dec 13, 2012 12:23:48 PM
> > > > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > > > process
> > > > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > > >
> > > > > > > > It looks like it replicated successfully, only it didn't. The
> > > > > > > > default1_Norwegian core on varnish01 now has 55,071 docs and
> > > > > > > > the same core on varnish02 has 35,088 docs.
> > > > > > > >
> > > > > > > > I checked the log files for both JVMs, and no stop-the-world
> > > > > > > > GC was taking place.
> > > > > > > >
> > > > > > > > There is also nothing of interest in the ZooKeeper log that
> > > > > > > > I can see.
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Med venlig hilsen / Best regards
> > > > > > > >
> > > > > > > > *John Nielsen*
> > > > > > > > Programmer
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > *MCB A/S*
> > > > > > > > Enghaven 15
> > > > > > > > DK-7500 Holstebro
> > > > > > > >
> > > > > > > > Kundeservice: +45 9610 2824
> > > > > > > > post@mcb.dk
> > > > > > > > www.mcb.dk
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

RE: Strange data-loss problem on one of our cores

Posted by Markus Jelsma <ma...@openindex.io>.
You must use the core's name, not the collection name, so you have to know which core is on which server.
http://host:port/solr/corename/select

You can use the cores handler to find out about the cores on the node:
http://host:port/solr/admin/cores

You can also use the Luke handler for this. It returns the same stats as the admin interface:
http://host:port/solr/corename/admin/luke
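As a minimal sketch of using the cores handler this way — assuming the JSON layout of the Solr 4 CoreAdmin STATUS response, where each core's entry carries an `index` section with `numDocs` — the per-core document counts on one node can be pulled out like this:

```python
import json
from urllib.request import urlopen

def parse_core_doc_counts(status_response):
    """Extract {core_name: numDocs} from a parsed CoreAdmin STATUS response."""
    return {name: info["index"]["numDocs"]
            for name, info in status_response["status"].items()}

def core_doc_counts(base_url):
    """Fetch /solr/admin/cores?action=STATUS from one node and return the
    document count reported by each core hosted there."""
    url = base_url + "/solr/admin/cores?action=STATUS&wt=json"
    with urlopen(url) as resp:
        return parse_core_doc_counts(json.load(resp))
```

Calling `core_doc_counts` against each node and diffing the dicts would have flagged the default1_Norwegian discrepancy directly.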
 
-----Original message-----
> From:John Nielsen <jn...@mcb.dk>
> Sent: Fri 14-Dec-2012 13:16
> To: solr-user@lucene.apache.org
> Subject: Re: Strange data-loss problem on one of our cores
> 
> I'm building a simple tool to help us monitor the Solr cores for this
> problem. Basically it runs a q=*:* on each core on both servers and
> compares the numFound of each result. The problem is that since this is a
> cloud setup, I can't be sure which server actually serves the result. Is
> there a parameter I can add to the GET request that pins it to a specific
> node in the cluster, treating the receiving server as a standalone server
> rather than a member of a cluster?
> 
> I tried googling it without luck.
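For what it's worth, the parameter in question is visible in the shard sub-requests quoted earlier in this thread: distrib=false makes the receiving core answer from its local index only, without fanning the query out across the cloud. A minimal sketch of such a comparison tool (host names and core lists are placeholders):

```python
import json
from urllib.request import urlopen

def local_num_found(base_url, core):
    """Ask one node for its own count of a core's docs. distrib=false keeps
    the query on the receiving core instead of distributing it."""
    url = ("%s/solr/%s/select?q=*:*&rows=0&wt=json&distrib=false"
           % (base_url, core))
    with urlopen(url) as resp:
        return json.load(resp)["response"]["numFound"]

def find_mismatches(counts_by_node):
    """Given {node: {core: numFound}}, return the cores whose counts are
    not identical across all nodes."""
    cores = set()
    for per_core in counts_by_node.values():
        cores.update(per_core)
    return sorted(core for core in cores
                  if len({per_core.get(core)
                          for per_core in counts_by_node.values()}) > 1)
```

Collecting `local_num_found` per node into a dict keyed by host and passing it to `find_mismatches` gives the list of out-of-sync cores to alert on.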
> 
> 
> 
> -- 
> Med venlig hilsen / Best regards
> 
> *John Nielsen*
> Programmer
> 
> 
> 
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
> 
> Kundeservice: +45 9610 2824
> post@mcb.dk
> www.mcb.dk
> 
> 
> 
> On Fri, Dec 14, 2012 at 12:36 PM, Markus Jelsma
> <ma...@openindex.io>wrote:
> 
> > We did not solve it, but reindexing can remedy the problem.
> >
> > -----Original message-----
> > > From:John Nielsen <jn...@mcb.dk>
> > > Sent: Fri 14-Dec-2012 12:31
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Strange data-loss problem on one of our cores
> > >
> > > How did you solve the problem?
> > >
> > >
> > > --
> > > Med venlig hilsen / Best regards
> > >
> > > *John Nielsen*
> > > Programmer
> > >
> > >
> > >
> > > *MCB A/S*
> > > Enghaven 15
> > > DK-7500 Holstebro
> > >
> > > Kundeservice: +45 9610 2824
> > > post@mcb.dk
> > > www.mcb.dk
> > >
> > >
> > >
> > > On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> > > <ma...@openindex.io>wrote:
> > >
> > > > FYI, we observe the same issue: after some time (days, months) a
> > > > cluster running an older trunk version has at least two shards where
> > > > the leader and the replica do not contain the same number of records.
> > > > No recovery is attempted; it seems to think everything is alright.
> > > > Also, one core of one of the unsynced shards waits forever loading
> > > > /replication?command=detail&wt=json, while other cores load it in a
> > > > few ms. Both cores of another unsynced shard do not show this problem.
> > > >
> > > > -----Original message-----
> > > > > From:John Nielsen <jn...@mcb.dk>
> > > > > Sent: Fri 14-Dec-2012 11:50
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Strange data-loss problem on one of our cores
> > > > >
> > > > > I did a manual commit, and we are still missing docs, so it doesn't
> > > > > look like the search race condition you mentioned.
> > > > >
> > > > > My boss wasn't happy when I mentioned that I wanted to try out
> > > > > unreleased code. I'll win him over, though, and return with my
> > > > > findings. It will probably be some time next week.
> > > > >
> > > > > Thanks for your help.
> > > > >
> > > > >
> > > > > --
> > > > > Med venlig hilsen / Best regards
> > > > >
> > > > > *John Nielsen*
> > > > > Programmer
> > > > >
> > > > >
> > > > >
> > > > > *MCB A/S*
> > > > > Enghaven 15
> > > > > DK-7500 Holstebro
> > > > >
> > > > > Kundeservice: +45 9610 2824
> > > > > post@mcb.dk
> > > > > www.mcb.dk
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Couple things to start:
> > > > > >
> > > > > > By default SolrCloud distributes updates a doc at a time. So if
> > > > > > you have 1 shard, whichever node you index to, it will send
> > > > > > updates to the other. Replication is only used for recovery, not
> > > > > > for distributing data. So for some reason, there is an IOException
> > > > > > when it tries to forward.
> > > > > >
> > > > > > The other issue is not something that I've seen reported. Can/did
> > > > > > you try another hard commit to make sure you had the latest
> > > > > > searcher open when checking the # of docs on each node? There was
> > > > > > previously a race around commit that could cause some issues
> > > > > > around expected visibility.
> > > > > >
> > > > > > If you are able to, you might try out a nightly build - 4.1 will be
> > > > ready
> > > > > > very soon and has numerous bug fixes for SolrCloud.
> > > > > >
> > > > > > - Mark
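Mark's hard-commit check above amounts to hitting the core's update handler with commit=true. A minimal sketch, assuming the standard Solr update parameters (waitSearcher=true blocks until the new searcher is open; host and core names are placeholders):

```python
from urllib.request import urlopen

def commit_url(base_url, core):
    """Build the URL for an explicit hard commit that opens a new searcher."""
    return "%s/solr/%s/update?commit=true&waitSearcher=true" % (base_url, core)

def hard_commit(base_url, core):
    """Issue the commit; acknowledged updates become visible to searches
    once this returns."""
    with urlopen(commit_url(base_url, core)) as resp:
        return resp.getcode() == 200
```

Running this against each node before comparing document counts rules out the stale-searcher explanation for a mismatch.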
> > > > > >
> > > > > > On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > We are seeing a strange problem on our 2-node Solr 4 cluster.
> > > > > > > This problem has resulted in data loss.
> > > > > > >
> > > > > > > We have two servers, varnish01 and varnish02. Zookeeper is
> > running on
> > > > > > > varnish02, but in a separate jvm.
> > > > > > >
> > > > > > > We index directly to varnish02 and we read from varnish01. Data
> > is
> > > > thus
> > > > > > > replicated from varnish02 to varnish01.
> > > > > > >
> > > > > > > I found this in the varnish01 log:
> > > > > > >
> > > > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=42
> > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=41
> > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=33
> > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=33
> > > > > > > Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > > > > > > SEVERE: shard update error StdNode:
> > > > > > >
> > > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > > > > > :
> > > > > > > IOException occured when talking to server at:
> > > > > > > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > > > > > >    at
> > > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > > >    at
> > > > > > >
> > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > > > >    at
> > > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > > > > >    at java.lang.Thread.run(Thread.java:636)
> > > > > > > Caused by: org.apache.http.NoHttpResponseException: The target
> > server
> > > > > > > failed to respond
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > > > > > >    at
> > > > > > >
> > > > > >
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > > > > >    ... 11 more
> > > > > > >
> > > > > > > Dec 13, 2012 12:23:39 PM
> > > > > > > org.apache.solr.update.processor.DistributedUpdateProcessor
> > doFinish
> > > > > > > INFO: try and ask http://varnish02.lynero.net:8000/solr to
> > recover*
> > > > > > >
> > > > > > > It looks like it is sending updates from varnish01 to varnish02.
> > > > > > > I am not sure why, since we only index on varnish02. Updates
> > > > > > > should never be going from varnish01 to varnish02.
> > > > > > >
> > > > > > > Meanwhile on varnish02:
> > > > > > >
> > > > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=16
> > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=15
> > > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=16
> > > > > > > Dec 13, 2012 12:23:42 PM
> > > > org.apache.solr.handler.admin.CoreAdminHandler
> > > > > > > handleRequestRecoveryAction
> > > > > > > INFO: It has been requested that we recover*
> > > > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > >
> > > > > >
> > > >
> > params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > > >
> > > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > > >
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > status=0 QTime=1
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Danish] webapp=/solr path=/select/
> > > > > > > params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > >
> > > > > >
> > > >
> > params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > > > >
> > > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> > > > > >
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > status=0 QTime=1
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > >
> > > > > >
> > > >
> > params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > > >
> > > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > > >
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > status=0 QTime=1
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > >
> > > > > >
> > > >
> > params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > > > >
> > > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> > > > > >
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > status=0 QTime=1
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=26
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=22
> > > > > > > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.update.DefaultSolrCoreState
> > > > > > > doRecovery
> > > > > > > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.update.DefaultSolrCoreState
> > > > > > > doRecovery
> > > > > > > INFO: Running recovery - first canceling any ongoing recovery
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=25
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=24
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=20
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=25
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=23
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=21
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=23
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > > params={distrib.from=
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > > }
> > > > > > > status=0 QTime=16
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > run
> > > > > > > INFO: Starting recovery process.  core=default1_Norwegian
> > > > > > > recoveringAfterStartup=false
> > > > > > > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.common.cloud.ZkStateReader
> > > > > > > updateClusterState
> > > > > > > INFO: Updating cloud state from ZooKeeper...
> > > > > > > Dec 13, 2012 12:23:42 PM
> > > > > > > org.apache.solr.update.processor.LogUpdateProcessor finish*
> > > > > > >
> > > > > > > And less than a second later:
> > > > > > >
> > > > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > > doRecovery
> > > > > > > INFO: Attempting to PeerSync from
> > > > > > >
> > > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > > > > > > - recoveringAfterStartup=false
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > > > INFO: PeerSync: core=default1_Norwegian url=
> > > > > > > http://varnish02.lynero.net:8000/solr START replicas=[
> > > > > > > http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> > > > nUpdates=100
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > > > WARNING: PeerSync: core=default1_Norwegian url=
> > > > > > > http://varnish02.lynero.net:8000/solr too many updates received
> > > > since
> > > > > > start
> > > > > > > - startingUpdates no longer overlaps with our currentUpdates
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > > doRecovery
> > > > > > > INFO: PeerSync Recovery was not successful - trying replication.
> > > > > > > core=default1_Norwegian
> > > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > > doRecovery
> > > > > > > INFO: Starting Replication Recovery. core=default1_Norwegian
> > > > > > > Dec 13, 2012 12:23:42 PM
> > > > org.apache.solr.client.solrj.impl.HttpClientUtil
> > > > > > > createClient
> > > > > > > INFO: Creating new http client,
> > > > > > >
> > > >
> > config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > > > > > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > > process
> > > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > >
> > > > > > > State change on varnish01 at the same time:
> > > > > > >
> > > > > > > *Dec 13, 2012 12:23:42 PM
> > > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > > process
> > > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > > *
> > > > > > > *And a few seconds later on varnish02, the recovery finishes:
> > > > > > > *
> > > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > > doRecovery
> > > > > > > INFO: Replication Recovery was successful - registering as
> > Active.
> > > > > > > core=default1_Norwegian
> > > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > > doRecovery
> > > > > > > INFO: Finished recovery process. core=default1_Norwegian
> > > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> > > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > > >
> > > > > >
> > > >
> > params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > > >
> > > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > > >
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > > status=0 QTime=8
> > > > > > > Dec 13, 2012 12:23:48 PM
> > org.apache.solr.common.cloud.ZkStateReader
> > > > > > > updateClusterState
> > > > > > > INFO: Updating cloud state from ZooKeeper... *
> > > > > > >
> > > > > > > Which is picked up on varnish01:
> > > > > > >
> > > > > > > *Dec 13, 2012 12:23:48 PM
> > > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > > process
> > > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > >
> > > > > > > It looks like it replicated successfully, only it didn't. The
> > > > > > > default1_Norwegian core on varnish01 now has 55.071 docs and the
> > > > > > > same core on varnish02 has 35.088 docs.
> > > > > > >
> > > > > > > I checked the log files for both JVMs and no stop-the-world GC
> > > > > > > was taking place.
> > > > > > >
> > > > > > > There is also nothing of interest in the ZooKeeper log that I
> > > > > > > can see.
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Med venlig hilsen / Best regards
> > > > > > >
> > > > > > > *John Nielsen*
> > > > > > > Programmer
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > *MCB A/S*
> > > > > > > Enghaven 15
> > > > > > > DK-7500 Holstebro
> > > > > > >
> > > > > > > Kundeservice: +45 9610 2824
> > > > > > > post@mcb.dk
> > > > > > > www.mcb.dk
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Re: Strange data-loss problem on one of our cores

Posted by Mark Miller <ma...@gmail.com>.
On Dec 14, 2012, at 7:09 AM, John Nielsen <jn...@mcb.dk> wrote:

>  Is there a
> parameter I can add to the GET requests that will lock the request to a
> specific node in the cluster, treating the server receiving the request as
> a standalone server as opposed to a member of a cluster?

The param distrib=false will do this.
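For example (a minimal sketch; the host and core names are taken from the logs in this thread and are otherwise assumptions), a non-distributed count query could be built like this:

```python
# Build a select URL that queries one node only. With distrib=false the
# receiving node answers from its local index instead of fanning the
# request out across the cluster, so numFound reflects that node alone.
# Host and core names below are taken from this thread; adjust as needed.
node = "http://varnish01.lynero.net:8000/solr"
core = "default1_Norwegian"
url = "%s/%s/select?q=*:*&rows=0&wt=json&distrib=false" % (node, core)
print(url)
```

Fetching that URL with any HTTP client returns the node-local document count in response.numFound.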

- Mark

Re: Strange data-loss problem on one of our cores

Posted by John Nielsen <jn...@mcb.dk>.
I'm building a simple tool which will help us monitor the Solr cores for
this problem. Basically it does a q=*:* on both servers for each core and
compares the numFound of each result. The problem is that since this is a
cloud setup, I can't be sure which server gets me the result. Is there a
parameter I can add to the GET requests that will lock the request to a
specific node in the cluster, treating the server receiving the request as
a standalone server as opposed to a member of a cluster?

I tried googling it without luck.
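Such a checker can be sketched as follows (assumptions: both nodes answer JSON via wt=json, and the distrib=false parameter mentioned elsewhere in this thread is used to pin each request to the node it is sent to; the host names come from this thread):

```python
import json
import urllib.request

# The two nodes from this thread; hypothetical for any other setup.
NODES = [
    "http://varnish01.lynero.net:8000/solr",
    "http://varnish02.lynero.net:8000/solr",
]

def num_found(body):
    """Extract numFound from a Solr JSON select response body."""
    return json.loads(body)["response"]["numFound"]

def core_counts(core):
    """Run q=*:* against each node with distrib=false, so every node
    reports only its own local index, and collect the counts."""
    counts = {}
    for node in NODES:
        url = "%s/%s/select?q=*:*&rows=0&wt=json&distrib=false" % (node, core)
        with urllib.request.urlopen(url) as resp:
            counts[node] = num_found(resp.read().decode("utf-8"))
    return counts

def in_sync(core):
    """True when all nodes report the same numFound for the core."""
    return len(set(core_counts(core).values())) == 1
```

A divergence like the 55.071 vs 35.088 docs reported above would make in_sync("default1_Norwegian") return False.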



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 12:36 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> We did not solve it, but reindexing can remedy the problem.
>
> -----Original message-----
> > From:John Nielsen <jn...@mcb.dk>
> > Sent: Fri 14-Dec-2012 12:31
> > To: solr-user@lucene.apache.org
> > Subject: Re: Strange data-loss problem on one of our cores
> >
> > How did you solve the problem?
> >
> >
> > --
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > post@mcb.dk
> > www.mcb.dk
> >
> >
> >
> > On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> > <ma...@openindex.io>wrote:
> >
> > > FYI, we observe the same issue: after some time (days, months) a
> > > cluster running an older trunk version has at least two shards where
> > > the leader and the replica do not contain the same number of records.
> > > No recovery is attempted; it seems it thinks everything is alright.
> > > Also, one core of one of the unsynced shards waits forever loading
> > > /replication?command=detail&wt=json, while other cores load it in a
> > > few ms. Both cores of another unsynced shard do not show this problem.
> > >
> > > -----Original message-----
> > > > From:John Nielsen <jn...@mcb.dk>
> > > > Sent: Fri 14-Dec-2012 11:50
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Strange data-loss problem on one of our cores
> > > >
> > > > I did a manual commit, and we are still missing docs, so it doesn't
> > > > look like the search race condition you mention.
> > > >
> > > > My boss wasn't happy when I mentioned that I wanted to try out
> > > > unreleased code. I'll get him won over though and return with my
> > > > findings. It will probably be some time next week.
> > > >
> > > > Thanks for your help.
> > > >
> > > >
> > > > --
> > > > Med venlig hilsen / Best regards
> > > >
> > > > *John Nielsen*
> > > > Programmer
> > > >
> > > >
> > > >
> > > > *MCB A/S*
> > > > Enghaven 15
> > > > DK-7500 Holstebro
> > > >
> > > > Kundeservice: +45 9610 2824
> > > > post@mcb.dk
> > > > www.mcb.dk
> > > >
> > > >
> > > >
> > > > On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com>
> > > wrote:
> > > >
> > > > > Couple of things to start:
> > > > >
> > > > > By default SolrCloud distributes updates a doc at a time. So if you
> > > > > have 1 shard, whatever node you index to, it will send updates to
> > > > > the other. Replication is only used for recovery, not for
> > > > > distributing data. So for some reason, there is an IOException when
> > > > > it tries to forward.
> > > > >
> > > > > The other issue is not something that I've seen reported. Can/did
> > > > > you try and do another hard commit to make sure you had the latest
> > > > > searcher open when checking the # of docs on each node? There was
> > > > > previously a race around commit that could cause some issues around
> > > > > expected visibility.
> > > > >
> > > > > If you are able to, you might try out a nightly build - 4.1 will be
> > > > > ready very soon and has numerous bug fixes for SolrCloud.
> > > > >
> > > > > - Mark
> > > > >
> > > > > On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We are seeing a strange problem on our 2-node solr4 cluster. This
> > > problem
> > > > > > has resultet in data loss.
> > > > > >
> > > > > > We have two servers, varnish01 and varnish02. Zookeeper is
> running on
> > > > > > varnish02, but in a separate jvm.
> > > > > >
> > > > > > We index directly to varnish02 and we read from varnish01. Data
> is
> > > thus
> > > > > > replicated from varnish02 to varnish01.
> > > > > >
> > > > > > I found this in the varnish01 log:
> > > > > >
> > > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=42
> > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=41
> > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=33
> > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=33
> > > > > > Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > > > > > SEVERE: shard update error StdNode:
> > > > > >
> > > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > > > > :
> > > > > > IOException occured when talking to server at:
> > > > > > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > > > > >    at
> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > >    at
> > > > > >
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > > >    at
> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > >    at
> > > > > >
> > > > >
> > >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > > > >    at
> > > > > >
> > > > >
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > > > >    at java.lang.Thread.run(Thread.java:636)
> > > > > > Caused by: org.apache.http.NoHttpResponseException: The target
> server
> > > > > > failed to respond
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > > > > >    at
> > > > > >
> > > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > > > >    ... 11 more
> > > > > >
> > > > > > Dec 13, 2012 12:23:39 PM
> > > > > > org.apache.solr.update.processor.DistributedUpdateProcessor
> doFinish
> > > > > > INFO: try and ask http://varnish02.lynero.net:8000/solr to
> recover*
> > > > > >
> > > > > > It looks like it is sending updates from varnish01 to varnish02.
> > > > > > I am not sure why, since we only index on varnish02. Updates
> > > > > > should never be going from varnish01 to varnish02.
> > > > > >
> > > > > > Meanwhile on varnish02:
> > > > > >
> > > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=16
> > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=15
> > > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=16
> > > > > > Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.handler.admin.CoreAdminHandler
> > > > > > handleRequestRecoveryAction
> > > > > > INFO: It has been requested that we recover*
> > > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > >
> > > > >
> > >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > status=0 QTime=1
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Danish] webapp=/solr path=/select/
> > > > > > params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > >
> > > > >
> > >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > status=0 QTime=1
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > >
> > > > >
> > >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > status=0 QTime=1
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > >
> > > > >
> > >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > status=0 QTime=1
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=26
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=22
> > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.update.DefaultSolrCoreState
> > > > > > doRecovery
> > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.update.DefaultSolrCoreState
> > > > > > doRecovery
> > > > > > INFO: Running recovery - first canceling any ongoing recovery
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=25
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=24
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=20
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=25
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=23
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=21
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=23
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > > params={distrib.from=
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > > }
> > > > > > status=0 QTime=16
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> run
> > > > > > INFO: Starting recovery process.  core=default1_Norwegian
> > > > > > recoveringAfterStartup=false
> > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.common.cloud.ZkStateReader
> > > > > > updateClusterState
> > > > > > INFO: Updating cloud state from ZooKeeper...
> > > > > > Dec 13, 2012 12:23:42 PM
> > > > > > org.apache.solr.update.processor.LogUpdateProcessor finish*
> > > > > >
> > > > > > And less than a second later:
> > > > > >
> > > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > doRecovery
> > > > > > INFO: Attempting to PeerSync from
> > > > > >
> > > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > > > > > - recoveringAfterStartup=false
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > > INFO: PeerSync: core=default1_Norwegian url=
> > > > > > http://varnish02.lynero.net:8000/solr START replicas=[
> > > > > > http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> > > nUpdates=100
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > > WARNING: PeerSync: core=default1_Norwegian url=
> > > > > > http://varnish02.lynero.net:8000/solr too many updates received
> > > since
> > > > > start
> > > > > > - startingUpdates no longer overlaps with our currentUpdates
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > doRecovery
> > > > > > INFO: PeerSync Recovery was not successful - trying replication.
> > > > > > core=default1_Norwegian
> > > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > doRecovery
> > > > > > INFO: Starting Replication Recovery. core=default1_Norwegian
> > > > > > Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.client.solrj.impl.HttpClientUtil
> > > > > > createClient
> > > > > > INFO: Creating new http client,
> > > > > >
> > >
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > process
> > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > >
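The "too many updates received since start" warning above is PeerSync giving up: each node keeps only a window of its most recent update versions (nUpdates=100 in the log), and if the recovering core has fallen further behind than that window, the two nodes' recent-update lists no longer overlap and Solr falls back to full replication. A simplified sketch of that overlap check, not Solr's actual implementation (the version lists here are made up):

```python
def peer_sync_possible(leader_versions, replica_versions, n_updates=100):
    """Toy model of the PeerSync overlap test: each node retains only its
    last n_updates update versions, so a peer sync is only feasible when
    the replica's retained window overlaps the leader's."""
    leader_window = set(leader_versions[-n_updates:])
    replica_window = set(replica_versions[-n_updates:])
    return bool(leader_window & replica_window)

# Replica only 50 versions behind: the windows overlap, PeerSync can work.
print(peer_sync_possible(list(range(1000)), list(range(950))))   # True
# Replica 200 versions behind: no overlap, fall back to replication recovery.
print(peer_sync_possible(list(range(1000)), list(range(800))))   # False
```

That fallback is exactly what the log shows next: "PeerSync Recovery was not successful - trying replication."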
> > > > > > State change on varnish01 at the same time:
> > > > > >
> > > > > > *Dec 13, 2012 12:23:42 PM
> > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > process
> > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > > *
> > > > > > *And a few seconds later on varnish02, the recovery finishes:
> > > > > > *
> > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > doRecovery
> > > > > > INFO: Replication Recovery was successful - registering as
> Active.
> > > > > > core=default1_Norwegian
> > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > > > doRecovery
> > > > > > INFO: Finished recovery process. core=default1_Norwegian
> > > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> > > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > > >
> > > > >
> > >
> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > > >
> > > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > > >
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > > status=0 QTime=8
> > > > > > Dec 13, 2012 12:23:48 PM
> org.apache.solr.common.cloud.ZkStateReader
> > > > > > updateClusterState
> > > > > > INFO: Updating cloud state from ZooKeeper... *
> > > > > >
> > > > > > Which is picked up on varnish01:
> > > > > >
> > > > > > *Dec 13, 2012 12:23:48 PM
> > > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > > process
> > > > > > INFO: A cluster state change has occurred - updating...*
> > > > > >
> > > > > > It looks like it replicated successfully, only it didn't. The
> > > > > > default1_Norwegian core on varnish01 now has 55.071 docs and the
> same
> > > > > core
> > > > > > on varnish02 has 35.088 docs.
> > > > > >
> > > > > > I checked the log files for both JVMs and no stop-the-world GC
> > > > > > was taking place.
> > > > > >
> > > > > > There is also nothing in the zookeeper log of interest that I can
> > > see.
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Med venlig hilsen / Best regards
> > > > > >
> > > > > > *John Nielsen*
> > > > > > Programmer
> > > > > >
> > > > > >
> > > > > >
> > > > > > *MCB A/S*
> > > > > > Enghaven 15
> > > > > > DK-7500 Holstebro
> > > > > >
> > > > > > Kundeservice: +45 9610 2824
> > > > > > post@mcb.dk
> > > > > > www.mcb.dk
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Strange data-loss problem on one of our cores

Posted by Erick Erickson <er...@gmail.com>.
Thanks for letting us know, and do let us know if you see the problem
again.

Erick


On Tue, Dec 18, 2012 at 7:39 AM, John Nielsen <jn...@mcb.dk> wrote:

> I built a Solr version from the solr-4x branch yesterday and so far am
> unable to reproduce the problems I had before.
>
> I am cautiously optimistic that the problem has been resolved. If I run
> into any more problems, I'll let you all know.
>
>
>
>
>
> On Fri, Dec 14, 2012 at 7:33 PM, Markus Jelsma
> <ma...@openindex.io> wrote:
>
> > Mark, no issue has been filed. That cluster runs a checkout from around
> > the end of July/beginning of August. I'm in the process of including
> > another cluster in the indexing and removal of documents besides the old
> > production clusters. I'll start writing to that one Tuesday or so.
> > If I notice a discrepancy after some time I am sure to report it. I doubt
> > I'll find it before 2013, if the problem is still there.
> >
> >
> > -----Original message-----
> > > From:Mark Miller <ma...@gmail.com>
> > > Sent: Fri 14-Dec-2012 19:05
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Strange data-loss problem on one of our cores
> > >
> > > Have you filed a JIRA issue for this that I don't remember, Markus?
> > >
> > > We need to make sure this is fixed.
> > >
> > > Any idea around when the trunk version came from? Before or after 4.0?
> > >
> > > - Mark
> > >
> > > On Dec 14, 2012, at 6:36 AM, Markus Jelsma <markus.jelsma@openindex.io
> >
> > wrote:
> > >
> > > > We did not solve it but reindexing can remedy the problem.
> > > >
> > > > -----Original message-----
> > > >> From:John Nielsen <jn...@mcb.dk>
> > > >> Sent: Fri 14-Dec-2012 12:31
> > > >> To: solr-user@lucene.apache.org
> > > >> Subject: Re: Strange data-loss problem on one of our cores
> > > >>
> > > >> How did you solve the problem?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> > > >> <ma...@openindex.io>wrote:
> > > >>
> > > >>> FYI, we observe the same issue: after some time (days, months) a
> > > >>> cluster running an older trunk version has at least two shards where
> > > >>> the leader and the replica do not contain the same number of records.
> > > >>> No recovery is attempted; it seems to think everything is alright.
> > > >>> Also, one core of one of the unsynced shards waits forever loading
> > > >>> /replication?command=detail&wt=json, while other cores load it in a
> > > >>> few ms. Neither core of another unsynced shard shows this problem.
> > > >>>
> > > >>> -----Original message-----
> > > >>>> From:John Nielsen <jn...@mcb.dk>
> > > >>>> Sent: Fri 14-Dec-2012 11:50
> > > >>>> To: solr-user@lucene.apache.org
> > > >>>> Subject: Re: Strange data-loss problem on one of our cores
> > > >>>>
> > > >>>> I did a manual commit, and we are still missing docs, so it doesn't
> > > >>>> look like the search race condition you mention.
> > > >>>>
> > > >>>> My boss wasn't happy when I mentioned that I wanted to try out
> > > >>>> unreleased code. I'll win him over though and return with my
> > > >>>> findings. It will probably be some time next week.
> > > >>>>
> > > >>>> Thanks for your help.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <
> markrmiller@gmail.com
> > >
> > > >>> wrote:
> > > >>>>
> > > >>>>> Couple of things to start:
> > > >>>>>
> > > >>>>> By default SolrCloud distributes updates a doc at a time. So if you
> > > >>>>> have 1 shard, whichever node you index to, it will send updates to
> > > >>>>> the other. Replication is only used for recovery, not for
> > > >>>>> distributing data. So for some reason, there is an IOException when
> > > >>>>> it tries to forward.
> > > >>>>>
> > > >>>>> The other issue is not something that I've seen reported. Can/did
> > > >>>>> you try another hard commit to make sure you had the latest searcher
> > > >>>>> open when checking the # of docs on each node? There was previously
> > > >>>>> a race around commit that could cause some issues around expected
> > > >>>>> visibility.
> > > >>>>>
> > > >>>>> If you are able to, you might try out a nightly build - 4.1 will be
> > > >>>>> ready very soon and has numerous bug fixes for SolrCloud.
> > > >>>>>
> > > >>>>> - Mark
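The visibility race Mark describes comes down to this: documents that have been indexed (or copied over during recovery) are not visible to searches until a commit opens a new searcher, so comparing doc counts on two replicas without a fresh hard commit can report stale numbers. A toy model of that behaviour, purely for illustration (this is not Solr code):

```python
class ToyCore:
    """Minimal model of commit-then-search visibility: adds go to a
    pending buffer and only become searchable after commit()."""

    def __init__(self):
        self._committed = []   # what the currently open searcher sees
        self._pending = []     # indexed but not yet committed

    def add(self, doc):
        self._pending.append(doc)

    def commit(self):
        # Opening a new searcher makes the pending docs visible.
        self._committed.extend(self._pending)
        self._pending.clear()

    def num_docs(self):
        return len(self._committed)

core = ToyCore()
for i in range(5):
    core.add({"id": i})
print(core.num_docs())  # 0 -- docs are indexed but no commit has happened yet
core.commit()
print(core.num_docs())  # 5 -- visible once the hard commit opens a searcher
```

Against a real cluster the equivalent check would be a hard commit on each core followed by the same `*:*` count query on both replicas.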
> > > >>>>>
> > > >>>>> On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:

Re: Strange data-loss problem on one of our cores

Posted by John Nielsen <jn...@mcb.dk>.
I built a Solr version from the solr-4x branch yesterday and so far am
unable to reproduce the problems I had before.

I am cautiously optimistic that the problem has been resolved. If I run
into any more problems, I'll let you all know.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 7:33 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> Mark, no issue has been filed. That cluster runs a checkout from around
> the end of July/beginning of August. I'm in the process of including another
> cluster in the indexing and removal of documents besides the old production
> clusters. I'll start writing to that one Tuesday or so.
> If I notice a discrepancy after some time I am sure to report it. I doubt
> I'll find it before 2013, if the problem is still there.
>
>
> -----Original message-----
> > From:Mark Miller <ma...@gmail.com>
> > Sent: Fri 14-Dec-2012 19:05
> > To: solr-user@lucene.apache.org
> > Subject: Re: Strange data-loss problem on one of our cores
> >
> > Have you filed a JIRA issue for this that I don't remember Markus?
> >
> > We need to make sure this is fixed.
> >
> > Any idea around when the trunk version came from? Before or after 4.0?
> >
> > - Mark
> >
> > On Dec 14, 2012, at 6:36 AM, Markus Jelsma <ma...@openindex.io>
> wrote:
> >
> > > We did not solve it but reindexing can remedy the problem.
> > >
> > > -----Original message-----
> > >> From:John Nielsen <jn...@mcb.dk>
> > >> Sent: Fri 14-Dec-2012 12:31
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: Strange data-loss problem on one of our cores
> > >>
> > >> How did you solve the problem?
> > >>
> > >>
> > >> --
> > >> Med venlig hilsen / Best regards
> > >>
> > >> *John Nielsen*
> > >> Programmer
> > >>
> > >>
> > >>
> > >> *MCB A/S*
> > >> Enghaven 15
> > >> DK-7500 Holstebro
> > >>
> > >> Kundeservice: +45 9610 2824
> > >> post@mcb.dk
> > >> www.mcb.dk
> > >>
> > >>
> > >>
> > >> On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> > >> <ma...@openindex.io>wrote:
> > >>
> > >>> FYI, we observe the same issue, after some time (days, months) a
> cluster
> > >>> running an older trunk version has at least two shards where the
> leader and
> > >>> the replica do not contain the same number of records. No recovery is
> > >>> attempted, it seems it thinks everything is alright. Also, one core
> of one
> > >>> of the unsynced shards waits forever loading
> > >>> /replication?command=detail&wt=json, other cores load it in a few
> ms. Both
> > >>> cores of another unsynced shard do not show this problem.
> > >>>
> > >>> -----Original message-----
> > >>>> From:John Nielsen <jn...@mcb.dk>
> > >>>> Sent: Fri 14-Dec-2012 11:50
> > >>>> To: solr-user@lucene.apache.org
> > >>>> Subject: Re: Strange data-loss problem on one of our cores
> > >>>>
> > >>>> I did a manual commit, and we are still missing docs, so it doesn't
> look
> > >>>> like the search race condition you mention.
> > >>>>
> > >>>> My boss wasn't happy when I mentioned that I wanted to try out
> unreleased
> > >>>> code. I'll get him won over though and return with my findings. It
> will
> > >>>> probably be some time next week.
> > >>>>
> > >>>> Thanks for your help.
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Med venlig hilsen / Best regards
> > >>>>
> > >>>> *John Nielsen*
> > >>>> Programmer
> > >>>>
> > >>>>
> > >>>>
> > >>>> *MCB A/S*
> > >>>> Enghaven 15
> > >>>> DK-7500 Holstebro
> > >>>>
> > >>>> Kundeservice: +45 9610 2824
> > >>>> post@mcb.dk
> > >>>> www.mcb.dk
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <markrmiller@gmail.com
> >
> > >>> wrote:
> > >>>>
> > >>>>> Couple things to start:
> > >>>>>
> > >>>>> By default SolrCloud distributes updates a doc at a time. So if you
> > >>> have 1
> > >>>>> shard, whatever node you index to, it will send updates to the
> other.
> > >>>>> Replication is only used for recovery, not distributing data. So
> for
> > >>> some
> > >>>>> reason, there is an IOException when it tries to forward.
> > >>>>>
> > >>>>> The other issue is not something that I've seen reported. Can/did
> you
> > >>> try
> > >>>>> and do another hard commit to make sure you had the latest search
> open
> > >>> when
> > >>>>> checking the # of docs on each node? There was previously a race
> around
> > >>>>> commit that could cause some issues around expected visibility.
> > >>>>>
> > >>>>> If you are able to, you might try out a nightly build - 4.1 will be
> > >>> ready
> > >>>>> very soon and has numerous bug fixes for SolrCloud.
> > >>>>>
> > >>>>> - Mark
> > >>>>>
> > >>>>> On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> We are seeing a strange problem on our 2-node solr4 cluster. This
> > >>> problem
> > >>>>>> has resulted in data loss.
> > >>>>>>
> > >>>>>> We have two servers, varnish01 and varnish02. Zookeeper is
> running on
> > >>>>>> varnish02, but in a separate jvm.
> > >>>>>>
> > >>>>>> We index directly to varnish02 and we read from varnish01. Data is
> > >>> thus
> > >>>>>> replicated from varnish02 to varnish01.
> > >>>>>>
> > >>>>>> I found this in the varnish01 log:
> > >>>>>>
> > >>>>>> *INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=42
> > >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=41
> > >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=33
> > >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=33
> > >>>>>> Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > >>>>>> SEVERE: shard update error StdNode:
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > >>>>> :
> > >>>>>> IOException occured when talking to server at:
> > >>>>>> http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > >>>>>>   at
> > >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>>>>   at
> > >>>>>>
> > >>>
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >>>>>>   at
> > >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >>>>>>   at java.lang.Thread.run(Thread.java:636)
> > >>>>>> Caused by: org.apache.http.NoHttpResponseException: The target
> server
> > >>>>>> failed to respond
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > >>>>>>   at
> > >>>>>>
> > >>>>>
> > >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > >>>>>>   ... 11 more
> > >>>>>>
> > >>>>>> Dec 13, 2012 12:23:39 PM
> > >>>>>> org.apache.solr.update.processor.DistributedUpdateProcessor
> doFinish
> > >>>>>> INFO: try and ask http://varnish02.lynero.net:8000/solr to
> recover*
> > >>>>>>
> > >>>>>> It looks like it is sending updates from varnish01 to varnish02. I
> > >>> am not
> > >>>>>> sure for what since we only index on varnish02. Updates should
> never
> > >>> be
> > >>>>>> going from varnish01 to varnish02.
> > >>>>>>
> > >>>>>> Meanwhile on varnish02:
> > >>>>>>
> > >>>>>> *INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=16
> > >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=15
> > >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=16
> > >>>>>> Dec 13, 2012 12:23:42 PM
> > >>> org.apache.solr.handler.admin.CoreAdminHandler
> > >>>>>> handleRequestRecoveryAction
> > >>>>>> INFO: It has been requested that we recover*
> > >>>>>> *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> > >>>>>>
> > >>>>>
> > >>>
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > >>>>>>
> > >>>>>
> > >>>
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > >>>>>
> > >>>
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > >>>>>> status=0 QTime=1
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Danish] webapp=/solr path=/select/
> > >>>>>> params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> > >>>>>>
> > >>>>>
> > >>>
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > >>>>>>
> > >>>>>
> > >>>
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> > >>>>>
> > >>>
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > >>>>>> status=0 QTime=1
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> > >>>>>>
> > >>>>>
> > >>>
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > >>>>>>
> > >>>>>
> > >>>
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > >>>>>
> > >>>
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > >>>>>> status=0 QTime=1
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> > >>>>>>
> > >>>>>
> > >>>
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > >>>>>>
> > >>>>>
> > >>>
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> > >>>>>
> > >>>
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > >>>>>> status=0 QTime=1
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=26
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=22
> > >>>>>> Dec 13, 2012 12:23:42 PM
> org.apache.solr.update.DefaultSolrCoreState
> > >>>>>> doRecovery
> > >>>>>> Dec 13, 2012 12:23:42 PM
> org.apache.solr.update.DefaultSolrCoreState
> > >>>>>> doRecovery
> > >>>>>> INFO: Running recovery - first canceling any ongoing recovery
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=25
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=24
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=20
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=25
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=23
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=21
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=23
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> > >>>>> params={distrib.from=
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > >>>>> }
> > >>>>>> status=0 QTime=16
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> run
> > >>>>>> INFO: Starting recovery process.  core=default1_Norwegian
> > >>>>>> recoveringAfterStartup=false
> > >>>>>> Dec 13, 2012 12:23:42 PM
> org.apache.solr.common.cloud.ZkStateReader
> > >>>>>> updateClusterState
> > >>>>>> INFO: Updating cloud state from ZooKeeper...
> > >>>>>> Dec 13, 2012 12:23:42 PM
> > >>>>>> org.apache.solr.update.processor.LogUpdateProcessor finish*
> > >>>>>>
> > >>>>>> And less than a second later:
> > >>>>>>
> > >>>>>> *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > >>>>> doRecovery
> > >>>>>> INFO: Attempting to PeerSync from
> > >>>>>>
> > >>>>>
> > >>>
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > >>>>>> - recoveringAfterStartup=false
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > >>>>>> INFO: PeerSync: core=default1_Norwegian url=
> > >>>>>> http://varnish02.lynero.net:8000/solr START replicas=[
> > >>>>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> > >>> nUpdates=100
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > >>>>>> WARNING: PeerSync: core=default1_Norwegian url=
> > >>>>>> http://varnish02.lynero.net:8000/solr too many updates received
> > >>> since
> > >>>>> start
> > >>>>>> - startingUpdates no longer overlaps with our currentUpdates
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > >>>>> doRecovery
> > >>>>>> INFO: PeerSync Recovery was not successful - trying replication.
> > >>>>>> core=default1_Norwegian
> > >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > >>>>> doRecovery
> > >>>>>> INFO: Starting Replication Recovery. core=default1_Norwegian
> > >>>>>> Dec 13, 2012 12:23:42 PM
> > >>> org.apache.solr.client.solrj.impl.HttpClientUtil
> > >>>>>> createClient
> > >>>>>> INFO: Creating new http client,
> > >>>>>>
> > >>>
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > >>>>>> Dec 13, 2012 12:23:42 PM
> org.apache.solr.common.cloud.ZkStateReader$2
> > >>>>>> process
> > >>>>>> INFO: A cluster state change has occurred - updating...*
> > >>>>>>
> > >>>>>> State change on varnish01 at the same time:
> > >>>>>>
> > >>>>>> *Dec 13, 2012 12:23:42 PM
> > >>> org.apache.solr.common.cloud.ZkStateReader$2
> > >>>>>> process
> > >>>>>> INFO: A cluster state change has occurred - updating...*
> > >>>>>> *
> > >>>>>> *And a few seconds later on varnish02, the recovery finishes:
> > >>>>>> *
> > >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > >>>>> doRecovery
> > >>>>>> INFO: Replication Recovery was successful - registering as Active.
> > >>>>>> core=default1_Norwegian
> > >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > >>>>> doRecovery
> > >>>>>> INFO: Finished recovery process. core=default1_Norwegian
> > >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> > >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> > >>>>>>
> > >>>>>
> > >>>
> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > >>>>>>
> > >>>>>
> > >>>
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > >>>>>
> > >>>
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > >>>>>> status=0 QTime=8
> > >>>>>> Dec 13, 2012 12:23:48 PM
> org.apache.solr.common.cloud.ZkStateReader
> > >>>>>> updateClusterState
> > >>>>>> INFO: Updating cloud state from ZooKeeper... *
> > >>>>>>
> > >>>>>> Which is picked up on varnish01:
> > >>>>>>
> > >>>>>> *Dec 13, 2012 12:23:48 PM
> > >>> org.apache.solr.common.cloud.ZkStateReader$2
> > >>>>>> process
> > >>>>>> INFO: A cluster state change has occurred - updating...*
> > >>>>>>
> > >>>>>> It looks like it replicated successfully, only it didn't. The
> > >>>>>> default1_Norwegian core on varnish01 now has 55.071 docs and the
> same
> > >>>>> core
> > >>>>>> on varnish02 has 35.088 docs.
> > >>>>>>
> > >>>>>> I checked the log files for both JVMs and no stop-the-world GC
> was
> > >>>>> taking
> > >>>>>> place.
> > >>>>>>
> > >>>>>> There is also nothing in the zookeeper log of interest that I can
> > >>> see.
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Med venlig hilsen / Best regards
> > >>>>>>
> > >>>>>> *John Nielsen*
> > >>>>>> Programmer
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> *MCB A/S*
> > >>>>>> Enghaven 15
> > >>>>>> DK-7500 Holstebro
> > >>>>>>
> > >>>>>> Kundeservice: +45 9610 2824
> > >>>>>> post@mcb.dk
> > >>>>>> www.mcb.dk
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>

RE: Strange data-loss problem on one of our cores

Posted by Markus Jelsma <ma...@openindex.io>.
Mark, no issue has been filed. That cluster runs a checkout from around the end of July/beginning of August. I'm in the process of including another cluster in the indexing and removal of documents besides the old production clusters. I'll start writing to that one Tuesday or so.
If I notice a discrepancy after some time I am sure to report it. I doubt I'll find it before 2013, if the problem is still there.
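
As a side note, a core whose /replication details call hangs forever (as described earlier in the thread) can be probed with a hard timeout, so a stuck core gets flagged instead of blocking the whole check. A minimal sketch; the base URL and core name are placeholders:

```python
# Sketch only: probe a core's replication details with a hard timeout so a
# core that never answers is reported instead of hanging the whole check.
import json
from urllib.request import urlopen

def replication_details(base_url, core, timeout=5.0):
    """Return the parsed details response, or None on timeout/error."""
    url = f"{base_url}/{core}/replication?command=details&wt=json"
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except OSError:  # covers timeouts, refused connections, DNS failures
        return None

# An unreachable core is reported as None rather than hanging:
print(replication_details("http://localhost:9", "default1_Danish", timeout=1.0))  # -> None
```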

 
-----Original message-----
> From:Mark Miller <ma...@gmail.com>
> Sent: Fri 14-Dec-2012 19:05
> To: solr-user@lucene.apache.org
> Subject: Re: Strange data-loss problem on one of our cores
> 
> Have you filed a JIRA issue for this that I don't remember Markus?
> 
> We need to make sure this is fixed.
> 
> Any idea around when the trunk version came from? Before or after 4.0?
> 
> - Mark
> 
> On Dec 14, 2012, at 6:36 AM, Markus Jelsma <ma...@openindex.io> wrote:
> 
> > We did not solve it but reindexing can remedy the problem. 
> > 
> > -----Original message-----
> >> From:John Nielsen <jn...@mcb.dk>
> >> Sent: Fri 14-Dec-2012 12:31
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Strange data-loss problem on one of our cores
> >> 
> >> How did you solve the problem?
> >> 
> >> 
> >> -- 
> >> Med venlig hilsen / Best regards
> >> 
> >> *John Nielsen*
> >> Programmer
> >> 
> >> 
> >> 
> >> *MCB A/S*
> >> Enghaven 15
> >> DK-7500 Holstebro
> >> 
> >> Kundeservice: +45 9610 2824
> >> post@mcb.dk
> >> www.mcb.dk
> >> 
> >> 
> >> 
> >> On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> >> <ma...@openindex.io>wrote:
> >> 
> >>> FYI, we observe the same issue, after some time (days, months) a cluster
> >>> running an older trunk version has at least two shards where the leader and
> >>> the replica do not contain the same number of records. No recovery is
> >>> attempted, it seems it thinks everything is alright. Also, one core of one
> >>> of the unsynced shards waits forever loading
> >>> /replication?command=detail&wt=json, other cores load it in a few ms. Both
> >>> cores of another unsynced shard do not show this problem.
> >>> 
> >>> -----Original message-----
> >>>> From:John Nielsen <jn...@mcb.dk>
> >>>> Sent: Fri 14-Dec-2012 11:50
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Strange data-loss problem on one of our cores
> >>>> 
> >>>> I did a manual commit, and we are still missing docs, so it doesn't look
> >>>> like the search race condition you mention.
> >>>> 
> >>>> My boss wasn't happy when I mentioned that I wanted to try out unreleased
> >>>> code. I'll get him won over though and return with my findings. It will
> >>>> probably be some time next week.
> >>>> 
> >>>> Thanks for your help.
> >>>> 
> >>>> 
> >>>> --
> >>>> Med venlig hilsen / Best regards
> >>>> 
> >>>> *John Nielsen*
> >>>> Programmer
> >>>> 
> >>>> 
> >>>> 
> >>>> *MCB A/S*
> >>>> Enghaven 15
> >>>> DK-7500 Holstebro
> >>>> 
> >>>> Kundeservice: +45 9610 2824
> >>>> post@mcb.dk
> >>>> www.mcb.dk
> >>>> 
> >>>> 
> >>>> 
> >>>> On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com>
> >>> wrote:
> >>>> 
> >>>>> Couple things to start:
> >>>>> 
> >>>>> By default SolrCloud distributes updates a doc at a time. So if you
> >>> have 1
> >>>>> shard, whatever node you index to, it will send updates to the other.
> >>>>> Replication is only used for recovery, not distributing data. So for
> >>> some
> >>>>> reason, there is an IOException when it tries to forward.
> >>>>> 
> >>>>> The other issue is not something that I've seen reported. Can/did you
> >>> try
> >>>>> and do another hard commit to make sure you had the latest search open
> >>> when
> >>>>> checking the # of docs on each node? There was previously a race around
> >>>>> commit that could cause some issues around expected visibility.
> >>>>> 
> >>>>> If you are able to, you might try out a nightly build - 4.1 will be
> >>> ready
> >>>>> very soon and has numerous bug fixes for SolrCloud.
> >>>>> 
> >>>>> - Mark
> >>>>> 
> >>>>> On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> >>>>> 
> >>>>>> Hi all,
> >>>>>> 
> >>>>>> We are seeing a strange problem on our 2-node solr4 cluster. This
> >>> problem
> >>>>>> has resulted in data loss.
> >>>>>> 
> >>>>>> We have two servers, varnish01 and varnish02. Zookeeper is running on
> >>>>>> varnish02, but in a separate jvm.
> >>>>>> 
> >>>>>> We index directly to varnish02 and we read from varnish01. Data is
> >>> thus
> >>>>>> replicated from varnish02 to varnish01.
> >>>>>> 
> >>>>>> I found this in the varnish01 log:
> >>>>>> 
> >>>>>> *INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=42
> >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=41
> >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=33
> >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=33
> >>>>>> Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> >>>>>> SEVERE: shard update error StdNode:
> >>>>>> 
> >>>>> 
> >>> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> >>>>> :
> >>>>>> IOException occured when talking to server at:
> >>>>>> http://varnish02.lynero.net:8000/solr/default1_Norwegian
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> >>>>>>   at
> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>>>>   at
> >>>>>> 
> >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>>>>   at
> >>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>>>>   at java.lang.Thread.run(Thread.java:636)
> >>>>>> Caused by: org.apache.http.NoHttpResponseException: The target server
> >>>>>> failed to respond
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> >>>>>>   at
> >>>>>> 
> >>>>> 
> >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>>>>>   ... 11 more
> >>>>>> 
> >>>>>> Dec 13, 2012 12:23:39 PM
> >>>>>> org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
> >>>>>> INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*
> >>>>>> 
> >>>>>> It looks like it is sending updates from varnish01 to varnish02. I
> >>> am not
> >>>>>> sure why, since we only index on varnish02. Updates should never
> >>> be
> >>>>>> going from varnish01 to varnish02.
> >>>>>> 
> >>>>>> Meanwhile on varnish02:
> >>>>>> 
> >>>>>> *INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=16
> >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=15
> >>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=16
> >>>>>> Dec 13, 2012 12:23:42 PM
> >>> org.apache.solr.handler.admin.CoreAdminHandler
> >>>>>> handleRequestRecoveryAction
> >>>>>> INFO: It has been requested that we recover*
> >>>>>> *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> >>>>>> 
> >>>>> 
> >>> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> >>>>>> 
> >>>>> 
> >>> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> >>>>> 
> >>> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> >>>>>> status=0 QTime=1
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Danish] webapp=/solr path=/select/
> >>>>>> params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> >>>>>> 
> >>>>> 
> >>> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> >>>>>> 
> >>>>> 
> >>> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> >>>>> 
> >>> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> >>>>>> status=0 QTime=1
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> >>>>>> 
> >>>>> 
> >>> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> >>>>>> 
> >>>>> 
> >>> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> >>>>> 
> >>> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> >>>>>> status=0 QTime=1
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> >>>>>> 
> >>>>> 
> >>> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> >>>>>> 
> >>>>> 
> >>> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> >>>>> 
> >>> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> >>>>>> status=0 QTime=1
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=26
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=22
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> >>>>>> doRecovery
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> >>>>>> doRecovery
> >>>>>> INFO: Running recovery - first canceling any ongoing recovery
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=25
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=24
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=20
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=25
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=23
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=21
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=23
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
> >>>>> params={distrib.from=
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> >>>>> }
> >>>>>> status=0 QTime=16
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
> >>>>>> INFO: Starting recovery process.  core=default1_Norwegian
> >>>>>> recoveringAfterStartup=false
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
> >>>>>> updateClusterState
> >>>>>> INFO: Updating cloud state from ZooKeeper...
> >>>>>> Dec 13, 2012 12:23:42 PM
> >>>>>> org.apache.solr.update.processor.LogUpdateProcessor finish*
> >>>>>> 
> >>>>>> And less than a second later:
> >>>>>> 
> >>>>>> *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> >>>>> doRecovery
> >>>>>> INFO: Attempting to PeerSync from
> >>>>>> 
> >>>>> 
> >>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> >>>>>> - recoveringAfterStartup=false
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> >>>>>> INFO: PeerSync: core=default1_Norwegian url=
> >>>>>> http://varnish02.lynero.net:8000/solr START replicas=[
> >>>>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> >>> nUpdates=100
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> >>>>>> WARNING: PeerSync: core=default1_Norwegian url=
> >>>>>> http://varnish02.lynero.net:8000/solr too many updates received
> >>> since
> >>>>> start
> >>>>>> - startingUpdates no longer overlaps with our currentUpdates
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> >>>>> doRecovery
> >>>>>> INFO: PeerSync Recovery was not successful - trying replication.
> >>>>>> core=default1_Norwegian
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> >>>>> doRecovery
> >>>>>> INFO: Starting Replication Recovery. core=default1_Norwegian
> >>>>>> Dec 13, 2012 12:23:42 PM
> >>> org.apache.solr.client.solrj.impl.HttpClientUtil
> >>>>>> createClient
> >>>>>> INFO: Creating new http client,
> >>>>>> 
> >>> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> >>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> >>>>>> process
> >>>>>> INFO: A cluster state change has occurred - updating...*
> >>>>>> 
> >>>>>> State change on varnish01 at the same time:
> >>>>>> 
> >>>>>> *Dec 13, 2012 12:23:42 PM
> >>> org.apache.solr.common.cloud.ZkStateReader$2
> >>>>>> process
> >>>>>> INFO: A cluster state change has occurred - updating...*
> >>>>>> *
> >>>>>> *And a few seconds later on varnish02, the recovery finishes:
> >>>>>> *
> >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> >>>>> doRecovery
> >>>>>> INFO: Replication Recovery was successful - registering as Active.
> >>>>>> core=default1_Norwegian
> >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> >>>>> doRecovery
> >>>>>> INFO: Finished recovery process. core=default1_Norwegian
> >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> >>>>>> INFO: [default1_Danish] webapp=/solr path=/select
> >>>>>> 
> >>>>> 
> >>> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> >>>>>> 
> >>>>> 
> >>> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> >>>>> 
> >>> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> >>>>>> status=0 QTime=8
> >>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
> >>>>>> updateClusterState
> >>>>>> INFO: Updating cloud state from ZooKeeper... *
> >>>>>> 
> >>>>>> Which is picked up on varnish01:
> >>>>>> 
> >>>>>> *Dec 13, 2012 12:23:48 PM
> >>> org.apache.solr.common.cloud.ZkStateReader$2
> >>>>>> process
> >>>>>> INFO: A cluster state change has occurred - updating...*
> >>>>>> 
> >>>>>> It looks like it replicated successfully, only it didn't. The
> >>>>>> default1_Norwegian core on varnish01 now has 55.071 docs and the same
> >>>>> core
> >>>>>> on varnish02 has 35.088 docs.
> >>>>>> 
> >>>>>> I checked the log files for both JVMs, and no stop-the-world GC was
> >>>>> taking
> >>>>>> place.
> >>>>>> 
> >>>>>> There is also nothing in the zookeeper log of interest that I can
> >>> see.
> >>>>>> 
> >>>>>> 
> >>>>>> --
> >>>>>> Med venlig hilsen / Best regards
> >>>>>> 
> >>>>>> *John Nielsen*
> >>>>>> Programmer
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> *MCB A/S*
> >>>>>> Enghaven 15
> >>>>>> DK-7500 Holstebro
> >>>>>> 
> >>>>>> Kundeservice: +45 9610 2824
> >>>>>> post@mcb.dk
> >>>>>> www.mcb.dk
> >>>>> 
> >>>>> 
> >>>> 
> >>> 
> >> 
> 
> 

Re: Strange data-loss problem on one of our cores

Posted by Mark Miller <ma...@gmail.com>.
Have you filed a JIRA issue for this that I don't remember, Markus?

We need to make sure this is fixed.

Any idea when the trunk version dates from? Before or after 4.0?

- Mark

On Dec 14, 2012, at 6:36 AM, Markus Jelsma <ma...@openindex.io> wrote:

> We did not solve it but reindexing can remedy the problem. 
> 
> -----Original message-----
>> From:John Nielsen <jn...@mcb.dk>
>> Sent: Fri 14-Dec-2012 12:31
>> To: solr-user@lucene.apache.org
>> Subject: Re: Strange data-loss problem on one of our cores
>> 
>> How did you solve the problem?
>> 
>> 
>> -- 
>> Med venlig hilsen / Best regards
>> 
>> *John Nielsen*
>> Programmer
>> 
>> 
>> 
>> *MCB A/S*
>> Enghaven 15
>> DK-7500 Holstebro
>> 
>> Kundeservice: +45 9610 2824
>> post@mcb.dk
>> www.mcb.dk
>> 
>> 
>> 
>> On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
>> <ma...@openindex.io> wrote:
>> 
>>> FYI, we observe the same issue: after some time (days, months) a cluster
>>> running an older trunk version has at least two shards where the leader and
>>> the replica do not contain the same number of records. No recovery is
>>> attempted; it seems to think everything is alright. Also, one core of one
>>> of the unsynced shards waits forever loading
>>> /replication?command=detail&wt=json, while other cores load it in a few ms.
>>> Both cores of another unsynced shard do not show this problem.
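A quick way to spot this kind of silent divergence is to query each core on each node with a non-distributed request and compare the counts. A minimal, offline-testable sketch (the Danish counts below are illustrative, not from the thread; in a live cluster the numbers would come from `http://<node>:8000/solr/<core>/select?q=*:*&rows=0&distrib=false&wt=json` by reading `response.numFound`):

```python
# Sketch: detect leader/replica doc-count divergence per core.
# The fetch step is left out so the comparison logic is testable offline.

def find_unsynced_cores(counts):
    """counts: {core name: {node name: numDocs}}.
    Returns only the cores whose nodes disagree on numDocs."""
    return {core: per_node
            for core, per_node in counts.items()
            if len(set(per_node.values())) > 1}

if __name__ == "__main__":
    observed = {
        # Norwegian counts are the ones reported earlier in this thread;
        # Danish counts are made-up placeholders for a healthy core.
        "default1_Norwegian": {"varnish01": 55071, "varnish02": 35088},
        "default1_Danish": {"varnish01": 120000, "varnish02": 120000},
    }
    print(sorted(find_unsynced_cores(observed)))  # -> ['default1_Norwegian']
```

Running this periodically against every core would catch the mismatch even when the cluster state claims all replicas are active.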
>>> 
>>> -----Original message-----
>>>> From:John Nielsen <jn...@mcb.dk>
>>>> Sent: Fri 14-Dec-2012 11:50
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Strange data-loss problem on one of our cores
>>>> 
>>>> I did a manual commit, and we are still missing docs, so it doesn't look
>>>> like the search race condition you mention.
>>>> 
>>>> My boss wasn't happy when I mentioned that I wanted to try out unreleased
>>>> code. I'll win him over, though, and return with my findings. It will
>>>> probably be some time next week.
>>>> 
>>>> Thanks for your help.
>>>> 
>>>> 
>>>> --
>>>> Med venlig hilsen / Best regards
>>>> 
>>>> *John Nielsen*
>>>> Programmer
>>>> 
>>>> 
>>>> 
>>>> *MCB A/S*
>>>> Enghaven 15
>>>> DK-7500 Holstebro
>>>> 
>>>> Kundeservice: +45 9610 2824
>>>> post@mcb.dk
>>>> www.mcb.dk
>>>> 
>>>> 
>>>> 
>>>> On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com>
>>> wrote:
>>>> 
>>>>> Couple things to start:
>>>>> 
>>>>> By default SolrCloud distributes updates a doc at a time. So if you
>>> have 1
>>>>> shard, whatever node you index to, it will send updates to the other.
>>>>> Replication is only used for recovery, not distributing data. So for
>>> some
>>>>> reason, there is an IOException when it tries to forward.
>>>>> 
>>>>> The other issue is not something that I've seen reported. Can/did you
>>> try
>>>>> and do another hard commit to make sure you had the latest searcher open
>>> when
>>>>> checking the # of docs on each node? There was previously a race around
>>>>> commit that could cause some issues around expected visibility.
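For reference, an explicit hard commit can be issued per core before comparing counts; `commit=true` and `openSearcher=true` are standard update-request parameters in Solr 4.x, and opening the searcher is what makes the committed documents visible to queries. A sketch that only builds the request URL (hosts and core names taken from the thread; no HTTP call is made):

```python
from urllib.parse import urlencode

def hard_commit_url(base_url, core, open_searcher=True):
    """Build the URL for an explicit hard commit against a single core.
    openSearcher=true (the 4.x default) ensures the new segment becomes
    visible to searches, so per-node doc counts are comparable."""
    params = urlencode({"commit": "true",
                        "openSearcher": str(open_searcher).lower()})
    return "%s/solr/%s/update?%s" % (base_url.rstrip("/"), core, params)

print(hard_commit_url("http://varnish01.lynero.net:8000", "default1_Norwegian"))
# -> http://varnish01.lynero.net:8000/solr/default1_Norwegian/update?commit=true&openSearcher=true
```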
>>>>> 
>>>>> If you are able to, you might try out a nightly build - 4.1 will be
>>> ready
>>>>> very soon and has numerous bug fixes for SolrCloud.
>>>>> 
>>>>> - Mark
>>>>> 
>>>>> On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
>>>>> 
>>>>> }
>>>>>> status=0 QTime=20
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
>>>>> params={distrib.from=
>>>>>> 
>>>>> 
>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
>>>>> }
>>>>>> status=0 QTime=25
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
>>>>> params={distrib.from=
>>>>>> 
>>>>> 
>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
>>>>> }
>>>>>> status=0 QTime=23
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
>>>>> params={distrib.from=
>>>>>> 
>>>>> 
>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
>>>>> }
>>>>>> status=0 QTime=21
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
>>>>> params={distrib.from=
>>>>>> 
>>>>> 
>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
>>>>> }
>>>>>> status=0 QTime=23
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update
>>>>> params={distrib.from=
>>>>>> 
>>>>> 
>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
>>>>> }
>>>>>> status=0 QTime=16
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
>>>>>> INFO: Starting recovery process.  core=default1_Norwegian
>>>>>> recoveringAfterStartup=false
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
>>>>>> updateClusterState
>>>>>> INFO: Updating cloud state from ZooKeeper...
>>>>>> Dec 13, 2012 12:23:42 PM
>>>>>> org.apache.solr.update.processor.LogUpdateProcessor finish*
>>>>>> 
>>>>>> And less than a second later:
>>>>>> 
>>>>>> *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
>>>>> doRecovery
>>>>>> INFO: Attempting to PeerSync from
>>>>>> 
>>>>> 
>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
>>>>>> - recoveringAfterStartup=false
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
>>>>>> INFO: PeerSync: core=default1_Norwegian url=
>>>>>> http://varnish02.lynero.net:8000/solr START replicas=[
>>>>>> http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
>>> nUpdates=100
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
>>>>>> WARNING: PeerSync: core=default1_Norwegian url=
>>>>>> http://varnish02.lynero.net:8000/solr too many updates received
>>> since
>>>>> start
>>>>>> - startingUpdates no longer overlaps with our currentUpdates
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
>>>>> doRecovery
>>>>>> INFO: PeerSync Recovery was not successful - trying replication.
>>>>>> core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
>>>>> doRecovery
>>>>>> INFO: Starting Replication Recovery. core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:42 PM
>>> org.apache.solr.client.solrj.impl.HttpClientUtil
>>>>>> createClient
>>>>>> INFO: Creating new http client,
>>>>>> 
>>> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
>>>>>> process
>>>>>> INFO: A cluster state change has occurred - updating...*
>>>>>> 
>>>>>> State change on varnish01 at the same time:
>>>>>> 
>>>>>> *Dec 13, 2012 12:23:42 PM
>>> org.apache.solr.common.cloud.ZkStateReader$2
>>>>>> process
>>>>>> INFO: A cluster state change has occurred - updating...*
>>>>>> *
>>>>>> *And a few seconds later on varnish02, the recovery finishes:
>>>>>> *
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
>>>>> doRecovery
>>>>>> INFO: Replication Recovery was successful - registering as Active.
>>>>>> core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
>>>>> doRecovery
>>>>>> INFO: Finished recovery process. core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select
>>>>>> 
>>>>> 
>>> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
>>>>>> 
>>>>> 
>>> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
>>>>> 
>>> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
>>>>>> status=0 QTime=8
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
>>>>>> updateClusterState
>>>>>> INFO: Updating cloud state from ZooKeeper... *
>>>>>> 
>>>>>> Which is picked up on varnish01:
>>>>>> 
>>>>>> *Dec 13, 2012 12:23:48 PM
>>> org.apache.solr.common.cloud.ZkStateReader$2
>>>>>> process
>>>>>> INFO: A cluster state change has occurred - updating...*
>>>>>> 
>>>>>> It looks like it replicated successfully, only it didn't. The
>>>>>> default1_Norwegian core on varnish01 now has 55.071 docs and the same
>>>>>> core on varnish02 has 35.088 docs.
>>>>>> 
>>>>>> I checked the log files for both JVMs, and no stop-the-world GC was
>>>>>> taking place.
>>>>>> 
>>>>>> There is also nothing in the zookeeper log of interest that I can
>>> see.
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Med venlig hilsen / Best regards
>>>>>> 
>>>>>> *John Nielsen*
>>>>>> Programmer
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> *MCB A/S*
>>>>>> Enghaven 15
>>>>>> DK-7500 Holstebro
>>>>>> 
>>>>>> Kundeservice: +45 9610 2824
>>>>>> post@mcb.dk
>>>>>> www.mcb.dk
>>>>> 
>>>>> 
>>>> 
>>> 
>> 


RE: Strange data-loss problem on one of our cores

Posted by Markus Jelsma <ma...@openindex.io>.
We did not solve it, but reindexing can remedy the problem. 
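
Before a full reindex, it can help to quantify how far out of sync the cores are. The sketch below is only illustrative (the helper names are mine, not any Solr API); it relies on the standard Solr query parameters `distrib=false` and `rows=0` to ask each node about its own local index only, and then compares the per-node `numFound` values the same way the doc counts were compared by hand earlier in this thread:

```python
# Rough sketch: detect out-of-sync replicas by querying each core directly.
# Helper names and node lists are illustrative, not part of any Solr API.
import json
from urllib.request import urlopen  # network I/O happens only in check_cores()

def direct_count_url(base, core):
    """Build a query that bypasses distributed search (distrib=false),
    so the queried node reports only its own local index."""
    return "%s/%s/select?q=*:*&rows=0&distrib=false&wt=json" % (base, core)

def num_found(response_text):
    """Pull numFound out of a wt=json Solr response body."""
    return json.loads(response_text)["response"]["numFound"]

def out_of_sync(counts):
    """Given {node: doc_count}, return True if the replicas disagree."""
    return len(set(counts.values())) > 1

def check_cores(nodes, core):
    """Fetch the local doc count from every node and flag a mismatch."""
    counts = {}
    for base in nodes:
        with urlopen(direct_count_url(base, core)) as resp:
            counts[base] = num_found(resp.read())
    return counts, out_of_sync(counts)
```

For example, with the counts reported earlier in the thread, `out_of_sync({"varnish01": 55071, "varnish02": 35088})` returns `True`, while matching counts return `False`.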
 
-----Original message-----
> From:John Nielsen <jn...@mcb.dk>
> Sent: Fri 14-Dec-2012 12:31
> To: solr-user@lucene.apache.org
> Subject: Re: Strange data-loss problem on one of our cores
> 
> How did you solve the problem?
> 
> 
> -- 
> Med venlig hilsen / Best regards
> 
> *John Nielsen*
> Programmer
> 
> 
> 
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
> 
> Kundeservice: +45 9610 2824
> post@mcb.dk
> www.mcb.dk
> 
> 
> 
> On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
> <ma...@openindex.io>wrote:
> 
> > FYI, we observe the same issue: after some time (days, months), a cluster
> > running an older trunk version has at least two shards where the leader and
> > the replica do not contain the same number of records. No recovery is
> > attempted; it seems to think everything is all right. Also, one core of one
> > of the unsynced shards waits forever loading
> > /replication?command=detail&wt=json, while other cores load it in a few ms.
> > Both cores of another unsynced shard do not show this problem.
> >
> > -----Original message-----
> > > From:John Nielsen <jn...@mcb.dk>
> > > Sent: Fri 14-Dec-2012 11:50
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Strange data-loss problem on one of our cores
> > >
> > > I did a manual commit, and we are still missing docs, so it doesn't look
> > > like the search race condition you mention.
> > >
> > > My boss wasn't happy when I mentioned that I wanted to try out unreleased
> > > code. I'll get him won over, though, and return with my findings. It will
> > > probably be some time next week.
> > >
> > > Thanks for your help.
> > >
> > >
> > > --
> > > Med venlig hilsen / Best regards
> > >
> > > *John Nielsen*
> > > Programmer
> > >
> > >
> > >
> > > *MCB A/S*
> > > Enghaven 15
> > > DK-7500 Holstebro
> > >
> > > Kundeservice: +45 9610 2824
> > > post@mcb.dk
> > > www.mcb.dk
> > >
> > >
> > >
> > > On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com>
> > wrote:
> > >
> > > > Couple things to start:
> > > >
> > > > By default SolrCloud distributes updates a doc at a time. So if you
> > have 1
> > > > shard, whatever node you index to, it will send updates to the other.
> > > > Replication is only used for recovery, not distributing data. So for
> > some
> > > > reason, there is an IOException when it tries to forward.
> > > >
> > > > The other issue is not something that I've seen reported. Can/did you
> > try
> > > > and do another hard commit to make sure you had the latest search open
> > when
> > > > checking the # of docs on each node? There was previously a race around
> > > > commit that could cause some issues around expected visibility.
> > > >
> > > > If you are able to, you might try out a nightly build - 4.1 will be
> > ready
> > > > very soon and has numerous bug fixes for SolrCloud.
> > > >
> > > > - Mark
> > > >
> > > > On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We are seeing a strange problem on our 2-node solr4 cluster. This
> > problem
> > > > > has resulted in data loss.
> > > > >
> > > > > We have two servers, varnish01 and varnish02. Zookeeper is running on
> > > > > varnish02, but in a separate jvm.
> > > > >
> > > > > We index directly to varnish02 and we read from varnish01. Data is
> > thus
> > > > > replicated from varnish02 to varnish01.
> > > > >
> > > > > I found this in the varnish01 log:
> > > > >
> > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=42
> > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=41
> > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=33
> > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=33
> > > > > Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > > > > SEVERE: shard update error StdNode:
> > > > >
> > > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > > > :
> > > > > IOException occured when talking to server at:
> > > > > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > > > >    at
> > > > >
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > > > >    at
> > > > >
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > > >    at
> > > > >
> > > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > > > >    at
> > > > >
> > > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > > > >    at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > >    at
> > > > >
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > >    at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > >    at
> > > > >
> > > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > > >    at
> > > > >
> > > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > > >    at java.lang.Thread.run(Thread.java:636)
> > > > > Caused by: org.apache.http.NoHttpResponseException: The target server
> > > > > failed to respond
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > > > >    at
> > > > >
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > > > >    at
> > > > >
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > > >    ... 11 more
> > > > >
> > > > > Dec 13, 2012 12:23:39 PM
> > > > > org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
> > > > > INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*
> > > > >
> > > > > It looks like it is sending updates from varnish01 to varnish02. I am
> > > > > not sure why, since we only index on varnish02. Updates should never
> > > > > be going from varnish01 to varnish02.
> > > > >
> > > > > Meanwhile on varnish02:
> > > > >
> > > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=16
> > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=15
> > > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=16
> > > > > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.handler.admin.CoreAdminHandler
> > > > > handleRequestRecoveryAction
> > > > > INFO: It has been requested that we recover*
> > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > >
> > > >
> > params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > status=0 QTime=1
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Danish] webapp=/solr path=/select/
> > > > > params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > >
> > > >
> > params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > status=0 QTime=1
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > >
> > > >
> > params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > status=0 QTime=1
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > >
> > > >
> > params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > status=0 QTime=1
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=26
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=22
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> > > > > doRecovery
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> > > > > doRecovery
> > > > > INFO: Running recovery - first canceling any ongoing recovery
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=25
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=24
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=20
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=25
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=23
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=21
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=23
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > > params={distrib.from=
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > > }
> > > > > status=0 QTime=16
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
> > > > > INFO: Starting recovery process.  core=default1_Norwegian
> > > > > recoveringAfterStartup=false
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
> > > > > updateClusterState
> > > > > INFO: Updating cloud state from ZooKeeper...
> > > > > Dec 13, 2012 12:23:42 PM
> > > > > org.apache.solr.update.processor.LogUpdateProcessor finish*
> > > > >
> > > > > And less than a second later:
> > > > >
> > > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > doRecovery
> > > > > INFO: Attempting to PeerSync from
> > > > >
> > > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > > > > - recoveringAfterStartup=false
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > INFO: PeerSync: core=default1_Norwegian url=
> > > > > http://varnish02.lynero.net:8000/solr START replicas=[
> > > > > http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> > nUpdates=100
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > > WARNING: PeerSync: core=default1_Norwegian url=
> > > > > http://varnish02.lynero.net:8000/solr too many updates received
> > since
> > > > start
> > > > > - startingUpdates no longer overlaps with our currentUpdates
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > doRecovery
> > > > > INFO: PeerSync Recovery was not successful - trying replication.
> > > > > core=default1_Norwegian
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > > doRecovery
> > > > > INFO: Starting Replication Recovery. core=default1_Norwegian
> > > > > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.client.solrj.impl.HttpClientUtil
> > > > > createClient
> > > > > INFO: Creating new http client,
> > > > >
> > config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > > > Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> > > > > process
> > > > > INFO: A cluster state change has occurred - updating...*
> > > > >
> > > > > State change on varnish01 at the same time:
> > > > >
> > > > > *Dec 13, 2012 12:23:42 PM
> > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > process
> > > > > INFO: A cluster state change has occurred - updating...*
> > > > > *
> > > > > *And a few seconds later on varnish02, the recovery finishes:
> > > > > *
> > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > > doRecovery
> > > > > INFO: Replication Recovery was successful - registering as Active.
> > > > > core=default1_Norwegian
> > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > > doRecovery
> > > > > INFO: Finished recovery process. core=default1_Norwegian
> > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> > > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > > >
> > > >
> > params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > > >
> > > >
> > varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > > >
> > ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > > status=0 QTime=8
> > > > > Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
> > > > > updateClusterState
> > > > > INFO: Updating cloud state from ZooKeeper... *
> > > > >
> > > > > Which is picked up on varnish01:
> > > > >
> > > > > *Dec 13, 2012 12:23:48 PM
> > org.apache.solr.common.cloud.ZkStateReader$2
> > > > > process
> > > > > INFO: A cluster state change has occurred - updating...*
> > > > >
> > > > > It looks like it replicated successfully, only it didn't. The
> > > > > default1_Norwegian core on varnish01 now has 55.071 docs and the same
> > > > > core on varnish02 has 35.088 docs.
> > > > >
> > > > > I checked the log files for both JVMs, and no stop-the-world GC was
> > > > > taking place.
> > > > >
> > > > > There is also nothing in the zookeeper log of interest that I can
> > see.
> > > > >
> > > > >
> > > > > --
> > > > > Med venlig hilsen / Best regards
> > > > >
> > > > > *John Nielsen*
> > > > > Programmer
> > > > >
> > > > >
> > > > >
> > > > > *MCB A/S*
> > > > > Enghaven 15
> > > > > DK-7500 Holstebro
> > > > >
> > > > > Kundeservice: +45 9610 2824
> > > > > post@mcb.dk
> > > > > www.mcb.dk
> > > >
> > > >
> > >
> >
> 

Re: Strange data-loss problem on one of our cores

Posted by John Nielsen <jn...@mcb.dk>.
How did you solve the problem?
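For anyone landing on this thread from the archives: the commit-then-count check discussed below (hard commit on each node, then compare per-core document counts) can be scripted roughly as follows. This is a minimal sketch assuming Python 3 and the standard Solr 4 HTTP endpoints; the hostnames, port, and core name are the ones mentioned in this thread, and the `compare` helper is hypothetical.

```python
# Sketch: hard-commit on each node, then compare per-core document counts.
# Hostnames/port come from this thread; assumes the standard Solr 4 HTTP API.
import json
from urllib.request import urlopen

NODES = [
    "http://varnish01.lynero.net:8000/solr",
    "http://varnish02.lynero.net:8000/solr",
]
CORE = "default1_Norwegian"

def num_found(select_response: dict) -> int:
    """Total hit count from a parsed Solr /select JSON response."""
    return select_response["response"]["numFound"]

def doc_count(node: str, core: str) -> int:
    # Hard commit first, so the count reflects the newest searcher and we
    # rule out the commit-visibility race mentioned downthread.
    urlopen(f"{node}/{core}/update?commit=true&waitSearcher=true").read()
    raw = urlopen(f"{node}/{core}/select?q=*:*&rows=0&wt=json").read()
    return num_found(json.loads(raw))

def compare(nodes=NODES, core=CORE) -> dict:
    """Doc count per node; differing values mean leader/replica disagree."""
    return {node: doc_count(node, core) for node in nodes}

# Run compare() from a machine that can reach both nodes.
```

If the counts still differ after the hard commit, the nodes genuinely hold different index contents rather than just different open searchers.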


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk



On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> FYI, we observe the same issue: after some time (days, months) a cluster
> running an older trunk version has at least two shards where the leader and
> the replica do not contain the same number of records. No recovery is
> attempted; it seems to think everything is all right. Also, one core of one
> of the unsynced shards waits forever loading
> /replication?command=detail&wt=json, other cores load it in a few ms. Both
> cores of another unsynced shard do not show this problem.
>
> -----Original message-----
> > From:John Nielsen <jn...@mcb.dk>
> > Sent: Fri 14-Dec-2012 11:50
> > To: solr-user@lucene.apache.org
> > Subject: Re: Strange data-loss problem on one of our cores
> >
> > I did a manual commit, and we are still missing docs, so it doesn't look
> > like the search race condition you mention.
> >
> > My boss wasn't happy when I mentioned that I wanted to try out unreleased
> > code. I'll win him over, though, and return with my findings. It will
> > probably be some time next week.
> >
> > Thanks for your help.
> >
> >
> > --
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > post@mcb.dk
> > www.mcb.dk
> >
> >
> >
> > On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com>
> wrote:
> >
> > > Couple things to start:
> > >
> > > By default SolrCloud distributes updates a doc at a time. So if you
> have 1
> > > > shard, whatever node you index to, it will send updates to the other.
> > > Replication is only used for recovery, not distributing data. So for
> some
> > > reason, there is an IOException when it tries to forward.
> > >
> > > > The other issue is not something that I've seen reported. Can/did you
> try
> > > and do another hard commit to make sure you had the latest search open
> when
> > > checking the # of docs on each node? There was previously a race around
> > > commit that could cause some issues around expected visibility.
> > >
> > > If you are able to, you might try out a nightly build - 4.1 will be
> ready
> > > very soon and has numerous bug fixes for SolrCloud.
> > >
> > > - Mark
> > >
> > > On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We are seeing a strange problem on our 2-node solr4 cluster. This
> problem
> > > > has resulted in data loss.
> > > >
> > > > We have two servers, varnish01 and varnish02. Zookeeper is running on
> > > > varnish02, but in a separate jvm.
> > > >
> > > > We index directly to varnish02 and we read from varnish01. Data is
> thus
> > > > replicated from varnish02 to varnish01.
> > > >
> > > > I found this in the varnish01 log:
> > > >
> > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=42
> > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=41
> > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=33
> > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=33
> > > > Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > > > SEVERE: shard update error StdNode:
> > > >
> > >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > > :
> > > > IOException occured when talking to server at:
> > > > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > > >    at
> > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > > >    at
> > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > >    at
> > > >
> > >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > > >    at
> > > >
> > >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > > >    at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >    at
> > > >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > >    at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >    at
> > > >
> > >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > >    at
> > > >
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > >    at java.lang.Thread.run(Thread.java:636)
> > > > Caused by: org.apache.http.NoHttpResponseException: The target server
> > > > failed to respond
> > > >    at
> > > >
> > >
> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > > >    at
> > > >
> > >
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > > >    at
> > > >
> > >
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > > >    at
> > > >
> > >
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > > >    at
> > > >
> > >
> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > > >    at
> > > >
> > >
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > > >    at
> > > >
> > >
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > >    at
> > > >
> > >
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > > >    at
> > > >
> > >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > > >    at
> > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > > >    at
> > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > > >    at
> > > >
> > >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > > >    at
> > > >
> > >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > >    ... 11 more
> > > >
> > > > Dec 13, 2012 12:23:39 PM
> > > > org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
> > > > INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*
> > > >
> > > > It looks like it is sending updates from varnish01 to varnish02. I
> am not
> > > > sure why, since we only index on varnish02. Updates should never
> be
> > > > going from varnish01 to varnish02.
> > > >
> > > > Meanwhile on varnish02:
> > > >
> > > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=16
> > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=15
> > > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=16
> > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.handler.admin.CoreAdminHandler
> > > > handleRequestRecoveryAction
> > > > INFO: It has been requested that we recover*
> > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > >
> > >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > status=0 QTime=1
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Danish] webapp=/solr path=/select/
> > > > params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > >
> > >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > status=0 QTime=1
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > >
> > >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > status=0 QTime=1
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > >
> > >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > status=0 QTime=1
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=26
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=22
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> > > > doRecovery
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> > > > doRecovery
> > > > INFO: Running recovery - first canceling any ongoing recovery
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=25
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=24
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=20
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=25
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=23
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=21
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=23
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > > params={distrib.from=
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > > }
> > > > status=0 QTime=16
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
> > > > INFO: Starting recovery process.  core=default1_Norwegian
> > > > recoveringAfterStartup=false
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
> > > > updateClusterState
> > > > INFO: Updating cloud state from ZooKeeper...
> > > > Dec 13, 2012 12:23:42 PM
> > > > org.apache.solr.update.processor.LogUpdateProcessor finish*
> > > >
> > > > And less than a second later:
> > > >
> > > > *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > doRecovery
> > > > INFO: Attempting to PeerSync from
> > > >
> > >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > > > - recoveringAfterStartup=false
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > INFO: PeerSync: core=default1_Norwegian url=
> > > > http://varnish02.lynero.net:8000/solr START replicas=[
> > > > http://varnish01.lynero.net:8000/solr/default1_Norwegian/]
> nUpdates=100
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > > > WARNING: PeerSync: core=default1_Norwegian url=
> > > > http://varnish02.lynero.net:8000/solr too many updates received
> since
> > > start
> > > > - startingUpdates no longer overlaps with our currentUpdates
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > doRecovery
> > > > INFO: PeerSync Recovery was not successful - trying replication.
> > > > core=default1_Norwegian
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> > > doRecovery
> > > > INFO: Starting Replication Recovery. core=default1_Norwegian
> > > > Dec 13, 2012 12:23:42 PM
> org.apache.solr.client.solrj.impl.HttpClientUtil
> > > > createClient
> > > > INFO: Creating new http client,
> > > >
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > > Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> > > > process
> > > > INFO: A cluster state change has occurred - updating...*
> > > >
> > > > State change on varnish01 at the same time:
> > > >
> > > > *Dec 13, 2012 12:23:42 PM
> org.apache.solr.common.cloud.ZkStateReader$2
> > > > process
> > > > INFO: A cluster state change has occurred - updating...*
> > > > *
> > > > *And a few seconds later on varnish02, the recovery finishes:
> > > > *
> > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > doRecovery
> > > > INFO: Replication Recovery was successful - registering as Active.
> > > > core=default1_Norwegian
> > > > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> > > doRecovery
> > > > INFO: Finished recovery process. core=default1_Norwegian
> > > > Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> > > > INFO: [default1_Danish] webapp=/solr path=/select
> > > >
> > >
> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> > > >
> > >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> > >
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > > > status=0 QTime=8
> > > > Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
> > > > updateClusterState
> > > > INFO: Updating cloud state from ZooKeeper... *
> > > >
> > > > Which is picked up on varnish01:
> > > >
> > > > *Dec 13, 2012 12:23:48 PM
> org.apache.solr.common.cloud.ZkStateReader$2
> > > > process
> > > > INFO: A cluster state change has occurred - updating...*
> > > >
> > > > It looks like it replicated successfully, only it didn't. The
> > > > default1_Norwegian core on varnish01 now has 55.071 docs and the same
> > > core
> > > > on varnish02 has 35.088 docs.
> > > >
> > > > I checked the log files for both JVMs and no stop-the-world GC was
> > > taking
> > > > place.
> > > >
> > > > There is also nothing in the zookeeper log of interest that I can
> see.
> > > >
> > > >
> > > > --
> > > > Med venlig hilsen / Best regards
> > > >
> > > > *John Nielsen*
> > > > Programmer
> > > >
> > > >
> > > >
> > > > *MCB A/S*
> > > > Enghaven 15
> > > > DK-7500 Holstebro
> > > >
> > > > Kundeservice: +45 9610 2824
> > > > post@mcb.dk
> > > > www.mcb.dk
> > >
> > >
> >
>

RE: Strange data-loss problem on one of our cores

Posted by Markus Jelsma <ma...@openindex.io>.
FYI, we observe the same issue: after some time (days, months) a cluster running an older trunk version has at least two shards where the leader and the replica do not contain the same number of records. No recovery is attempted; it seems to think everything is all right. Also, one core of one of the unsynced shards waits forever loading /replication?command=detail&wt=json, while other cores load it in a few ms. Both cores of another unsynced shard do not show this problem.
 
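A quick way to locate the hanging core described above is to probe each core's replication handler with a timeout. This is a minimal sketch assuming Python 3; the host and port are from this thread, the core list is hypothetical, and the command name used is `details` (the spelling in the Solr ReplicationHandler API).

```python
# Sketch: probe each core's replication handler with a timeout, to spot a
# core that never answers /replication?command=details.
# Host, port, and core names are taken from this thread / assumed.
import socket
from urllib.error import URLError
from urllib.request import urlopen

BASE = "http://varnish01.lynero.net:8000/solr"
CORES = ["default1_Danish", "default1_Norwegian"]  # hypothetical core list

def probe(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers within `timeout` seconds, False otherwise."""
    try:
        urlopen(url, timeout=timeout).read()
        return True
    except (URLError, socket.timeout, OSError):
        return False

def find_hung_cores(base: str = BASE, cores=CORES) -> list:
    """Return the cores whose replication details page does not respond."""
    return [c for c in cores
            if not probe(f"{base}/{c}/replication?command=details&wt=json")]

# Run find_hung_cores() from a box that can reach the cluster.
```

A core that times out here while its siblings answer in milliseconds is a good candidate for a stuck replication/recovery thread.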
-----Original message-----
> From:John Nielsen <jn...@mcb.dk>
> Sent: Fri 14-Dec-2012 11:50
> To: solr-user@lucene.apache.org
> Subject: Re: Strange data-loss problem on one of our cores
> 
> I did a manual commit, and we are still missing docs, so it doesn't look
> like the search race condition you mention.
> 
> My boss wasn't happy when I mentioned that I wanted to try out unreleased
> code. I'll win him over, though, and return with my findings. It will
> probably be some time next week.
> 
> Thanks for your help.
> 
> 
> -- 
> Med venlig hilsen / Best regards
> 
> *John Nielsen*
> Programmer
> 
> 
> 
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
> 
> Kundeservice: +45 9610 2824
> post@mcb.dk
> www.mcb.dk
> 
> 
> 
> On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com> wrote:
> 
> > Couple things to start:
> >
> > By default SolrCloud distributes updates a doc at a time. So if you have 1
> > shard, whatever node you index to, it will send updates to the other.
> > Replication is only used for recovery, not distributing data. So for some
> > reason, there is an IOException when it tries to forward.
> >
> > The other issue is not something that I've seen reported. Can/did you try
> > and do another hard commit to make sure you had the latest search open when
> > checking the # of docs on each node? There was previously a race around
> > commit that could cause some issues around expected visibility.
> >
> > If you are able to, you might try out a nightly build - 4.1 will be ready
> > very soon and has numerous bug fixes for SolrCloud.
> >
> > - Mark
> >
> > On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
> >
> > > Hi all,
> > >
> > > We are seeing a strange problem on our 2-node solr4 cluster. This problem
> > > has resulted in data loss.
> > >
> > > We have two servers, varnish01 and varnish02. Zookeeper is running on
> > > varnish02, but in a separate jvm.
> > >
> > > We index directly to varnish02 and we read from varnish01. Data is thus
> > > replicated from varnish02 to varnish01.
> > >
> > > I found this in the varnish01 log:
> > >
> > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > params={distrib.from=
> > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > }
> > > status=0 QTime=42
> > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > params={distrib.from=
> > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > }
> > > status=0 QTime=41
> > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > params={distrib.from=
> > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > }
> > > status=0 QTime=33
> > > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > > INFO: [default1_Norwegian] webapp=/solr path=/update
> > params={distrib.from=
> > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> > }
> > > status=0 QTime=33
> > > Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > > SEVERE: shard update error StdNode:
> > >
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> > :
> > > IOException occured when talking to server at:
> > > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> > >    at
> > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> > >    at
> > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > >    at
> > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> > >    at
> > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> > >    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >    at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >    at
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >    at
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >    at java.lang.Thread.run(Thread.java:636)
> > > Caused by: org.apache.http.NoHttpResponseException: The target server
> > > failed to respond
> > >    at
> > >
> > org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> > >    at
> > >
> > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> > >    at
> > >
> > org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> > >    at
> > >
> > org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> > >    at
> > >
> > org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> > >    at
> > >
> > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> > >    at
> > >
> > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > >    at
> > >
> > org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> > >    at
> > >
> > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> > >    at
> > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> > >    at
> > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> > >    at
> > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> > >    at
> > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > >    ... 11 more
> > >
> > > Dec 13, 2012 12:23:39 PM
> > > org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
> > > INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*
> > >
> > > It looks like it is sending updates from varnish01 to varnish02. I am not
> > > sure why, since we only index on varnish02. Updates should never be
> > > going from varnish01 to varnish02.
> > >
> > > Meanwhile on varnish02:
> > >
> > > *INFO: [default1_Norwegian] webapp=/solr path=/update
> > params={distrib.from=
> > >
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> > }
> > > status=0 QTime=16
Re: Strange data-loss problem on one of our cores

Posted by John Nielsen <jn...@mcb.dk>.
I did a manual commit, and we are still missing docs, so it doesn't look
like the search-visibility race condition you mentioned.

My boss wasn't happy when I mentioned that I wanted to try out unreleased
code. I'll win him over, though, and return with my findings. It will
probably be some time next week.

Thanks for your help.
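For anyone repeating this check, the commit-then-count step can be scripted. This is only a sketch: the node URLs, port, and core name are copied from the logs in this thread, the actual HTTP calls are left to the reader, and `distrib=false` is used so each node reports its own local doc count rather than fanning the query out.

```python
# Sketch: build the URLs for an explicit hard commit on each node, followed
# by a per-node doc count with distrib=false so every replica answers from
# its own index. Hostnames, port, and core name come from the logs above.
from urllib.parse import urlencode

NODES = [
    "http://varnish01.lynero.net:8000/solr",
    "http://varnish02.lynero.net:8000/solr",
]
CORE = "default1_Norwegian"

def commit_url(node, core):
    """URL that triggers an explicit hard commit on one node."""
    return f"{node}/{core}/update?" + urlencode({"commit": "true"})

def local_count_url(node, core):
    """URL that asks one node for its own doc count only (no distributed search)."""
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return f"{node}/{core}/select?" + urlencode(params)

for node in NODES:
    print(commit_url(node, CORE))
    print(local_count_url(node, CORE))
```

If the counts still disagree after the hard commit, the difference is real index divergence and not just an unopened searcher.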


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk



On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <ma...@gmail.com> wrote:

> Couple things to start:
>
> By default SolrCloud distributes updates a doc at a time. So if you have 1
> shard, whatever node you index to, it will send updates to the other.
> Replication is only used for recovery, not distributing data. So for some
> reason, there is an IOException when it tries to forward.
>
> The other issue is not something that I've seen reported. Did you try doing
> another hard commit to make sure you had the latest searcher open when
> checking the # of docs on each node? There was previously a race around
> commit that could cause some issues around expected visibility.
>
> If you are able to, you might try out a nightly build - 4.1 will be ready
> very soon and has numerous bug fixes for SolrCloud.
>
> - Mark
>
> On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:
>
> > Hi all,
> >
> > We are seeing a strange problem on our 2-node solr4 cluster. This problem
> > has resulted in data loss.
> >
> > We have two servers, varnish01 and varnish02. Zookeeper is running on
> > varnish02, but in a separate jvm.
> >
> > We index directly to varnish02 and we read from varnish01. Data is thus
> > replicated from varnish02 to varnish01.
> >
> > I found this in the varnish01 log:
> >
> > *INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> }
> > status=0 QTime=42
> > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> }
> > status=0 QTime=41
> > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> }
> > status=0 QTime=33
> > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2
> }
> > status=0 QTime=33
> > Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> > SEVERE: shard update error StdNode:
> >
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException
> :
> > IOException occured when talking to server at:
> > http://varnish02.lynero.net:8000/solr/default1_Norwegian
> >    at
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> >    at
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >    at
> >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
> >    at
> >
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
> >    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >    at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >    at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >    at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >    at java.lang.Thread.run(Thread.java:636)
> > Caused by: org.apache.http.NoHttpResponseException: The target server
> > failed to respond
> >    at
> >
> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
> >    at
> >
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
> >    at
> >
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
> >    at
> >
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
> >    at
> >
> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
> >    at
> >
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
> >    at
> >
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >    at
> >
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
> >    at
> >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
> >    at
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> >    at
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> >    at
> >
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> >    at
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >    ... 11 more
> >
> > Dec 13, 2012 12:23:39 PM
> > org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
> > INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*
> >
> > It looks like it is sending updates from varnish01 to varnish02. I am not
> > sure why, since we only index on varnish02. Updates should never be
> > going from varnish01 to varnish02.
> >
> > Meanwhile on varnish02:
> >
> > *INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=16
> > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=15
> > Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=16
> > Dec 13, 2012 12:23:42 PM org.apache.solr.handler.admin.CoreAdminHandler
> > handleRequestRecoveryAction
> > INFO: It has been requested that we recover*
> > *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Danish] webapp=/solr path=/select
> >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > status=0 QTime=1
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Danish] webapp=/solr path=/select/
> > params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Danish] webapp=/solr path=/select
> >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > status=0 QTime=1
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Danish] webapp=/solr path=/select
> >
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > status=0 QTime=1
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Danish] webapp=/solr path=/select
> >
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > status=0 QTime=1
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=26
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=22
> > Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> > doRecovery
> > Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> > doRecovery
> > INFO: Running recovery - first canceling any ongoing recovery
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=25
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=24
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=20
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=25
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=23
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=21
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=23
> > Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Norwegian] webapp=/solr path=/update
> params={distrib.from=
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2
> }
> > status=0 QTime=16
> > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
> > INFO: Starting recovery process.  core=default1_Norwegian
> > recoveringAfterStartup=false
> > Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
> > updateClusterState
> > INFO: Updating cloud state from ZooKeeper...
> > Dec 13, 2012 12:23:42 PM
> > org.apache.solr.update.processor.LogUpdateProcessor finish*
> >
> > And less than a second later:
> >
> > *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> > INFO: Attempting to PeerSync from
> >
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> > - recoveringAfterStartup=false
> > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > INFO: PeerSync: core=default1_Norwegian url=
> > http://varnish02.lynero.net:8000/solr START replicas=[
> > http://varnish01.lynero.net:8000/solr/default1_Norwegian/] nUpdates=100
> > Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> > WARNING: PeerSync: core=default1_Norwegian url=
> > http://varnish02.lynero.net:8000/solr too many updates received since
> start
> > - startingUpdates no longer overlaps with our currentUpdates
> > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> > INFO: PeerSync Recovery was not successful - trying replication.
> > core=default1_Norwegian
> > Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> > INFO: Starting Replication Recovery. core=default1_Norwegian
> > Dec 13, 2012 12:23:42 PM org.apache.solr.client.solrj.impl.HttpClientUtil
> > createClient
> > INFO: Creating new http client,
> > config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> > process
> > INFO: A cluster state change has occurred - updating...*
> >
> > State change on varnish01 at the same time:
> >
> > *Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> > process
> > INFO: A cluster state change has occurred - updating...*
> > *
> > *And a few seconds later on varnish02, the recovery finishes:
> > *
> > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> > INFO: Replication Recovery was successful - registering as Active.
> > core=default1_Norwegian
> > Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy
> doRecovery
> > INFO: Finished recovery process. core=default1_Norwegian
> > Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> > INFO: [default1_Danish] webapp=/solr path=/select
> >
> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> >
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text
> ^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> > status=0 QTime=8
> > Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
> > updateClusterState
> > INFO: Updating cloud state from ZooKeeper... *
> >
> > Which is picked up on varnish01:
> >
> > *Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader$2
> > process
> > INFO: A cluster state change has occurred - updating...*
> >
> > It looks like it replicated successfully, only it didn't. The
> > default1_Norwegian core on varnish01 now has 55,071 docs and the same
> > core on varnish02 has 35,088 docs.
> >
> > I checked the log files for both JVMs, and no stop-the-world GC was
> > taking place.
> >
> > There is also nothing in the zookeeper log of interest that I can see.
> >
> >
> > --
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > post@mcb.dk
> > www.mcb.dk
>
>

Re: Strange data-loss problem on one of our cores

Posted by Mark Miller <ma...@gmail.com>.
Couple things to start:

By default SolrCloud distributes updates a document at a time. So if you have 1 shard, whatever node you index to, it will send updates to the other. Replication is only used for recovery, not for distributing data. So for some reason, there is an IOException when it tries to forward.

The other issue is not something that I've seen reported. Did you try doing another hard commit to make sure you had the latest searcher open when checking the number of docs on each node? There was previously a race around commit that could cause some issues around expected visibility.

If you are able to, you might try out a nightly build - 4.1 will be ready very soon and has numerous bug fixes for SolrCloud.

- Mark
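The "too many updates received since start" warning in the logs above is PeerSync giving up: it only keeps a window of recent updates (nUpdates=100 in the log line), and when the recovering node has fallen further behind than that window, the two update lists no longer overlap and Solr falls back to full index replication. The function below is a toy illustration of that overlap test, not Solr's actual code, under the simplifying assumption that updates carry monotonically increasing version numbers.

```python
def peersync_can_catch_up(my_latest_version, peer_versions, window=100):
    """Toy model of the PeerSync overlap check (not Solr's actual algorithm).

    A recovering node asks its peer for the last `window` updates. If the
    recovering node's newest update is older than the oldest update the peer
    still remembers, the windows no longer overlap and PeerSync must give up
    in favor of full index replication.
    """
    recent = sorted(peer_versions)[-window:]  # what the peer can still replay
    if not recent:
        return True  # nothing to sync
    return my_latest_version >= recent[0]

# A replica only 50 updates behind the peer's newest (1000) can PeerSync:
print(peersync_can_catch_up(950, list(range(1, 1001))))   # True
# One 200 updates behind cannot: the 100-update window has moved past it.
print(peersync_can_catch_up(800, list(range(1, 1001))))   # False
```

This is why a burst of indexing during recovery (as in the logs, where updates kept arriving at 12:23:42) tends to force the slower replication path.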

On Dec 13, 2012, at 9:53 AM, John Nielsen <jn...@mcb.dk> wrote:

> Hi all,
> 
> We are seeing a strange problem on our 2-node solr4 cluster. This problem
> has resulted in data loss.
> 
> We have two servers, varnish01 and varnish02. Zookeeper is running on
> varnish02, but in a separate jvm.
> 
> We index directly to varnish02 and we read from varnish01. Data is thus
> replicated from varnish02 to varnish01.
> 
> I found this in the varnish01 log:
> 
> *INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
> status=0 QTime=42
> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
> status=0 QTime=41
> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
> status=0 QTime=33
> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
> status=0 QTime=33
> Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
> SEVERE: shard update error StdNode:
> http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException:
> IOException occured when talking to server at:
> http://varnish02.lynero.net:8000/solr/default1_Norwegian
>    at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
>    at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>    at
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
>    at
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>    at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>    at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>    at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.http.NoHttpResponseException: The target server
> failed to respond
>    at
> org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
>    at
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
>    at
> org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
>    at
> org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
>    at
> org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
>    at
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
>    at
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>    at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
>    at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
>    at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>    at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
>    at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
>    at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>    ... 11 more
> 
> Dec 13, 2012 12:23:39 PM
> org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
> INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*
> 
> It looks like it is sending updates from varnish01 to varnish02. I am not
> sure why, since we only index on varnish02. Updates should never be
> going from varnish01 to varnish02.
> 
> Meanwhile on varnish02:
> 
> *INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=16
> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=15
> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=16
> Dec 13, 2012 12:23:42 PM org.apache.solr.handler.admin.CoreAdminHandler
> handleRequestRecoveryAction
> INFO: It has been requested that we recover*
> *Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Danish] webapp=/solr path=/select
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> status=0 QTime=1
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Danish] webapp=/solr path=/select/
> params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Danish] webapp=/solr path=/select
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> status=0 QTime=1
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Danish] webapp=/solr path=/select
> params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> status=0 QTime=1
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Danish] webapp=/solr path=/select
> params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> status=0 QTime=1
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=26
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=22
> Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> doRecovery
> Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
> doRecovery
> INFO: Running recovery - first canceling any ongoing recovery
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=25
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=24
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=20
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=25
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=23
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=21
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=23
> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
> status=0 QTime=16
> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
> INFO: Starting recovery process.  core=default1_Norwegian
> recoveringAfterStartup=false
> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
> updateClusterState
> INFO: Updating cloud state from ZooKeeper...
> Dec 13, 2012 12:23:42 PM
> org.apache.solr.update.processor.LogUpdateProcessor finish*
> 
> And less than a second later:
> 
> *Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: Attempting to PeerSync from
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
> - recoveringAfterStartup=false
> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=default1_Norwegian url=
> http://varnish02.lynero.net:8000/solr START replicas=[
> http://varnish01.lynero.net:8000/solr/default1_Norwegian/] nUpdates=100
> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
> WARNING: PeerSync: core=default1_Norwegian url=
> http://varnish02.lynero.net:8000/solr too many updates received since start
> - startingUpdates no longer overlaps with our currentUpdates
> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: PeerSync Recovery was not successful - trying replication.
> core=default1_Norwegian
> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: Starting Replication Recovery. core=default1_Norwegian
> Dec 13, 2012 12:23:42 PM org.apache.solr.client.solrj.impl.HttpClientUtil
> createClient
> INFO: Creating new http client,
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> process
> INFO: A cluster state change has occurred - updating...*
> 
> State change on varnish01 at the same time:
> 
> *Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
> process
> INFO: A cluster state change has occurred - updating...*
> *
> *And a few seconds later on varnish02, the recovery finishes:
> *
> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: Replication Recovery was successful - registering as Active.
> core=default1_Norwegian
> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: Finished recovery process. core=default1_Norwegian
> Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
> INFO: [default1_Danish] webapp=/solr path=/select
> params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
> varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
> status=0 QTime=8
> Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
> updateClusterState
> INFO: Updating cloud state from ZooKeeper... *
> 
> Which is picked up on varnish01:
> 
> *Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader$2
> process
> INFO: A cluster state change has occurred - updating...*
> 
> It looks like it replicated successfully, only it didn't. The
> default1_Norwegian core on varnish01 now has 55,071 docs and the same core
> on varnish02 has 35,088 docs.
> 
> I checked the log files for both JVMs and no stop-the-world GC was taking
> place.
> 
> There is also nothing in the zookeeper log of interest that I can see.
> 
> 