Posted to solr-user@lucene.apache.org by kenf_nc <ke...@realestate.com> on 2011/05/17 23:23:57 UTC

Anyone having these Replication issues as well?

Is it just me or is Replication a POS? (Solr 1.4.1, Tomcat 6.0.32)

1) I had set my pollInterval to 60 seconds, but it appeared to fire
constantly, so I set it to 5 minutes. In the Tomcat logs I now see the
replication check fire at intervals anywhere from 2 to 4 1/2 minutes: never
remotely consistent and never approaching 5 minutes. What kind of timer is
being used, a sundial?

2) When it does fire, it seems to run the check between slave and master
anywhere from 3 to 8 times in a single poll interval. I have 3 slaves and
1 master, and the master gets pounded by replication check queries: it
should get 3 every 5 minutes, but it gets up to 24 every couple of minutes.

3) Worst of all, there is a replication.properties file on the slaves. It
constantly shows errors, but the Tomcat logs on both the slaves and the
master are error-free. Below is a representative sample. The timesFailed
number just keeps climbing. The one below went from 10 to 32 in about 8
minutes on the same server, when it should only attempt once every 5 minutes.

#Replication details
#Tue May 17 17:10:00 EDT 2011
replicationFailedAtList= {some long string of large numbers}
previousCycleTimeInSeconds=0
timesFailed=10
indexReplicatedAtList= {some long string of large numbers}
indexReplicatedAt=1305666600335
replicationFailedAt=1305666600335
timesIndexReplicated=10
lastCycleBytesDownloaded=0

Keep in mind, replication actually works! If I add/update a document on the
master, I see it on the slaves eventually. So the errors above are especially
frustrating.
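
One way to make the sample above easier to read is to decode its timestamps. The following is a minimal Python sketch, assuming the indexReplicatedAt and replicationFailedAt values are epoch milliseconds (an assumption, though it is consistent with the comment header in the file). The helper names parse_properties and millis_to_utc are hypothetical, and the two elided list values are left out rather than guessed.

```python
from datetime import datetime, timedelta, timezone

# Sample copied from the post; the two elided *List values are omitted.
SAMPLE = """\
#Replication details
#Tue May 17 17:10:00 EDT 2011
previousCycleTimeInSeconds=0
timesFailed=10
indexReplicatedAt=1305666600335
replicationFailedAt=1305666600335
timesIndexReplicated=10
lastCycleBytesDownloaded=0
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def millis_to_utc(millis):
    """Convert epoch milliseconds (assumed unit) to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(millis))


props = parse_properties(SAMPLE)
replicated_at = millis_to_utc(props["indexReplicatedAt"])
failed_at = millis_to_utc(props["replicationFailedAt"])

print("last replicated:", replicated_at.isoformat())
print("last failed:    ", failed_at.isoformat())
print("same instant:   ", replicated_at == failed_at)
```

Under that assumption, both values decode to 21:10:00 UTC on 2011-05-17, which is 17:10:00 EDT and matches the comment header. The sample therefore records a success and a failure at the same instant, and timesFailed equals timesIndexReplicated. Whether that points at a bookkeeping quirk rather than real failures is something the sample alone cannot confirm.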

Any help on any or all of these issues would be greatly appreciated.
Thanks,
Ken


--
View this message in context: http://lucene.472066.n3.nabble.com/Anyone-having-these-Replication-issues-as-well-tp2954365p2954365.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Anyone having these Replication issues as well?

Posted by Bill Bell <bi...@gmail.com>.
Sundial? Ha ha

Bill Bell
Sent from mobile



Re: Apache spam filter

Posted by Markus Jelsma <ma...@openindex.io>.
I know, that's why I only use old-skool 7-bit ASCII:

Content-Type: Text/Plain; charset="us-ascii"

It passed my message the third time, when I omitted Ken's original message
from my e-mail.


Re: Anyone having these Replication issues as well?

Posted by Erick Erickson <er...@gmail.com>.
Markus:

I've had much better luck with the spam filter after switching to plain text
rather than HTML-ized e-mail.

FWIW
Erick


Re: Anyone having these Replication issues as well?

Posted by kenf_nc <ke...@realestate.com>.
Thanks, Markus, for your patience in getting the response through, as well as
for the comments.

This is my dev environment. I'm actually going to be setting up a new
master-slave configuration in a different environment today, so I'll see
whether it's environment-specific or not. One thing I didn't mention, since I
wasn't sure it was germane, is that these servers are in Amazon EC2. Also, the
master is currently on a 32-bit OS while the slaves are on 64-bit OSes; that's
just the order in which the servers are getting upgraded in dev.

The master has AutoCommit turned on at 30-second intervals. Even if nothing
is getting indexed, could an AutoCommit occurring during a replication
request cause a failed replication?

Ken

--
View this message in context: http://lucene.472066.n3.nabble.com/Anyone-having-these-Replication-issues-as-well-tp2954365p2957127.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Anyone having these Replication issues as well?

Posted by Markus Jelsma <ma...@openindex.io>.
Third and last attempt, Apache spam filter seems to hate me!

Hi,

I remember a reported issue on the mailing list mentioning the funky
interval you describe, but it had no replies. I've done several setups with
replication, one of which is a very high-load service with a pollInterval of 2
seconds. The other setups have a much higher interval. I've never seen this
behaviour before in any setup. Is there something else going on? Can you
reproduce this weird behaviour with the same index, software versions, etc. in
a development environment?

About the number of failed replications in the replication.properties file: I
might not remember correctly, but I think this value is incremented when a
replication fails. A replication can fail when the slave is trying to download
a (large) list of large files while, in the meantime, the master merges some
segments. This specific issue can be remedied using the commitReserveDuration
replication property. However, if this occurs, there should be an exception in
your log.
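
For reference, both pollInterval and commitReserveDuration are set on the replication request handler in solrconfig.xml and, as far as I recall from the Solr 1.4 replication documentation, both are written as HH:mm:ss. Below is a minimal Python sketch that reads the two settings from hypothetical config fragments and converts them to seconds. The host name, port, and duration values are placeholders, not Ken's actual configuration, and the helpers hms_to_seconds and read_setting are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: host, port, and durations are placeholders.
MASTER_FRAGMENT = """\
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>
"""

SLAVE_FRAGMENT = """\
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8080/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
"""


def hms_to_seconds(value):
    """Convert an HH:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def read_setting(fragment, section, name):
    """Return the text of <str name=NAME> inside <lst name=SECTION>."""
    root = ET.fromstring(fragment)
    node = root.find(f"./lst[@name='{section}']/str[@name='{name}']")
    if node is None:
        raise KeyError(f"{section}/{name} not found")
    return node.text.strip()


poll_seconds = hms_to_seconds(
    read_setting(SLAVE_FRAGMENT, "slave", "pollInterval"))
reserve_seconds = hms_to_seconds(
    read_setting(MASTER_FRAGMENT, "master", "commitReserveDuration"))

print("slave polls every", poll_seconds, "seconds")
print("master reserves a commit point for", reserve_seconds, "seconds")
```

In this format a 5-minute interval is 00:05:00 and the 2-second interval mentioned above is 00:00:02, so comparing the configured strings against the intervals actually seen in the logs is a quick sanity check.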