Posted to solr-user@lucene.apache.org by Rachel McConnell <ra...@instructables.com> on 2008/02/02 00:01:24 UTC

duplicate entries being returned, possible caching issue?

We have just started seeing an intermittent problem in our production
Solr instances, where the same document is returned twice in one
request.  Most of the content of the response consists of duplicates.
It's not consistent; maybe 1/3 of the time this is happening and the
rest of the time, one return document is sent per actual Solr
document.

We recently made some changes to our caching strategy, basically to
increase the values across the board.  This is the only change to our
Solr instance for quite some time.

Our production system consists of the following:

* 'write', a Solr server used as the master index, optimized for
writes.  all 3 application servers use this
* 'read1' & 'read2', Solr servers optimized for reads, which synch
from the master every 20 minutes.  these two are behind a pound load
balancer.  Two application servers use these for searching.
* 'read3', a Solr server identical to read1 & read2, but which is not
load balanced, and used by only one application server.

Has anyone any ideas how to start debugging this?  What information
should I be looking for that could shed some light on this?

Thanks for any advice,
Rachel

Re: duplicate entries being returned, possible caching issue?

Posted by Chris Hostetter <ho...@fucit.org>.
: I've reviewed the wiki pages about snappuller
: (http://wiki.apache.org/solr/SolrCollectionDistributionScripts) and
: solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml) and it
: seems that the snappuller is intended to be used on the slave server.
: In our case, the slave servers do no updating and never commit; the
: master is the only one that commits.  Is there a standard way for the
: just-committed, consistent index to be pushed from the master server
: out to the slaves?

the master runs snapshooter in either a postCommit or postOptimize hook.
the slaves run snappuller and snapinstaller via schedule (or manually)
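
For reference, a sketch of what that master-side hook looked like in
solrconfig.xml of this era (the exe/dir values are illustrative; adjust
to your own install):

```xml
<!-- solrconfig.xml on the MASTER ('write'): run snapshooter right after
     each commit, so every snapshot reflects a consistent index with all
     batched deletes already applied. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <!-- path and script name are illustrative -->
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <!-- block the commit until the snapshot has been taken -->
    <bool name="wait">true</bool>
  </listener>
</updateHandler>
```

With this in place, snapshots only ever exist at commit points, so a
slave can never pull a mid-update view of the index.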

...maybe that's what you are already doing ... but earlier in this thread 
when Yonik asked about postCommit you said...

> > Are you using the postCommit hook in solrconfig.xml to call snapshooter?
> 
> No, just the crontab.  We have only one master server on which commits
> are made, and the servers on which requests are made run the
> snapshooter periodically.  No data changes are made on the read
> servers, so postCommit would never be called anyway (I believe). 

...which (as i read it, and i'm pretty sure Yonik read it the same way) 
means you are running snapshooter on your slave machines (ie: "servers on 
which requests are made") ... there is (normally) no reason to run 
snapshooter on those machines ... which (along with the other stuff you've 
written) makes it seem like maybe you are getting the index off the master 
in some unrecommended way (and getting it before the deletes which have 
been batched up are processed) and then only using the scripts on the 
slave/query machines.


Then again: this may all be a Red Herring and totally unrelated to your 
problem.




-Hoss


Re: duplicate entries being returned, possible caching issue?

Posted by Rachel McConnell <ra...@instructables.com>.
On 2/4/08, Yonik Seeley <yo...@apache.org> wrote:
> On Feb 4, 2008 2:20 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > > If you are running snapshooter asynchronously, this would be the cause.
> > > It's designed to be run from solr (via a postCommit or postOptimize
> > > hook) at specific points where a consistent view of the index is
> > > available.
> >
> > So our cron job might be running DURING an update, for example, and
> > get duplicate values that way?
>
> Right.  Duplicates are removed on a commit(), so if a snapshot is
> being taken at any other time than right after a commit, those deletes
> will not have been performed.

I've reviewed the wiki pages about snappuller
(http://wiki.apache.org/solr/SolrCollectionDistributionScripts) and
solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml) and it
seems that the snappuller is intended to be used on the slave server.
In our case, the slave servers do no updating and never commit; the
master is the only one that commits.  Is there a standard way for the
just-committed, consistent index to be pushed from the master server
out to the slaves?

In fact I don't see how this is supposed to work in any environment
where the master and slave Solr servers are on different physical
machines.  The postCommit handler should run after a commit, which
only happens on the master server; yet it runs snappuller which should
run on a slave.  I am probably missing something here, is there any
more documentation you can point me to?
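
A slave-side crontab sketch for this pull-based setup (paths and the
20-minute schedule are illustrative):

```shell
# On each slave (read1/read2/read3): pull the newest snapshot from the
# master, then install it. snapinstaller triggers a commit on the slave
# so a new searcher opens against the freshly installed index.
*/20 * * * * /opt/solr/bin/snappuller && /opt/solr/bin/snapinstaller
```

Nothing pushes from the master: the postCommit hook only *creates*
snapshots there, and the slaves fetch them on their own schedule.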

Rachel

Re: duplicate entries being returned, possible caching issue?

Posted by Yonik Seeley <yo...@apache.org>.
On Feb 4, 2008 2:20 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > If you are running snapshooter asynchronously, this would be the cause.
> > It's designed to be run from solr (via a postCommit or postOptimize
> > hook) at specific points where a consistent view of the index is
> > available.
>
> So our cron job might be running DURING an update, for example, and
> get duplicate values that way?

Right.  Duplicates are removed on a commit(), so if a snapshot is
being taken at any other time than right after a commit, those deletes
will not have been performed.

>  I'd have thought that in that case,
> the dupe values would stick around until the next update, 20 minutes
> later,

If you don't call commit() on the master, those dups will still be there.

-Yonik

Re: duplicate entries being returned, possible caching issue?

Posted by Rachel McConnell <ra...@instructables.com>.
On 2/4/08, Yonik Seeley <yo...@apache.org> wrote:
> On Feb 4, 2008 1:48 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > On 2/4/08, Yonik Seeley <yo...@apache.org> wrote:
> > > On Feb 4, 2008 1:15 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > > > We are using Solr's replication scripts.  They are set to run every 20
> > > > minutes, via a cron job on the slave servers.  Any further useful info
> > > > I can give regarding them?
> > >
> > > Are you using the postCommit hook in solrconfig.xml to call snapshooter?
> >
> > No, just the crontab.  We have only one master server on which commits
> > are made, and the servers on which requests are made run the
> > snapshooter periodically.
>
> If you are running snapshooter asynchronously, this would be the cause.
> It's designed to be run from solr (via a postCommit or postOptimize
> hook) at specific points where a consistent view of the index is
> available.

So our cron job might be running DURING an update, for example, and
get duplicate values that way?  I'd have thought that in that case,
the dupe values would stick around until the next update, 20 minutes
later, and we have not observed that to happen.  Or do you mean
something else?

thanks,
Rachel

Re: duplicate entries being returned, possible caching issue?

Posted by Yonik Seeley <yo...@apache.org>.
On Feb 4, 2008 1:48 PM, Rachel McConnell <ra...@instructables.com> wrote:
> On 2/4/08, Yonik Seeley <yo...@apache.org> wrote:
> > On Feb 4, 2008 1:15 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > > We are using Solr's replication scripts.  They are set to run every 20
> > > minutes, via a cron job on the slave servers.  Any further useful info
> > > I can give regarding them?
> >
> > Are you using the postCommit hook in solrconfig.xml to call snapshooter?
>
> No, just the crontab.  We have only one master server on which commits
> are made, and the servers on which requests are made run the
> snapshooter periodically.

If you are running snapshooter asynchronously, this would be the cause.
It's designed to be run from solr (via a postCommit or postOptimize
hook) at specific points where a consistent view of the index is
available.

-Yonik

Re: duplicate entries being returned, possible caching issue?

Posted by Rachel McConnell <ra...@instructables.com>.
On 2/4/08, Yonik Seeley <yo...@apache.org> wrote:
> On Feb 4, 2008 1:15 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > We are using Solr's replication scripts.  They are set to run every 20
> > minutes, via a cron job on the slave servers.  Any further useful info
> > I can give regarding them?
>
> Are you using the postCommit hook in solrconfig.xml to call snapshooter?

No, just the crontab.  We have only one master server on which commits
are made, and the servers on which requests are made run the
snapshooter periodically.  No data changes are made on the read
servers, so postCommit would never be called anyway (I believe).

> The other possibility is a JVM crash happening before Solr removes
> deleted documents.

This would crash the appserver, which isn't happening.  Also the
duplicates don't seem to be returned often; we see a case of duplicate
results, but within a minute or less it goes away and the correct set
of results is returned again.  This seems to point to a problem with
the cache, to me.  But I don't have a good sense of how to debug it...

We tried changing the autowarmer settings to not pull anything from
the cache.  I'll write again if this seems to fix the problem - by
which I mean, if we don't see it at all for a day or two.
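
For anyone trying the same thing, disabling autowarming is a matter of
setting autowarmCount to 0 on the caches in solrconfig.xml (sizes below
are illustrative):

```xml
<!-- solrconfig.xml on the read servers: autowarmCount="0" means no
     entries are copied from the old searcher's caches into the new
     searcher's caches when a snapshot is installed. -->
<filterCache class="solr.LRUCache"
             size="16384" initialSize="4096" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="16384" initialSize="4096" autowarmCount="0"/>
```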

thanks,
Rachel

Re: duplicate entries being returned, possible caching issue?

Posted by Yonik Seeley <yo...@apache.org>.
On Feb 4, 2008 1:15 PM, Rachel McConnell <ra...@instructables.com> wrote:
> We are using Solr's replication scripts.  They are set to run every 20
> minutes, via a cron job on the slave servers.  Any further useful info
> I can give regarding them?

Are you using the postCommit hook in solrconfig.xml to call snapshooter?
The other possibility is a JVM crash happening before Solr removes
deleted documents.

-Yonik

Re: duplicate entries being returned, possible caching issue?

Posted by Rachel McConnell <ra...@instructables.com>.
We are using Solr's replication scripts.  They are set to run every 20
minutes, via a cron job on the slave servers.  Any further useful info
I can give regarding them?

R

On 2/3/08, Yonik Seeley <yo...@apache.org> wrote:
> I would guess you are seeing a view of the index after adding some
> documents but before the duplicates have been removed.  Are you using
> Solr's replication scripts?
>
> -Yonik
>
> On Feb 1, 2008 6:01 PM, Rachel McConnell <ra...@instructables.com> wrote:
> > We have just started seeing an intermittent problem in our production
> > Solr instances, where the same document is returned twice in one
> > request.  Most of the content of the response consists of duplicates.
> > It's not consistent; maybe 1/3 of the time this is happening and the
> > rest of the time, one return document is sent per actual Solr
> > document.
> >
> > We recently made some changes to our caching strategy, basically to
> > increase the values across the board.  This is the only change to our
> > Solr instance for quite some time.
> >
> > Our production system consists of the following:
> >
> > * 'write', a Solr server used as the master index, optimized for
> > writes.  all 3 application servers use this
> > * 'read1' & 'read2', Solr servers optimized for reads, which synch
> > from the master every 20 minutes.  these two are behind a pound load
> > balancer.  Two application servers use these for searching.
> > * 'read3', a Solr server identical to read1 & read2, but which is not
> > load balanced, and used by only one application server.
> >
> > Has anyone any ideas how to start debugging this?  What information
> > should I be looking for that could shed some light on this?
> >
> > Thanks for any advice,
> > Rachel
> >
>

Re: duplicate entries being returned, possible caching issue?

Posted by Yonik Seeley <yo...@apache.org>.
I would guess you are seeing a view of the index after adding some
documents but before the duplicates have been removed.  Are you using
Solr's replication scripts?

-Yonik

On Feb 1, 2008 6:01 PM, Rachel McConnell <ra...@instructables.com> wrote:
> We have just started seeing an intermittent problem in our production
> Solr instances, where the same document is returned twice in one
> request.  Most of the content of the response consists of duplicates.
> It's not consistent; maybe 1/3 of the time this is happening and the
> rest of the time, one return document is sent per actual Solr
> document.
>
> We recently made some changes to our caching strategy, basically to
> increase the values across the board.  This is the only change to our
> Solr instance for quite some time.
>
> Our production system consists of the following:
>
> * 'write', a Solr server used as the master index, optimized for
> writes.  all 3 application servers use this
> * 'read1' & 'read2', Solr servers optimized for reads, which synch
> from the master every 20 minutes.  these two are behind a pound load
> balancer.  Two application servers use these for searching.
> * 'read3', a Solr server identical to read1 & read2, but which is not
> load balanced, and used by only one application server.
>
> Has anyone any ideas how to start debugging this?  What information
> should I be looking for that could shed some light on this?
>
> Thanks for any advice,
> Rachel
>