Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2010/01/15 04:07:15 UTC

Re: [1.3] help with update timeout issue?

Jerome,

See those "waitFlush=true,waitSearcher=true" ?  Do things improve if you make them false? (not sure how with autocommit without looking at the config and not sure if this makes a difference when autocommit triggers commits)
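
For illustration: with SolrJ, an explicit client-side commit does let you set both flags. A minimal sketch against the 1.3-era API, with the URL and client setup assumed rather than taken from this thread:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class CommitNow {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; point it at the actual Solr instance.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            // commit(waitFlush, waitSearcher): with false/false the call returns
            // without blocking on the index flush or on warming the new searcher.
            server.commit(false, false);
        }
    }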

 
Re deleted docs, they are probably getting expunged, it's just that you always have more deleted docs, so those 2 numbers will never be the same without optimize.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Jerome L Quinn <jl...@us.ibm.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 14, 2010 9:59:12 PM
> Subject: [1.3] help with update timeout issue? 
> 
> 
> 
> Hi, folks,
> 
> I am using Solr 1.3 pretty successfully, but am running into an issue that
> hits once in a long while.  I'm still using 1.3 since I have some custom
> code I will have to port forward to 1.4.
> 
> My basic setup is that I have data sources continually pushing data into
> Solr, around 20K adds per day.  The index is currently around 100G, stored
> on local disk on a fast linux server.  I'm trying to make new docs
> searchable as quickly as possible, so I currently have autocommit set to
> 15s.  I originally had 3s but that seems to be a little too unstable.  I
> never optimize the index since optimize will lock things up solid for 2
> hours, dropping docs until the optimize completes.  I'm using the default
> segment merging settings.
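
For readers reconstructing this setup: the 15s autocommit and the default merge settings described above correspond roughly to this solrconfig.xml fragment in Solr 1.3 (a sketch; only the 15-second value comes from the mail, and the mergeFactor shown is simply the stock default):

    <indexDefaults>
      <!-- default segment merging: 10 segments accumulate before a merge -->
      <mergeFactor>10</mergeFactor>
    </indexDefaults>

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>15000</maxTime>  <!-- auto-commit at most every 15 seconds -->
      </autoCommit>
    </updateHandler>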
> 
> Every once in a while I'm getting a socket timeout when trying to add a
> document.  I traced it to a 20s timeout and then found the corresponding
> point in the Solr log.
> 
> Jan 13, 2010 2:59:15 PM org.apache.solr.core.SolrCore execute
> INFO: [tales] webapp=/solr path=/update params={} status=0 QTime=2
> Jan 13, 2010 2:59:15 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
> Jan 13, 2010 2:59:56 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@26e926e9 main
> Jan 13, 2010 2:59:56 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> 
> Solr locked up for 41 seconds here while doing some of the commit work.
> So, I have a few questions.
> 
> Is this related to GC?
> Does Solr always lock up when merging segments and I just have to live with
> losing the doc I want to add?
> Is there a timeout that would guarantee me a write success?
> Should I just retry in this situation? If so, how do I distinguish between
> this and Solr just being down?
> I already have had issues in the past with too many files open, so
> increasing the merge factor isn't an option.
> 
> 
> On a related note, I had previously asked about optimizing and was told
> that segment merging would take care of cleaning up deleted docs.  However,
> I have the following stats for my index:
> 
> numDocs : 2791091
> maxDoc : 4811416
> 
> My understanding is that numDocs is the docs being searched and maxDoc is
> the number of docs including ones that will disappear after optimization.
> How do I get this cleanup without using optimize, since it locks up Solr
> for multiple hours?  I'm deleting old docs daily as well.
> 
> Thanks for all the help,
> Jerry


Re: [1.3] help with update timeout issue?

Posted by Jerome L Quinn <jl...@us.ibm.com>.

Lance Norskog <go...@gmail.com> wrote on 01/16/2010 12:43:09 AM:

> If your indexing software does not have the ability to retry after a
> failure, you might wish to change the timeout from 20 seconds to, say,
> 5 minutes.

I can make it retry, but I have somewhat real-time processes doing these
updates.  Does anyone push updates into a temporary file and then have an
async process push the updates, so that it can survive the lockups without
worry?  This seems like a real hack, but I don't want a long timeout like
that in the program that currently pushes the data.
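
One middle ground between a long client timeout and a separate spooler process is a small retry wrapper around the add call. A rough sketch, assuming a SolrJ-based indexer; the class, method, and constants are illustrative, not from the thread:

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrInputDocument;

    public class RetryingAdd {
        // Retry a couple of times with a short backoff; after that, assume Solr
        // is really down rather than just pausing for a commit/merge.
        public static void addWithRetry(SolrServer server, SolrInputDocument doc)
                throws SolrServerException, IOException, InterruptedException {
            int attempts = 0;
            while (true) {
                try {
                    server.add(doc);
                    return;
                } catch (SolrServerException e) {   // timeouts usually surface here
                    if (++attempts >= 3) throw e;
                } catch (IOException e) {
                    if (++attempts >= 3) throw e;
                }
                Thread.sleep(5000);                 // back off before retrying
            }
        }
    }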

One thing that worries me is that Solr may not respond to searches in these
windows.  I'm basing that on the observation that search does not respond
when Solr is optimizing.

Can anyone offer me insight on why these delays happen?

Thanks,
Jerry

Re: [1.3] help with update timeout issue?

Posted by Lance Norskog <go...@gmail.com>.
If your indexing software does not have the ability to retry after a
failure, you might wish to change the timeout from 20 seconds to, say,
5 minutes.
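
If the indexer uses SolrJ, that timeout is set on the client. A rough sketch (the URL and values are illustrative):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class LongTimeoutClient {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            server.setSoTimeout(300000);        // read timeout: 5 minutes, in ms
            server.setConnectionTimeout(5000);  // connect timeout can stay short
        }
    }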

On Fri, Jan 15, 2010 at 1:20 PM, Jerome L Quinn <jl...@us.ibm.com> wrote:
> Otis Gospodnetic <ot...@yahoo.com> wrote on 01/14/2010 10:07:15 PM:
>
>> See those "waitFlush=true,waitSearcher=true" ?  Do things improve if
>> you make them false? (not sure how with autocommit without looking
>> at the config and not sure if this makes a difference when
>> autocommit triggers commits)
>
> Looking at DirectUpdateHandler2, it appears that those values are hardwired
> to true for autocommit.  Unless there's another mechanism for changing
> that.
>
>> Re deleted docs, they are probably getting expunged, it's just that
>> you always have more deleted docs, so those 2 numbers will never be
>> the same without optimize.
>
> I can accept that they will always be different, but that's a large
> difference.  Hmm, a couple weeks ago, I manually deleted a bunch of docs
> whose associated data had gotten corrupted.  Normally, I'd only be deleting
> a day's worth of docs at a time.  Is there a point by which I could expect
> the old stuff to get cleaned up without optimizing?
>
> Thanks,
> Jerry



-- 
Lance Norskog
goksron@gmail.com

Re: [1.3] help with update timeout issue?

Posted by Jerome L Quinn <jl...@us.ibm.com>.
Otis Gospodnetic <ot...@yahoo.com> wrote on 01/14/2010 10:07:15 PM:

> See those "waitFlush=true,waitSearcher=true" ?  Do things improve if
> you make them false? (not sure how with autocommit without looking
> at the config and not sure if this makes a difference when
> autocommit triggers commits)

Looking at DirectUpdateHandler2, it appears that those values are hardwired
to true for autocommit.  Unless there's another mechanism for changing
that.
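
Given that, one workaround (a sketch, not something confirmed in this thread) would be to disable autocommit and have the indexing process post explicit commits, where the flags are settable in the XML update message sent to the /update handler:

    <commit waitFlush="false" waitSearcher="false"/>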

> Re deleted docs, they are probably getting expunged, it's just that
> you always have more deleted docs, so those 2 numbers will never be
> the same without optimize.

I can accept that they will always be different, but that's a large
difference.  Hmm, a couple weeks ago, I manually deleted a bunch of docs
whose associated data had gotten corrupted.  Normally, I'd only be deleting
a day's worth of docs at a time.  Is there a point by which I could expect
the old stuff to get cleaned up without optimizing?

Thanks,
Jerry