Posted to solr-user@lucene.apache.org by Nicholas Chase <nc...@earthlink.net> on 2011/07/18 16:53:20 UTC

NRT and commit behavior

Very glad to hear that NRT is finally here!  But my question is this: 
will things still come to a standstill during a commit?

Thanks...

----  Nick

Re: NRT and commit behavior

Posted by Nagendra Nagarajayya <nn...@transaxtions.com>.

One of the users of NRT reported that their system was freezing during
commits at about 1.5 million docs because of how frequently they
committed. With NRT (Solr with RankingAlgorithm), its document update
performance, and a commit interval of about 15 minutes, they no longer
have the freeze problem.

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org  <http://solr-ra.tgels.com>
http://rankingalgorithm.tgels.org  <http://rankingalgorithm.tgels.com>

On 7/18/2011 7:53 AM, Nicholas Chase wrote:
> Very glad to hear that NRT is finally here!  But my question is this: 
> will things still come to a standstill during a commit?
>
> Thanks...
>
> ----  Nick
>
>

Re: NRT and commit behavior

Posted by Mark Miller <ma...@gmail.com>.

I've written a blog post on some of the recent improvements that explains things a bit:

http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%E2%80%98near-realtime%E2%80%99-improvements/

On Jul 18, 2011, at 10:53 AM, Nicholas Chase wrote:

> Very glad to hear that NRT is finally here!  But my question is this: will things still come to a standstill during a commit?
> 
> Thanks...
> 
> ----  Nick

- Mark Miller
lucidimagination.com

Re: NRT and commit behavior

Posted by Vadim Kisselmann <v....@googlemail.com>.

Tirthankar,

are you indexing 1. smaller docs or 2. books?
If 1., your caches are too big for your memory, as Erick already said.
Try to allocate 10GB for the JVM, leave 14GB for your HDD cache, and make
your caches smaller.

If 2., read the blog posts on hathitrust.org:
http://www.hathitrust.org/blogs/large-scale-search

Regards
Vadim
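
To illustrate why the cache sizes matter, here is a rough back-of-the-envelope sketch. It assumes the worst case, where every filterCache entry is held as a full bitset with one bit per document in the index. The size of 16384 comes from the solrconfig quoted below; the document count of 10 million is purely hypothetical, since the thread never states it. Real usage may be lower, because small result sets can be stored more compactly.

```python
def filter_cache_upper_bound_bytes(cache_size: int, max_doc: int) -> int:
    """Worst-case bytes if every cache entry is a full bitset (1 bit per doc)."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return cache_size * bytes_per_entry


# cache_size=16384 is taken from the quoted filterCache config;
# max_doc=10_000_000 is a HYPOTHETICAL document count, not from the thread.
estimate = filter_cache_upper_bound_bytes(cache_size=16384, max_doc=10_000_000)
print(f"worst case: {estimate / 2**30:.1f} GiB")
```

Under those assumptions a completely full cache could need more than the 8GB given to Tomcat, which is consistent with the advice to shrink the caches.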


2011/9/24 Erick Erickson <er...@gmail.com>

> No <G>. The problem is that "number of documents" isn't a reliable
> indicator of resource consumption. Consider the difference between
> indexing a twitter message and a book. I can put a LOT more docs
> of 140 chars on a single machine of size X than I can books.
>
> Unfortunately, the only way I know of is to test. Use something like
> JMeter or SolrMeter to fire enough queries at your machine to
> determine when you're over-straining resources and shard at that
> point (or get a bigger machine <G>).
>
> Best
> Erick
>
> On Wed, Sep 21, 2011 at 8:24 PM, Tirthankar Chatterjee
> <tc...@commvault.com> wrote:
> > Okay, but is there any threshold for the index size, the total docs in
> > the index, or the size of physical memory at which sharding should be
> > considered?
> >
> > I am trying to find the winning combination.
> > Tirthankar
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Friday, September 16, 2011 7:46 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: NRT and commit behavior
> >
> > Uhm, you're putting  a lot of index into not very much memory. I really
> think you're going to have to shard your index across several machines to
> get past this problem. Simply increasing the size of your caches is still
> limited by the physical memory you're working with.
> >
> > You really have to put a profiler on the system to see what's going on.
> At that size there are too many things that it *could* be to definitively
> answer it with e-mails....
> >
> > Best
> > Erick
> >
> > On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee <
> tchatterjee@commvault.com> wrote:
> >> Erick,
> >> Also, in our solrconfig we have tried increasing the cache sizes.
> Setting the autowarmCount values below to 0 makes the commit call return
> within a second, but that will slow us down on searches....
> >>
> >> <filterCache
> >>      class="solr.FastLRUCache"
> >>      size="16384"
> >>      initialSize="4096"
> >>      autowarmCount="4096"/>
> >>
> >>    <!-- Cache used to hold field values that are quickly accessible
> >>         by document id.  The fieldValueCache is created by default
> >>         even if not configured here.
> >>      <fieldValueCache
> >>        class="solr.FastLRUCache"
> >>        size="512"
> >>        autowarmCount="128"
> >>        showItems="32"
> >>      />
> >>    -->
> >>
> >>   <!-- queryResultCache caches results of searches - ordered lists of
> >>         document ids (DocList) based on a query, a sort, and the range
> >>         of documents requested.  -->
> >>    <queryResultCache
> >>      class="solr.LRUCache"
> >>      size="16384"
> >>      initialSize="4096"
> >>      autowarmCount="4096"/>
> >>
> >>  <!-- documentCache caches Lucene Document objects (the stored fields
> for each document).
> >>       Since Lucene internal document ids are transient, this cache
> >> will not be autowarmed.  -->
> >>    <documentCache
> >>      class="solr.LRUCache"
> >>      size="512"
> >>      initialSize="512"
> >>      autowarmCount="512"/>
> >>
> >> -----Original Message-----
> >> From: Tirthankar Chatterjee [mailto:tchatterjee@commvault.com]
> >> Sent: Wednesday, September 14, 2011 7:31 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: RE: NRT and commit behavior
> >>
> >> Erick,
> >> Here is the answer to your questions:
> >> Our index is 267 GB
> >> We are not optimizing...
> >> No we have not profiled yet to check the bottleneck, but logs indicate
> opening the searchers is taking time...
> >> Nothing except SOLR
> >> Total memory is 16GB; Tomcat has 8GB allocated. Everything is 64-bit:
> >> OS, JVM, and Tomcat.
> >>
> >> -----Original Message-----
> >> From: Erick Erickson [mailto:erickerickson@gmail.com]
> >> Sent: Sunday, September 11, 2011 11:37 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: NRT and commit behavior
> >>
> >> Hmm, OK. You might want to look at the non-cached filter query stuff,
> it's quite recent.
> >> The point here is that it is a filter that is applied only after all of
> the less expensive filter queries are run. One of its uses is exactly ACL
> calculations. Rather than calculate the ACL for the entire doc set, it only
> calculates access for docs that have made it past all the other elements of
> the query.... See SOLR-2429 and note that it is in 3.4 (currently being
> released) only.
> >>
> >> As to why your commits are taking so long, I have no idea given that you
> really haven't given us much to work with.
> >>
> >> How big is your index? Are you optimizing? Have you profiled the
> application to see what the bottleneck is (I/O, CPU, etc?). What else is
> running on your machine? It's quite surprising that it takes that long. How
> much memory are you giving the JVM? etc...
> >>
> >> You might want to review:
> >> http://wiki.apache.org/solr/UsingMailingLists
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee <
> tchatterjee@commvault.com> wrote:
> >>> Erick,
> >>> What you said is correct. For us, the searches are based on Active
> Directory permissions, which are populated in the filter query parameter. So
> we don't have any warming query concept, as we cannot fire one for every user
> ahead of time.
> >>>
> >>> What we do here is that when a user logs in, we run an invalid query (one
> that returns no results, instead of '*') with the correct filter query (which
> is his permissions based on the login). This way the cache gets warmed up with
> valid docs.
> >>>
> >>> It works then.
> >>>
> >>>
> >>> Also, can you please let me know why a commit is taking 45 minutes to 1
> hour on well-resourced hardware with multiple processors, 16GB RAM, a 64-bit
> VM, etc.? We tried passing waitSearcher as false and found that inside the
> code it is hard-coded to true. Is there any specific reason? Can we change
> that value to honor what is being passed?
> >>>
> >>> Thanks,
> >>> Tirthankar
> >>>
> >>> -----Original Message-----
> >>> From: Erick Erickson [mailto:erickerickson@gmail.com]
> >>> Sent: Thursday, September 01, 2011 8:38 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: NRT and commit behavior
> >>>
> >>> Hmm, I'm guessing a bit here, but using an invalid query doesn't sound
> very safe, but I suppose it *might* be OK.
> >>>
> >>> What does "invalid" mean? Syntax error? not safe.
> >>>
> >>> search that returns 0 results? I don't know, but I'd guess that
> >>> filling your caches, which is the point of warming queries, might be
> >>> short circuited if the query returns
> >>> 0 results but I don't know for sure.
> >>>
> >>> But the fact that "invalid queries return quicker" does not inspire
> confidence since the *point* of warming queries is to spend the time up
> front so your users don't have to wait.
> >>>
> >>> So here's a test. Comment out your warming queries.
> >>> Restart your server and fire the warming query from the browser
> with &debugQuery=on and look at the QTime parameter.
> >>>
> >>> Now fire the same form of the query (as in the same sort, facet,
> grouping, etc, but presumably a valid term). See the QTime.
> >>>
> >>> Now fire the same form of the query with a *different* value in the
> query. That is, it should search on different terms but with the same sort,
> facet, etc. to avoid getting your data straight from the queryResultCache.
> >>>
> >>> My guess is that the last query will return much more quickly than the
> second query. Which would indicate that the first form isn't doing you any
> good.
> >>>
> >>> But a test is worth a thousand opinions.
> >>>
> >>> Best
> >>> Erick
> >>>
> >>> On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee <
> tchatterjee@commvault.com> wrote:
> >>>> Also noticed that "waitSearcher" parameter value is not  honored
> inside commit. It is always defaulted to true which makes it slow during
> indexing.
> >>>>
> >>>> What we are trying to do is use an invalid query (which won't return
> any results) as a warming query. This way the commit returns faster. Are we
> doing something wrong here?
> >>>>
> >>>> Thanks,
> >>>> Tirthankar
> >>>>
> >>>> -----Original Message-----
> >>>> From: Jonathan Rochkind [mailto:rochkind@jhu.edu]
> >>>> Sent: Monday, July 18, 2011 11:38 AM
> >>>> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
> >>>> Subject: Re: NRT and commit behavior
> >>>>
> >>>> In practice, in my experience at least, a very 'expensive' commit
> >>>> can still slow down searches significantly, I think just due to CPU
> >>>> (or
> >>>> i/o?) starvation. Not sure anything can be done about that.  That's my
> experience in Solr 1.4.1, but since searches have always been async with
> commits, it probably is the same situation even in more recent versions, I'd
> guess.
> >>>>
> >>>> On 7/18/2011 11:07 AM, Yonik Seeley wrote:
> >>>>> On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase<
> nchase@earthlink.net>  wrote:
> >>>>>> Very glad to hear that NRT is finally here!  But my question is
> this:
> >>>>>> will things still come to a standstill during a commit?
> >>>>> New updates can now proceed in parallel with a commit, and searches
> >>>>> have always been completely asynchronous w.r.t. commits.
> >>>>>
> >>>>> -Yonik
> >>>>> http://www.lucidimagination.com
> >>>>>
> >>>>
> >>>
> >>
> >
>
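
The three-query QTime comparison that Erick describes in the quoted thread could be scripted roughly as follows. This is only a sketch under stated assumptions: the base URL, the `acl:group_a` filter, and the search terms are hypothetical placeholders, and it assumes the server can return JSON via `wt=json` with QTime in the response header.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: hypothetical Solr select URL; adjust to your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(q, fq, base=SOLR_SELECT):
    """Build a select URL with debugQuery=on, as suggested in the thread."""
    params = {"q": q, "fq": fq, "debugQuery": "on", "wt": "json"}
    return base + "?" + urlencode(params)


def qtime_ms(response_text):
    """Extract QTime (milliseconds) from a JSON-formatted Solr response."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def timed_query(q, fq):
    """Fire one query against the server and return its reported QTime."""
    with urlopen(build_query_url(q, fq)) as resp:
        return qtime_ms(resp.read().decode("utf-8"))


def run_comparison(fq, labelled_queries):
    """Run the queries in order and print each QTime for comparison."""
    for label, q in labelled_queries:
        print(f"{label}: {timed_query(q, fq)} ms")
```

With warming queries commented out and the server restarted, a call such as `run_comparison("acl:group_a", [("zero-result warming query", "nosuchterm_xyz"), ("valid term", "report"), ("different valid term", "invoice")])` would print the three timings in the order the test calls for. All of those argument values are made up for illustration.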