Posted to solr-user@lucene.apache.org by Nitin Solanki <ni...@gmail.com> on 2015/03/13 04:19:12 UTC

Whole RAM consumed while Indexing.

Hello,
          I have written a Python script that indexes 20000 documents at
a time into Solr. I have 28 GB of RAM and 8 CPUs.
When I started indexing, 15 GB of RAM was free. While indexing, all of
the RAM gets consumed, but **not** a single document is indexed. Why so?
The Python script also throws *HTTPError: HTTP Error 503: Service
Unavailable*.
I think heavy load on Zookeeper made all the nodes go down, but I am not
sure about that, or whether something else is happening.
How can I overcome this issue? Please point me in the right direction.
Thanks..

Warm Regards,
Nitin Solanki
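
For reference, the kind of batched indexing loop described above, with a
backoff-and-retry when Solr answers 503, might look like this minimal
sketch. The host, the collection name "mycollection", the requests
library, and the docs source are assumptions for illustration, not
details from the thread:

    import json
    import time
    import requests  # assumed HTTP client; the thread does not name one

    # Hypothetical endpoint; adjust host and collection to your setup.
    SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"
    BATCH_SIZE = 20000

    def index_batch(batch):
        """POST one batch of documents; back off and retry on HTTP 503."""
        for attempt in range(5):
            resp = requests.post(
                SOLR_UPDATE_URL,
                data=json.dumps(batch),
                headers={"Content-Type": "application/json"},
            )
            if resp.status_code == 503:   # node overloaded: wait, then retry
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return
        raise RuntimeError("Solr still unavailable after retries")

    def index_all(docs):
        """Send documents in fixed-size batches; leave commits to autoCommit."""
        for i in range(0, len(docs), BATCH_SIZE):
            index_batch(docs[i:i + BATCH_SIZE])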

Re: Whole RAM consumed while Indexing.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/18/2015 9:44 AM, Nitin Solanki wrote:
>              I am just saying. I want to be sure on commits difference..
> What if I do frequent commits or not? And why I am saying that I need to
> commit things so very quickly because I have to index 28GB of data which
> takes 7-8 hours(frequent commits).
> As you said, do commits after 60000 seconds then it will be more expensive.
> If I don't encounter with **"overlapping searchers" warning messages** then
> I feel it seems to be okay. Is it?

Even if the commit only handles a single document and it's a soft
commit, it is an expensive operation in terms of CPU, and in a
garbage-collected environment like Java, memory churn as well.  A commit
also invalidates the Solr caches, so if you have autowarming turned on,
then you have the additional overhead of doing a bunch of queries to
warm the new cache - on every single soft commit.

Doing commits as often as three times a second (you did say the interval
was 300 milliseconds) is generally a bad idea.  Increasing the interval
to once a minute will take a huge amount of load off your servers, so
indexing will happen faster.

Thanks,
Shawn
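
For concreteness, the once-a-minute settings suggested here would look
roughly like this in solrconfig.xml (a sketch, values in milliseconds;
not a drop-in recommendation):

     <autoCommit>
       <maxTime>60000</maxTime>             <!-- hard commit every 60 seconds -->
       <openSearcher>false</openSearcher>   <!-- don't open a new searcher on hard commit -->
     </autoCommit>

     <autoSoftCommit>
       <maxTime>60000</maxTime>             <!-- soft commit (visibility) every 60 seconds -->
     </autoSoftCommit>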


Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi Alexandre,
                        The segment counts are different, but the
document counts are the same.
With (soft commit = 300 and hard commit = 3000): no. of segments = 43
AND
With (soft commit = 60000 and hard commit = 60000): no. of segments = 31

I don't have any idea about segment counts. What are they? Is there
anything to fix here, or is it fine not to worry about segments?
I just want to ask: if the segment count is higher, will searching be
slower?
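
A quick way to compare the document counts Alexandre mentions below is a
rows=0 query for *:* against each index; a minimal Python sketch, with
the host and collection name "mycollection" assumed for illustration:

    import requests  # assumed HTTP client

    resp = requests.get(
        "http://localhost:8983/solr/mycollection/select",  # hypothetical core
        params={"q": "*:*", "rows": 0, "wt": "json"},
    )
    print(resp.json()["response"]["numFound"])  # total docs in this index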

On Wed, Mar 18, 2015 at 10:14 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> Probably merged somewhat differently with some terms indexes repeating
> between segments. Check the number of segments in the data directory. And
> do search for *:* and make sure both do have the same document counts.
>
> Also, In all these discussions, you still haven't answered about how
> fast after indexing you want to _search_? Because, if you are not
> actually searching while committing, you could even index on a
> completely separate server (e.g. a faster one) and swap (or alias)
> index in afterwards. Unless, of course, I missed it, it's a lot of
> emails in a very short window of time.
>
> Regards,
>    Alex.
>
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 18 March 2015 at 12:09, Nitin Solanki <ni...@gmail.com> wrote:
> > When I kept my configuration to 300 for soft commit and 3000 for hard
> > commit and indexed some amount of data, I got the data size of the whole
> > index to be 6GB after completing the indexing.
> >
> > When I changed the configuration to 60000 for soft commit and 60000 for
> > hard commit and indexed same data then I got the data size of the whole
> > index to be 5GB after completing the indexing.
> >
> > But the number of documents in both scenarios was the same. I am
> > wondering how that can be possible?
> >
> > On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <ni...@gmail.com>
> wrote:
> >
> >> Hi Erick,
> >>              I am just saying. I want to be sure on commits difference..
> >> What if I do frequent commits or not? And why I am saying that I need to
> >> commit things so very quickly because I have to index 28GB of data which
> >> takes 7-8 hours(frequent commits).
> >> As you said, do commits after 60000 seconds then it will be more
> expensive.
> >> If I don't encounter with **"overlapping searchers" warning messages**
> >> then I feel it seems to be okay. Is it?
> >>
> >>
> >>
> >>
> >> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
> erickerickson@gmail.com>
> >> wrote:
> >>
> >>> Don't do it. Really, why do you want to do this? This seems like
> >>> an "XY" problem, you haven't explained why you need to commit
> >>> things so very quickly.
> >>>
> >>> I suspect you haven't tried _searching_ while committing at such
> >>> a rate, and you might as well turn all your top-level caches off
> >>> in solrconfig.xml since they won't be useful at all.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <ni...@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >        If I do very very fast indexing(softcommit = 300 and
> hardcommit =
> >>> > 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000)
> as
> >>> you
> >>> > both said. Will fast indexing fail to index some data?
> >>> > Any suggestion on this ?
> >>> >
> >>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
> >>> > andyetitmoves@gmail.com> wrote:
> >>> >
> >>> >> Yes, and doing so is painful and takes lots of people and hardware
> >>> >> resources to get there for large amounts of data and queries :)
> >>> >>
> >>> >> As Erick says, work backwards from 60s and first establish how high
> the
> >>> >> commit interval can be to satisfy your use case..
> >>> >> On 16 Mar 2015 16:04, "Erick Erickson" <er...@gmail.com>
> >>> wrote:
> >>> >>
> >>> >> > First start by lengthening your soft and hard commit intervals
> >>> >> > substantially. Start with 60000 and work backwards I'd say.
> >>> >> >
> >>> >> > Ramkumar has tuned the heck out of his installation to get the
> commit
> >>> >> > intervals to be that short ;).
> >>> >> >
> >>> >> > I'm betting that you'll see your RAM usage go way down, but
> >>> >> > that's a guess until you test.
> >>> >> >
> >>> >> > Best,
> >>> >> > Erick
> >>> >> >
> >>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <
> >>> nitinmlvya@gmail.com>
> >>> >> > wrote:
> >>> >> > > Hi Erick,
> >>> >> > >             You are correct: **"overlapping searchers"
> >>> >> > > warning messages** are showing up in the logs.
> >>> >> > > The **numDocs number** changes as documents are added during
> >>> >> > > indexing.
> >>> >> > > Any help?
> >>> >> > >
> >>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <
> >>> >> > erickerickson@gmail.com>
> >>> >> > > wrote:
> >>> >> > >
> >>> >> > >> First, the soft commit interval is very short. Very, very,
> very,
> >>> very
> >>> >> > >> short. 300ms is
> >>> >> > >> just short of insane unless it's a typo ;).
> >>> >> > >>
> >>> >> > >> Here's a long background:
> >>> >> > >>
> >>> >> > >>
> >>> >> >
> >>> >>
> >>>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>> >> > >>
> >>> >> > >> But the short form is that you're opening searchers every 300
> ms.
> >>> The
> >>> >> > >> hard commit is better,
> >>> >> > >> but every 3 seconds is still far too short IMO. I'd start with
> >>> soft
> >>> >> > >> commits of 60000 and hard
> >>> >> > >> commits of 60000 (60 seconds), meaning that you're going to
> have
> >>> to
> >>> >> > >> wait 1 minute for
> >>> >> > >> docs to show up unless you explicitly commit.
> >>> >> > >>
> >>> >> > >> You're throwing away all the caches configured in
> solrconfig.xml
> >>> more
> >>> >> > >> than 3 times a second,
> >>> >> > >> executing autowarming, etc, etc, etc....
> >>> >> > >>
> >>> >> > >> Changing these to longer intervals might cure the problem, but
> if
> >>> not
> >>> >> > >> then, as Hoss would
> >>> >> > >> say, "details matter". I suspect you're also seeing
> "overlapping
> >>> >> > >> searchers" warning messages
> >>> >> > >> in your log, and it's _possible_ that what's happening is that
> >>> you're
> >>> >> > >> just exceeding the
> >>> >> > >> max warming searchers and never opening a new searcher with the
> >>> >> > >> newly-indexed documents.
> >>> >> > >> But that's a total shot in the dark.
> >>> >> > >>
> >>> >> > >> How are you looking for docs (and not finding them)? Does the
> >>> numDocs
> >>> >> > >> number in
> >>> >> > >> the solr admin screen change?
> >>> >> > >>
> >>> >> > >>
> >>> >> > >> Best,
> >>> >> > >> Erick
> >>> >> > >>
> >>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <
> >>> nitinmlvya@gmail.com
> >>> >> >
> >>> >> > >> wrote:
> >>> >> > >> > Hi Alexandre,
> >>> >> > >> >
> >>> >> > >> >
> >>> >> > >> > *Hard Commit* is :
> >>> >> > >> >
> >>> >> > >> >      <autoCommit>
> >>> >> > >> >        <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
> >>> >> > >> >        <openSearcher>false</openSearcher>
> >>> >> > >> >      </autoCommit>
> >>> >> > >> >
> >>> >> > >> > *Soft Commit* is :
> >>> >> > >> >
> >>> >> > >> > <autoSoftCommit>
> >>> >> > >> >     <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
> >>> >> > >> > </autoSoftCommit>
> >>> >> > >> >
> >>> >> > >> > And I am committing 20000 documents each time.
> >>> >> > >> > Is it good config for committing?
> >>> >> > >> > Or am I doing something wrong?
> >>> >> > >> >
> >>> >> > >> >
> >>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <
> >>> >> > >> arafalov@gmail.com>
> >>> >> > >> > wrote:
> >>> >> > >> >
> >>> >> > >> >> What's your commit strategy? Explicit commits? Soft
> >>> commits/hard
> >>> >> > >> >> commits (in solrconfig.xml)?
> >>> >> > >> >>
> >>> >> > >> >> Regards,
> >>> >> > >> >>    Alex.
> >>> >> > >> >> ----
> >>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a
> >>> newsletter:
> >>> >> > >> >> http://www.solr-start.com/

Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi Erick,
           I read about the merge factor policy for indexing. By default,
mergeFactor is 10. As the documentation says:

High value merge factor (e.g., 25):

   - Pro: Generally improves indexing speed
   - Con: Less frequent merges, resulting in a collection with more index
   files which may slow searching

Low value merge factor (e.g., 2):

   - Pro: Smaller number of index files, which speeds up searching.
   - Con: More segment merges slow down indexing.

So, my main purpose is **searching**, and searching must be fast.
Therefore, if I set **mergeFactor = 2**, indexing will be slow but
searching may be fast, right?

Once again: I am indexing 20000 documents at a time (total data size
28GB), with commits every 15 seconds (hard commit) and 10 minutes (soft
commit).

Will searching be fast if I set **mergeFactor = 2**, and what should the
values of ramBufferSizeMB, maxBufferedDocs, and maxIndexingThreads be?

Right now, all values are left at their defaults..
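
For reference, the settings asked about above live in the <indexConfig>
section of solrconfig.xml. A sketch with the mergeFactor value under
discussion; the other numbers are illustrative placeholders, not
recommendations from this thread:

     <indexConfig>
       <mergeFactor>2</mergeFactor>               <!-- fewer segments: slower indexing, faster search -->
       <ramBufferSizeMB>100</ramBufferSizeMB>     <!-- flush after this much buffered index data -->
       <maxBufferedDocs>1000</maxBufferedDocs>    <!-- ...or after this many buffered docs -->
       <maxIndexingThreads>8</maxIndexingThreads> <!-- concurrent threads writing to the index -->
     </indexConfig>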


Re: Whole RAM consumed while Indexing.

Posted by Erick Erickson <er...@gmail.com>.
That or even hard commit to 60 seconds. It's strictly a matter of how often
you want to close old segments and open new ones.


Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi Erick..
              I read your article. Really nice...
In it you said that for bulk indexing you should set soft commit = 10 mins
and hard commit = 15 sec. Is that also okay for my scenario?
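
For reference, the bulk-indexing values mentioned above (hard commit = 15
sec, soft commit = 10 mins) would look like this in solrconfig.xml (a
sketch; maxTime is in milliseconds):

     <autoCommit>
       <maxTime>15000</maxTime>            <!-- 15 seconds -->
       <openSearcher>false</openSearcher>
     </autoCommit>

     <autoSoftCommit>
       <maxTime>600000</maxTime>           <!-- 10 minutes -->
     </autoSoftCommit>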


Re: Whole RAM consumed while Indexing.

Posted by Erick Erickson <er...@gmail.com>.
bq: As you said, do commits after 60000 seconds

No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
as Shawn said. So setting it to 60000 is every minute.

From solrconfig.xml, conveniently located immediately above the
<autoCommit> tag:

maxTime - Maximum amount of time in ms that is allowed to pass since a
document was added before automatically triggering a new commit.

Also, a lot of the answers about soft and hard commits are here, as I
pointed out before; did you read it?

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best
Erick

Re: Whole RAM consumed while Indexing.

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Probably the segments merged somewhat differently, with some term index
data repeated between segments. Check the number of segments in the data
directory. And do a search for *:* to make sure both runs have the same
document counts.
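
Roughly something like this from Python (a quick untested sketch; it
assumes the default codec, where each segment writes one .si file, and
a made-up core name and data path):

    import glob
    import requests

    # one .si file per segment with the default codec
    print("segments:", len(glob.glob(
        "/var/solr/data/collection1/data/index/*.si")))

    # numFound for *:* should match between the two runs
    resp = requests.get(
        "http://localhost:8983/solr/collection1/select",
        params={"q": "*:*", "rows": 0, "wt": "json"}).json()
    print("docs:", resp["response"]["numFound"])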

Also, in all these discussions, you still haven't answered how soon
after indexing you need to _search_. Because, if you are not actually
searching while committing, you could even index on a completely
separate server (e.g. a faster one) and swap (or alias) the index in
afterwards. Unless, of course, I missed it; it's a lot of emails in a
very short window of time.
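
For the alias route in SolrCloud, the Collections API can repoint an
alias once the bulk load finishes; a rough sketch (untested, with
made-up collection and alias names):

    import requests

    # queries against the "search" alias switch to the new
    # collection atomically once this call succeeds
    requests.get("http://localhost:8983/solr/admin/collections",
                 params={"action": "CREATEALIAS",
                         "name": "search",
                         "collections": "collection_new"}
                 ).raise_for_status()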

Regards,
   Alex.

----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
When I kept my configuration at 300 for soft commit and 3000 for hard
commit and indexed some amount of data, I got the data size of the whole
index to be 6GB after completing the indexing.

When I changed the configuration to 60000 for soft commit and 60000 for
hard commit and indexed the same data, I got the data size of the whole
index to be 5GB after completing the indexing.

But the number of documents in both scenarios was the same. I am wondering
how that can be possible?

Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi Erick,
             I just want to be sure about the difference commits make:
what happens if I do frequent commits, and what if I don't? The reason I
am saying that I need to commit things so very quickly is that I have to
index 28GB of data, which takes 7-8 hours (with frequent commits).
As you said, do commits after 60000 seconds; but then it will be more
expensive. If I don't encounter the **"overlapping searchers" warning
messages**, then I feel it should be okay. Is it?




Re: Whole RAM consumed while Indexing.

Posted by Erick Erickson <er...@gmail.com>.
Don't do it. Really, why do you want to do this? This seems like
an "XY" problem: you haven't explained why you need to commit
things so very quickly.

I suspect you haven't tried _searching_ while committing at such
a rate, and you might as well turn all your top-level caches off
in solrconfig.xml since they won't be useful at all.

Best,
Erick

Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi,
       If I do very fast indexing (softcommit = 300 and hardcommit =
3000) versus slow indexing (softcommit = 60000 and hardcommit = 60000), as
you both said, will the fast indexing fail to index some data?
Any suggestion on this?

Re: Whole RAM consumed while Indexing.

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
Yes, and doing so is painful and takes lots of people and hardware
resources to get there for large amounts of data and queries :)

As Erick says, work backwards from 60s and first establish how high the
commit interval can be to satisfy your use case..
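
One way to establish that is to rerun the same bulk load at a few
candidate intervals and time each run. A rough sketch (untested; it
uses a per-request commitWithin to simulate the interval instead of
editing solrconfig.xml between runs, and load_docs() and the core name
are made up):

    import time
    import requests

    URL = "http://localhost:8983/solr/collection1/update"

    def index_run(docs, commit_within_ms):
        start = time.time()
        # batches of 20000, matching the original script
        for i in range(0, len(docs), 20000):
            requests.post(URL, json=docs[i:i + 20000],
                          params={"commitWithin": commit_within_ms}
                          ).raise_for_status()
        # one explicit hard commit at the end of the load
        requests.get(URL, params={"commit": "true"})
        return time.time() - start

    # work backwards from 60s toward shorter intervals
    for ms in (60000, 30000, 10000):
        print(ms, "ms:", index_run(load_docs(), ms), "s")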

Re: Whole RAM consumed while Indexing.

Posted by Erick Erickson <er...@gmail.com>.
First start by lengthening your soft and hard commit intervals
substantially. Start with 60000 and work backwards I'd say.

Ramkumar has tuned the heck out of his installation to get the commit
intervals to be that short ;).

I'm betting that you'll see your RAM usage go way down, but that's a
guess until you test.

Best,
Erick

Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi Erick,
            You are correct. The **"overlapping searchers" warning
messages** are indeed appearing in the logs.
The **numDocs numbers** are changing as documents are added during
indexing.
Any help?

Re: Whole RAM consumed while Indexing.

Posted by Erick Erickson <er...@gmail.com>.
First, the soft commit interval is very short. Very, very, very, very
short. 300ms is
just short of insane unless it's a typo ;).

Here's a long background:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

But the short form is that you're opening searchers every 300 ms. The
hard commit is better,
but every 3 seconds is still far too short IMO. I'd start with soft
commits of 60000 and hard
commits of 60000 (60 seconds), meaning that you're going to have to
wait 1 minute for
docs to show up unless you explicitly commit.

You're throwing away all the caches configured in solrconfig.xml more
than 3 times a second,
executing autowarming, etc, etc, etc....

Changing these to longer intervals might cure the problem, but if not
then, as Hoss would
say, "details matter". I suspect you're also seeing "overlapping
searchers" warning messages
in your log, and it's _possible_ that what's happening is that you're
just exceeding the
max warming searchers and never opening a new searcher with the
newly-indexed documents.
But that's a total shot in the dark.

How are you looking for docs (and not finding them)? Does the numDocs number in
the solr admin screen change?
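
You can also watch that number from a script instead of the admin
screen. A rough sketch against the core admin STATUS call (untested,
core name made up):

    import requests

    status = requests.get(
        "http://localhost:8983/solr/admin/cores",
        params={"action": "STATUS", "core": "collection1",
                "wt": "json"}).json()
    # numDocs only moves once a commit opens a new searcher
    print(status["status"]["collection1"]["index"]["numDocs"])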


Best,
Erick

Re: Whole RAM consumed while Indexing.

Posted by Nitin Solanki <ni...@gmail.com>.
Hi Alexandre,


*Hard Commit* is :

     <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

*Soft Commit* is :

<autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
</autoSoftCommit>

And I am committing 20000 documents each time.
Is this a good config for committing?
Or am I doing something wrong?
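
For reference, my indexing loop is roughly like this (a simplified
sketch, not the real script; the core name and field names are made up):

    import requests

    def index_batch(docs):
        # docs is a list of 20000 dicts like {"id": ..., "text": ...}
        r = requests.post(
            "http://localhost:8983/solr/collection1/update",
            json=docs)
        # a 503 from an overloaded node surfaces here as an HTTPError
        r.raise_for_status()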


Re: Whole RAM consumed while Indexing.

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
What's your commit strategy? Explicit commits? Soft commits/hard
commits (in solrconfig.xml)?

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

