You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Nickolai Toupikov <ni...@deri.org> on 2008/11/19 17:36:03 UTC

Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to search 
the wiki, but found only the
standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  and 
300000milis.
Indexing at around  200 docs per second (from multiple processes, using 
the CommonsHttpSolrServer class),
i would have expected autocommits to occur around every  40 seconds,
however the jvm log shows the following -  sometimes more than two calls 
per second:

$ tail -f jvm-default.log | grep "commit"
[16:18:15.862] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush
[16:18:21.721] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush
[16:18:36.047] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush
[16:18:36.886] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush
[16:18:37.867] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush
[16:18:44.375] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush
[16:18:47.154] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush
[16:18:50.399] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush
[16:19:13.782] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush
[16:19:15.081] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush
[16:19:15.357] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush
[16:19:16.421] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush
[16:19:50.594] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush
[16:19:52.236] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush
[16:19:52.917] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush
[16:19:54.920] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as 
many autocommits as commits -
so i assume it is not some commit(); line lost in my code.

I actually get the feeling that the commits are triggered more and more 
often - with not-so-nice
influence on indexing speed over time. Restarting resin seems to get the 
commit rate to the original level.
Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai

Re: Question about autocommit

Posted by Mark Miller <ma...@gmail.com>.
Could also go with the thread safety issues with pending and the 
deadlock that was reported the other day. All could pretty easily be 
related. Do we have a JIRA issue on it yet? Suppose I'll look...

Mark Miller wrote:
> Interesting...could go along with the earlier guys post about slow 
> indexing...
>
> Nickolai Toupikov wrote:
>> Hello,
>> I would like some details on the autocommit mechanism. I tried to 
>> search the wiki, but found only the
>> standard maxDoc/time settings.
>> i have set the autocommit parameters in solrconfig.xml to 8000 docs  
>> and 300000milis.
>> Indexing at around  200 docs per second (from multiple processes, 
>> using the CommonsHttpSolrServer class),
>> i would have expected autocommits to occur around every  40 seconds,
>> however the jvm log shows the following -  sometimes more than two 
>> calls per second:
>>
>> $ tail -f jvm-default.log | grep "commit"
>> [16:18:15.862] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:16.788] {pool-2-thread-1} end_commit_flush
>> [16:18:21.721] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:22.073] {pool-2-thread-1} end_commit_flush
>> [16:18:36.047] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:36.468] {pool-2-thread-1} end_commit_flush
>> [16:18:36.886] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:37.017] {pool-2-thread-1} end_commit_flush
>> [16:18:37.867] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:38.448] {pool-2-thread-1} end_commit_flush
>> [16:18:44.375] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:47.016] {pool-2-thread-1} end_commit_flush
>> [16:18:47.154] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:47.287] {pool-2-thread-1} end_commit_flush
>> [16:18:50.399] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:51.283] {pool-2-thread-1} end_commit_flush
>> [16:19:13.782] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:14.664] {pool-2-thread-1} end_commit_flush
>> [16:19:15.081] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:15.215] {pool-2-thread-1} end_commit_flush
>> [16:19:15.357] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:15.955] {pool-2-thread-1} end_commit_flush
>> [16:19:16.421] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:19.791] {pool-2-thread-1} end_commit_flush
>> [16:19:50.594] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:52.098] {pool-2-thread-1} end_commit_flush
>> [16:19:52.236] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:52.368] {pool-2-thread-1} end_commit_flush
>> [16:19:52.917] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:53.479] {pool-2-thread-1} end_commit_flush
>> [16:19:54.920] {pool-2-thread-1} start 
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>>
>>
>> additionally, in the solr admin page , the update handler reports as 
>> many autocommits as commits -
>> so i assume it is not some commit(); line lost in my code.
>>
>> I actually get the feeling that the commits are triggered more and 
>> more often - with not-so-nice
>> influence on indexing speed over time. Restarting resin seems to get 
>> the commit rate to the original level.
>> Optimizing has no effect.
>> Is there some other parameter influencing autocommit?
>>
>> Thank you very much.
>>
>> Nickolai
>


Re: Question about autocommit

Posted by Mark Miller <ma...@gmail.com>.
Interesting...could go along with the earlier guys post about slow 
indexing...

Nickolai Toupikov wrote:
> Hello,
> I would like some details on the autocommit mechanism. I tried to 
> search the wiki, but found only the
> standard maxDoc/time settings.
> i have set the autocommit parameters in solrconfig.xml to 8000 docs  
> and 300000milis.
> Indexing at around  200 docs per second (from multiple processes, 
> using the CommonsHttpSolrServer class),
> i would have expected autocommits to occur around every  40 seconds,
> however the jvm log shows the following -  sometimes more than two 
> calls per second:
>
> $ tail -f jvm-default.log | grep "commit"
> [16:18:15.862] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:16.788] {pool-2-thread-1} end_commit_flush
> [16:18:21.721] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:22.073] {pool-2-thread-1} end_commit_flush
> [16:18:36.047] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:36.468] {pool-2-thread-1} end_commit_flush
> [16:18:36.886] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:37.017] {pool-2-thread-1} end_commit_flush
> [16:18:37.867] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:38.448] {pool-2-thread-1} end_commit_flush
> [16:18:44.375] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:47.016] {pool-2-thread-1} end_commit_flush
> [16:18:47.154] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:47.287] {pool-2-thread-1} end_commit_flush
> [16:18:50.399] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:51.283] {pool-2-thread-1} end_commit_flush
> [16:19:13.782] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:14.664] {pool-2-thread-1} end_commit_flush
> [16:19:15.081] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:15.215] {pool-2-thread-1} end_commit_flush
> [16:19:15.357] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:15.955] {pool-2-thread-1} end_commit_flush
> [16:19:16.421] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:19.791] {pool-2-thread-1} end_commit_flush
> [16:19:50.594] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:52.098] {pool-2-thread-1} end_commit_flush
> [16:19:52.236] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:52.368] {pool-2-thread-1} end_commit_flush
> [16:19:52.917] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:53.479] {pool-2-thread-1} end_commit_flush
> [16:19:54.920] {pool-2-thread-1} start 
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>
>
> additionally, in the solr admin page , the update handler reports as 
> many autocommits as commits -
> so i assume it is not some commit(); line lost in my code.
>
> I actually get the feeling that the commits are triggered more and 
> more often - with not-so-nice
> influence on indexing speed over time. Restarting resin seems to get 
> the commit rate to the original level.
> Optimizing has no effect.
> Is there some other parameter influencing autocommit?
>
> Thank you very much.
>
> Nickolai


Re: Question about autocommit

Posted by Nickolai Toupikov <ni...@deri.org>.
I dont know. After reading my last email, i realized i did not say 
explicitly that by 'restarting' i merely meant 'restarting resin' . I 
did not restart indexing from scratch. And - if I understand correctly - 
if the merge factor was the culprit, restarting the servlet container 
would have had no effect.

Nguyen, Joe wrote:
> First it was fast, but after a couple of hours, it was slow down...
> Could mergeFactor affect the indexing speed since solr would take time
> to merge multiple segments into a single one?
>
> http://wiki.apache.org/solr/SolrPerformanceFactors#head-224d9a793c7c57d8
> 662d5351f955ddf8c0a3ebcd
>  
>
> -----Original Message-----
> From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org] 
> Sent: Wednesday, November 19, 2008 9:51 Joe
> To: solr-user@lucene.apache.org
> Subject: Re: Question about autocommit
>
> The documents have an average size of about a kilobyte i would say. 
> bigger ones can pop up,
>  but not nearly often enough to trigger memory-commits every couple of
> seconds.
> I dont have the exact figures, but i would expect the memory buffer
> limit to be far beyond the 8000 document  one in most of the cases.
> actually i have first started indexing with a 2000 document limit - a
> commit expected every ten seconds or so.
> in a couple of hours the speed of indexing choked down from over 200 to
> under 100 documents per second - and all the same i had several
> autocommits a second. so i restarted with a limit  at 8000. with the
> results i mentionned in the previous email.
>
> Nguyen, Joe wrote:
>   
>> As far as I know, commit could be triggered by
>>
>> Manually
>> 1.  invoke commit() method
>> Automatically
>> 2.  maxDoc
>> 3.  maxTime
>>
>> Since the document size is arbitrary and some document could be huge, 
>> could commit also be triggered by memory buffered size?
>>
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Wednesday, November 19, 2008 9:09 Joe
>> To: solr-user@lucene.apache.org
>> Subject: Re: Question about autocommit
>>
>> They are separate commits. ramBufferSizeMB controls when the 
>> underlying Lucene IndexWriter flushes ram to disk (this isnt like the 
>> IndexWriter commiting or closing). The solr autocommit controls when 
>> solr asks IndexWriter to commit what its done so far.
>>
>> Nguyen, Joe wrote:
>>   
>>     
>>> Could <ramBufferSizeMB> trigger the commit in this case?  
>>>
>>> -----Original Message-----
>>> From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org]
>>> Sent: Wednesday, November 19, 2008 8:36 Joe
>>> To: solr-user@lucene.apache.org
>>> Subject: Question about autocommit
>>>
>>> Hello,
>>> I would like some details on the autocommit mechanism. I tried to 
>>> search the wiki, but found only the standard maxDoc/time settings.
>>> i have set the autocommit parameters in solrconfig.xml to 8000 docs 
>>> and 300000milis.
>>> Indexing at around  200 docs per second (from multiple processes, 
>>> using the CommonsHttpSolrServer class), i would have expected 
>>> autocommits to occur around every  40 seconds, however the jvm log 
>>> shows the following
>>> -  sometimes more than two calls per second:
>>>
>>> $ tail -f jvm-default.log | grep "commit"
>>> [16:18:15.862] {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920] 
>>> {pool-2-thread-1} start
>>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>>> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>>>
>>>
>>> additionally, in the solr admin page , the update handler reports as 
>>> many autocommits as commits - so i assume it is not some commit(); 
>>> line lost in my code.
>>>
>>> I actually get the feeling that the commits are triggered more and 
>>> more often - with not-so-nice influence on indexing speed over time.
>>> Restarting resin seems to get the commit rate to the original level.
>>> Optimizing has no effect.
>>> Is there some other parameter influencing autocommit?
>>>
>>> Thank you very much.
>>>
>>> Nickolai
>>>   
>>>     
>>>       

RE: Question about autocommit

Posted by "Nguyen, Joe" <jn...@automotive.com>.
First it was fast, but after a couple of hours, it was slow down...
Could mergeFactor affect the indexing speed since solr would take time
to merge multiple segments into a single one?

http://wiki.apache.org/solr/SolrPerformanceFactors#head-224d9a793c7c57d8
662d5351f955ddf8c0a3ebcd
 

-----Original Message-----
From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org] 
Sent: Wednesday, November 19, 2008 9:51 Joe
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

The documents have an average size of about a kilobyte i would say. 
bigger ones can pop up,
 but not nearly often enough to trigger memory-commits every couple of
seconds.
I dont have the exact figures, but i would expect the memory buffer
limit to be far beyond the 8000 document  one in most of the cases.
actually i have first started indexing with a 2000 document limit - a
commit expected every ten seconds or so.
in a couple of hours the speed of indexing choked down from over 200 to
under 100 documents per second - and all the same i had several
autocommits a second. so i restarted with a limit  at 8000. with the
results i mentionned in the previous email.

Nguyen, Joe wrote:
> As far as I know, commit could be triggered by
>
> Manually
> 1.  invoke commit() method
> Automatically
> 2.  maxDoc
> 3.  maxTime
>
> Since the document size is arbitrary and some document could be huge, 
> could commit also be triggered by memory buffered size?
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Wednesday, November 19, 2008 9:09 Joe
> To: solr-user@lucene.apache.org
> Subject: Re: Question about autocommit
>
> They are separate commits. ramBufferSizeMB controls when the 
> underlying Lucene IndexWriter flushes ram to disk (this isnt like the 
> IndexWriter commiting or closing). The solr autocommit controls when 
> solr asks IndexWriter to commit what its done so far.
>
> Nguyen, Joe wrote:
>   
>> Could <ramBufferSizeMB> trigger the commit in this case?  
>>
>> -----Original Message-----
>> From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org]
>> Sent: Wednesday, November 19, 2008 8:36 Joe
>> To: solr-user@lucene.apache.org
>> Subject: Question about autocommit
>>
>> Hello,
>> I would like some details on the autocommit mechanism. I tried to 
>> search the wiki, but found only the standard maxDoc/time settings.
>> i have set the autocommit parameters in solrconfig.xml to 8000 docs 
>> and 300000milis.
>> Indexing at around  200 docs per second (from multiple processes, 
>> using the CommonsHttpSolrServer class), i would have expected 
>> autocommits to occur around every  40 seconds, however the jvm log 
>> shows the following
>> -  sometimes more than two calls per second:
>>
>> $ tail -f jvm-default.log | grep "commit"
>> [16:18:15.862] {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>>
>>
>> additionally, in the solr admin page , the update handler reports as 
>> many autocommits as commits - so i assume it is not some commit(); 
>> line lost in my code.
>>
>> I actually get the feeling that the commits are triggered more and 
>> more often - with not-so-nice influence on indexing speed over time.
>> Restarting resin seems to get the commit rate to the original level.
>> Optimizing has no effect.
>> Is there some other parameter influencing autocommit?
>>
>> Thank you very much.
>>
>> Nickolai
>>   
>>     

Re: Question about autocommit

Posted by Nickolai Toupikov <ni...@deri.org>.
The documents have an average size of about a kilobyte i would say. 
bigger ones can pop up,
 but not nearly often enough to trigger memory-commits every couple of 
seconds.
I dont have the exact figures, but i would expect the memory buffer 
limit to be far beyond the 8000 document  one in most of the cases.
actually i have first started indexing with a 2000 document limit - a 
commit expected every ten seconds or so.
in a couple of hours the speed of indexing choked down from over 200 to 
under 100 documents per second - and all the same i had
several autocommits a second. so i restarted with a limit  at 8000. with 
the results i mentionned in the previous email.

Nguyen, Joe wrote:
> As far as I know, commit could be triggered by
>
> Manually
> 1.  invoke commit() method
> Automatically
> 2.  maxDoc
> 3.  maxTime
>
> Since the document size is arbitrary and some document could be huge,
> could commit also be triggered by memory buffered size?
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com] 
> Sent: Wednesday, November 19, 2008 9:09 Joe
> To: solr-user@lucene.apache.org
> Subject: Re: Question about autocommit
>
> They are separate commits. ramBufferSizeMB controls when the underlying
> Lucene IndexWriter flushes ram to disk (this isnt like the IndexWriter
> commiting or closing). The solr autocommit controls when solr asks
> IndexWriter to commit what its done so far.
>
> Nguyen, Joe wrote:
>   
>> Could <ramBufferSizeMB> trigger the commit in this case?  
>>
>> -----Original Message-----
>> From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org]
>> Sent: Wednesday, November 19, 2008 8:36 Joe
>> To: solr-user@lucene.apache.org
>> Subject: Question about autocommit
>>
>> Hello,
>> I would like some details on the autocommit mechanism. I tried to 
>> search the wiki, but found only the standard maxDoc/time settings.
>> i have set the autocommit parameters in solrconfig.xml to 8000 docs  
>> and 300000milis.
>> Indexing at around  200 docs per second (from multiple processes, 
>> using the CommonsHttpSolrServer class), i would have expected 
>> autocommits to occur around every  40 seconds, however the jvm log 
>> shows the following
>> -  sometimes more than two calls per second:
>>
>> $ tail -f jvm-default.log | grep "commit"
>> [16:18:15.862] {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920] 
>> {pool-2-thread-1} start
>> commit(optimize=false,waitFlush=true,waitSearcher=true)
>> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>>
>>
>> additionally, in the solr admin page , the update handler reports as 
>> many autocommits as commits - so i assume it is not some commit(); 
>> line lost in my code.
>>
>> I actually get the feeling that the commits are triggered more and 
>> more often - with not-so-nice influence on indexing speed over time.
>> Restarting resin seems to get the commit rate to the original level.
>> Optimizing has no effect.
>> Is there some other parameter influencing autocommit?
>>
>> Thank you very much.
>>
>> Nickolai
>>   
>>     

RE: Question about autocommit

Posted by "Nguyen, Joe" <jn...@automotive.com>.
As far as I know, commit could be triggered by

Manually
1.  invoke commit() method
Automatically
2.  maxDoc
3.  maxTime

Since the document size is arbitrary and some document could be huge,
could commit also be triggered by memory buffered size?

-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com] 
Sent: Wednesday, November 19, 2008 9:09 Joe
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

They are separate commits. ramBufferSizeMB controls when the underlying
Lucene IndexWriter flushes ram to disk (this isnt like the IndexWriter
commiting or closing). The solr autocommit controls when solr asks
IndexWriter to commit what its done so far.

Nguyen, Joe wrote:
> Could <ramBufferSizeMB> trigger the commit in this case?  
>
> -----Original Message-----
> From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org]
> Sent: Wednesday, November 19, 2008 8:36 Joe
> To: solr-user@lucene.apache.org
> Subject: Question about autocommit
>
> Hello,
> I would like some details on the autocommit mechanism. I tried to 
> search the wiki, but found only the standard maxDoc/time settings.
> i have set the autocommit parameters in solrconfig.xml to 8000 docs  
> and 300000milis.
> Indexing at around  200 docs per second (from multiple processes, 
> using the CommonsHttpSolrServer class), i would have expected 
> autocommits to occur around every  40 seconds, however the jvm log 
> shows the following
> -  sometimes more than two calls per second:
>
> $ tail -f jvm-default.log | grep "commit"
> [16:18:15.862] {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920] 
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>
>
> additionally, in the solr admin page , the update handler reports as 
> many autocommits as commits - so i assume it is not some commit(); 
> line lost in my code.
>
> I actually get the feeling that the commits are triggered more and 
> more often - with not-so-nice influence on indexing speed over time.
> Restarting resin seems to get the commit rate to the original level.
> Optimizing has no effect.
> Is there some other parameter influencing autocommit?
>
> Thank you very much.
>
> Nickolai
>   


Re: Question about autocommit

Posted by Mark Miller <ma...@gmail.com>.
They are separate commits. ramBufferSizeMB controls when the underlying 
Lucene IndexWriter flushes ram to disk (this isnt like the IndexWriter 
commiting or closing). The solr autocommit controls when solr asks 
IndexWriter to commit what its done so far.

Nguyen, Joe wrote:
> Could <ramBufferSizeMB> trigger the commit in this case?  
>
> -----Original Message-----
> From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org] 
> Sent: Wednesday, November 19, 2008 8:36 Joe
> To: solr-user@lucene.apache.org
> Subject: Question about autocommit
>
> Hello,
> I would like some details on the autocommit mechanism. I tried to search
> the wiki, but found only the standard maxDoc/time settings.
> i have set the autocommit parameters in solrconfig.xml to 8000 docs  and
> 300000milis.
> Indexing at around  200 docs per second (from multiple processes, using
> the CommonsHttpSolrServer class), i would have expected autocommits to
> occur around every  40 seconds, however the jvm log shows the following
> -  sometimes more than two calls per second:
>
> $ tail -f jvm-default.log | grep "commit"
> [16:18:15.862] {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920]
> {pool-2-thread-1} start
> commit(optimize=false,waitFlush=true,waitSearcher=true)
> [16:19:55.079] {pool-2-thread-1} end_commit_flush
>
>
> additionally, in the solr admin page , the update handler reports as
> many autocommits as commits - so i assume it is not some commit(); line
> lost in my code.
>
> I actually get the feeling that the commits are triggered more and more
> often - with not-so-nice influence on indexing speed over time.
> Restarting resin seems to get the commit rate to the original level.
> Optimizing has no effect.
> Is there some other parameter influencing autocommit?
>
> Thank you very much.
>
> Nickolai
>   


RE: Question about autocommit

Posted by "Nguyen, Joe" <jn...@automotive.com>.
Could <ramBufferSizeMB> trigger the commit in this case?  

-----Original Message-----
From: Nickolai Toupikov [mailto:nickolai.toupikov@deri.org] 
Sent: Wednesday, November 19, 2008 8:36 Joe
To: solr-user@lucene.apache.org
Subject: Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to search
the wiki, but found only the standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  and
300000milis.
Indexing at around  200 docs per second (from multiple processes, using
the CommonsHttpSolrServer class), i would have expected autocommits to
occur around every  40 seconds, however the jvm log shows the following
-  sometimes more than two calls per second:

$ tail -f jvm-default.log | grep "commit"
[16:18:15.862] {pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as
many autocommits as commits - so i assume it is not some commit(); line
lost in my code.

I actually get the feeling that the commits are triggered more and more
often - with not-so-nice influence on indexing speed over time.
Restarting resin seems to get the commit rate to the original level.
Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai