Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2010/04/26 16:06:58 UTC

Lucene RAM buffer size limit

Hi

Today we limit the RAM buffer size to 2048 MB. I was wondering why, until Mark
referred me to this issue (
https://issues.apache.org/jira/browse/LUCENE-1995) where someone complained
about it as well, but I couldn't find a reason why it's limited. I
understand it's probably because an int is used somewhere, but all the issue
says here (
https://issues.apache.org/jira/browse/LUCENE-1995?focusedCommentId=12767757&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767757)
is "Maybe someday Lucene will allow a larger RAM buffer than 2GB".

Why "maybe"? :)

I set up a benchmark run on a new machine I got, 4xQuad, 64GB RAM, and it
turns out I can't really take advantage of all that RAM ... :).

I don't mind looking at the code, but since Mike and a bunch of other
smarter people were on that issue, there is probably something I need
to know before I set out to change it?

Appreciate your thoughts on that. Today, more and more machines come w/ more
than 2GB of RAM ...

Shai
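
[The 2048 MB ceiling is consistent with byte counts being tracked in a
signed 32-bit int: Integer.MAX_VALUE bytes is just under 2048 MB, so a
buffer of exactly 2048 MB already overflows. A small, self-contained
illustration of the arithmetic -- this is not the actual Lucene code:]

```java
// Demonstrates why a byte count held in a Java int caps a buffer at 2 GB.
public class BufferLimit {
    public static void main(String[] args) {
        long twoGbInBytes = 2048L * 1024 * 1024;   // 2,147,483,648 bytes
        // One byte past what a signed 32-bit int can represent:
        System.out.println(twoGbInBytes > Integer.MAX_VALUE);  // true
        // Casting back to an int silently wraps to a negative "size":
        System.out.println((int) twoGbInBytes);                // -2147483648
    }
}
```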

Re: Lucene RAM buffer size limit

Posted by Michael Busch <bu...@gmail.com>.
With DocumentsWriterPerThread we can allow 2GB per thread, so that 
should be a good step forward.

For realtime indexing on the RAM buffer I'm planning to remove even that 
per-thread limit, because then you really want to make use of all the 
RAM you have available on your machine.

  Michael
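
[For context, the buffer under discussion is the one applications set
through IndexWriterConfig (new in trunk, as Mike notes later in the
thread); values above 2048 MB are currently rejected. A minimal sketch,
assuming the trunk API of the time and a hypothetical index path -- it
needs the Lucene jars on the classpath:]

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ConfigureRamBuffer {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig conf = new IndexWriterConfig(
            Version.LUCENE_CURRENT, new StandardAnalyzer(Version.LUCENE_CURRENT));
        // Flush the in-memory buffer to a segment once it reaches ~1 GB;
        // values above 2048 are rejected today.
        conf.setRAMBufferSizeMB(1024.0);
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("/tmp/index")), conf);
        writer.close();
    }
}
```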

On 4/26/10 7:24 AM, Shai Erera wrote:
> Good point. I'll set the heap size larger.
>
> It will be interesting to measure if there is a noticeable improvement 
> from 2GB RAM buffer size vs say 100MB or 512MB ... I'll report the 
> results here when I have them
>
> But in the meantime, I think, unless it makes absolutely no sense and 
> does not improve anything, that we should not limit to 2GB just 
> because we use an int somewhere.
>
> Shai
>
> On Mon, Apr 26, 2010 at 5:18 PM, Mark Miller <markrmiller@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     On 4/26/10 10:06 AM, Shai Erera wrote:
>
>         Hi
>
>         Today we limit the RAM buffer size to 2048 MB. I was wondering
>         why, until Mark referred me to this issue
>         (https://issues.apache.org/jira/browse/LUCENE-1995) where someone
>         complained about it as well, but I couldn't find a reason why
>         it's limited. I understand it's probably because an int is used
>         somewhere, but all the issue says here
>         (https://issues.apache.org/jira/browse/LUCENE-1995?focusedCommentId=12767757&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767757)
>         is "Maybe someday Lucene will allow a larger RAM buffer than 2GB".
>
>         Why "maybe"? :)
>
>         I set up a benchmark run on a new machine I got, 4xQuad, 64GB
>         RAM, and
>         turns out I can't really take advantage of all that RAM ... :).
>
>         So I don't mind to look at the code, but I guess since Mike
>         and a bunch
>         of other smarter people were on that issue, there is probably
>         something
>         I need to know before I set to change it?
>
>         Appreciate your thoughts on that. Today, more and more
>         machines come w/
>         more than 2GB ...
>
>         Shai
>
>
>     Keep in mind that you need a lot more than 2GB of RAM to use a
>     really large RAM buffer:
>     http://search.lucidimagination.com/search/document/b18fbeaaf53ead6c/io_exception_during_merge_optimize
>
>     64 GB might cover it though ;)
>
>     -- 
>     - Mark
>
>     http://www.lucidimagination.com
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>     <ma...@lucene.apache.org>
>     For additional commands, e-mail: dev-help@lucene.apache.org
>     <ma...@lucene.apache.org>
>
>


Re: Lucene RAM buffer size limit

Posted by Michael McCandless <lu...@mikemccandless.com>.
In the past we've seen diminishing returns once the RAM buffer grows
beyond a few hundred MB, I believe.  But it'd be great to run these
tests again :)

A very large RAM buffer also means you're not committing very often, which
means that if things go south, you lose all those indexed docs.

IndexWriter also consumes a sudden burst of RAM when merging -- it opens a
SegmentReader per segment being merged (it does not load the terms
index, but it does load the norms), and it allocates 4 bytes per doc to
remap docIDs around deletions.
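
[The numbers above can be turned into a rough, back-of-the-envelope
estimate; this sketch models only the 4-bytes-per-doc deletion remap, not
the per-SegmentReader overhead:]

```java
// Back-of-the-envelope estimate of one piece of merge-time RAM:
// the 4-byte-per-document docID remap used to account for deletions.
public class MergeRamEstimate {
    static long remapBytes(long docsInMergedSegments) {
        return 4L * docsInMergedSegments;  // one int per doc
    }
    public static void main(String[] args) {
        // e.g. merging segments totalling 16M docs costs ~61 MB for the
        // remap alone, on top of the RAM buffer itself
        System.out.println(remapBytes(16_000_000L) / (1024 * 1024) + " MB");
    }
}
```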

For resolving deletions, it opens a SegmentReader for every segment in
the index, and does load the terms index (which in your case, Tom, is
probably going to tie up a lot of RAM!).  If you pool these readers
(either by using IW.getReader (NRT reader) or by allowing pooling via
IndexWriterConfig (new in trunk)), you need to budget RAM for them.

Also, for searching, be sure to leave RAM for the OS to use as IO
cache...

Mike

On Tue, Apr 27, 2010 at 1:28 AM, Shai Erera <se...@gmail.com> wrote:
> Hi Tom
>
> I don't know of an easy way to characterize the relationship between the
> RAM buffer size and the heap size. I ran the test w/ 8GB heap and 2048 MB
> RAM buffer. Indexing 16M documents (roughly 288GB of data) took 7400
> seconds (by 8 threads). I will post the full benchmark output when I finish
> indexing 25M documents w/ different RAM buffer sizes.
>
> My gut feeling (and after reading this
> http://www.ibm.com/developerworks/java/library/j-jtp09275.html) tells me
> that if I need N MB of RAM, I should allocate at least 2*N space on the
> heap. But that just takes the RAM buffer into consideration. Since there is
> other memory that is allocated, GC might wake up, so in order to avoid that
> (as much as possible), I allocate at least 3*N, if N is large enough.
>
> In the current example, I need 2GB for the RAM buffer, so I'll allocate at
> least 4GB on the heap. Then, if I assume that the rest of the app won't
> allocate a total of more than 2GB, I'll set the heap size to 6GB. Since I
> have lots of RAM and cannot use it w/ Lucene, I set the heap size to 8GB.
> I haven't, though, turned on any flags to log if and when GC ran, so I
> don't know if I've hit any nasty GC issues. But, given the total indexing
> throughput (~140GB/hour), I think these are good settings.
>
> BTW, I think that w/ parallel arrays
> (https://issues.apache.org/jira/browse/LUCENE-2329), the performance should
> be better if you use a lower heap size. You can also read there that Michael
> B. ran the test w/ 200 RAM buffer and 2GB heap (and also 256MB heap), which
> might give you another indication of the RAM buffer / heap size ratio.
>
> Hope this helps,
> Shai
>
> On Mon, Apr 26, 2010 at 8:26 PM, Tom Burton-West <tb...@gmail.com>
> wrote:
>>
>> I'm looking forward to your results Shai.
>>
>>
>> Once we get our new test server we will be running tests with different
>> RAM
>> buffer sizes.  We have 10 300GB indexes to re-index, so we need to
>> minimize
>> any merging/disk I/O.
>>
>> See also this related thread on the Solr list:
>>
>> http://lucene.472066.n3.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB-tc505964.html#a505964
>>
>> Is there any easy way to understand the relationship between the max RAM
>> buffer size and the total amount of memory you need to give the JVM ?
>>
>>
>> Tom Burton-West
>> www.hathitrust.org
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Lucene-RAM-buffer-size-limit-tp756752p757354.html
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>>
>
>



Re: Lucene RAM buffer size limit

Posted by Shai Erera <se...@gmail.com>.
Hi Tom

I don't know of an easy way to characterize the relationship between the
RAM buffer size and the heap size. I ran the test w/ 8GB heap and 2048 MB
RAM buffer. Indexing 16M documents (roughly 288GB of data) took 7400
seconds (by 8 threads). I will post the full benchmark output when I finish
indexing 25M documents w/ different RAM buffer sizes.

My gut feeling (and after reading this
http://www.ibm.com/developerworks/java/library/j-jtp09275.html) tells me
that if I need N MB of RAM, I should allocate at least 2*N space on the
heap. But that just takes the RAM buffer into consideration. Since there is
other memory that is allocated, GC might wake up, so in order to avoid that
(as much as possible), I allocate at least 3*N, if N is large enough.

In the current example, I need 2GB for the RAM buffer, so I'll allocate at
least 4GB on the heap. Then, if I assume that the rest of the app won't
allocate a total of more than 2GB, I'll set the heap size to 6GB. Since I
have lots of RAM and cannot use it w/ Lucene, I set the heap size to 8GB.
I haven't, though, turned on any flags to log if and when GC ran, so I
don't know if I've hit any nasty GC issues. But, given the total indexing
throughput (~140GB/hour), I think these are good settings.
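
[The rule of thumb above can be written down as a tiny sketch; the 2x/3x
factors come from the text, while the cutoff for "large enough" is an
assumed value:]

```java
// Heap-sizing rule of thumb from this discussion: for an N MB RAM buffer,
// budget at least 2*N MB of heap, and 3*N MB once N is large, so the GC
// has headroom. The 1024 MB cutoff for "large" is an assumption; this also
// ignores whatever heap the rest of the application needs on top.
public class HeapSizing {
    static int suggestedHeapMb(int ramBufferMb) {
        int factor = ramBufferMb >= 1024 ? 3 : 2;
        return factor * ramBufferMb;
    }
    public static void main(String[] args) {
        System.out.println(suggestedHeapMb(512));   // prints 1024
        System.out.println(suggestedHeapMb(2048));  // prints 6144
    }
}
```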

BTW, I think that w/ parallel arrays (
https://issues.apache.org/jira/browse/LUCENE-2329), the performance should
be better if you use a lower heap size. You can also read there that Michael
B. ran the test w/ a 200MB RAM buffer and a 2GB heap (and also a 256MB
heap), which might give you another indication of the RAM buffer / heap
size ratio.

Hope this helps,
Shai

On Mon, Apr 26, 2010 at 8:26 PM, Tom Burton-West <tb...@gmail.com>wrote:

>
> I'm looking forward to your results Shai.
>
>
> Once we get our new test server we will be running tests with different RAM
> buffer sizes.  We have 10 300GB indexes to re-index, so we need to minimize
> any merging/disk I/O.
>
> See also this related thread on the Solr list:
>
> http://lucene.472066.n3.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB-tc505964.html#a505964
>
> Is there any easy way to understand the relationship between the max RAM
> buffer size and the total amount of memory you need to give the JVM ?
>
>
> Tom Burton-West
> www.hathitrust.org
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lucene-RAM-buffer-size-limit-tp756752p757354.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
>
>

Re: Lucene RAM buffer size limit

Posted by Tom Burton-West <tb...@gmail.com>.
I'm looking forward to your results, Shai.


Once we get our new test server we will be running tests with different RAM
buffer sizes.  We have ten 300GB indexes to re-index, so we need to minimize
any merging/disk I/O.

See also this related thread on the Solr list:
http://lucene.472066.n3.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB-tc505964.html#a505964

Is there any easy way to understand the relationship between the max RAM
buffer size and the total amount of memory you need to give the JVM ?


Tom Burton-West
www.hathitrust.org
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-RAM-buffer-size-limit-tp756752p757354.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



Re: Lucene RAM buffer size limit

Posted by Shai Erera <se...@gmail.com>.
Good point. I'll set the heap size larger.

It will be interesting to measure whether there is a noticeable improvement
from a 2GB RAM buffer vs, say, 100MB or 512MB ... I'll report the results
here when I have them.

But in the meantime, I think that unless it makes absolutely no sense and
does not improve anything, we should not limit it to 2GB just because we
use an int somewhere.

Shai

On Mon, Apr 26, 2010 at 5:18 PM, Mark Miller <ma...@gmail.com> wrote:

> On 4/26/10 10:06 AM, Shai Erera wrote:
>
>> Hi
>>
>> Today we limit the RAM buffer size to 2048 MB. I was wondering why, until
>> Mark referred me to this issue
>> (https://issues.apache.org/jira/browse/LUCENE-1995) where someone
>> complained about it as well, but I couldn't find a reason why it's
>> limited. I understand it's probably because an int is used somewhere,
>> but all the issue says here
>> (https://issues.apache.org/jira/browse/LUCENE-1995?focusedCommentId=12767757&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767757)
>> is "Maybe someday Lucene will allow a larger RAM buffer than 2GB".
>>
>> Why "maybe"? :)
>>
>> I set up a benchmark run on a new machine I got, 4xQuad, 64GB RAM, and
>> turns out I can't really take advantage of all that RAM ... :).
>>
>> So I don't mind to look at the code, but I guess since Mike and a bunch
>> of other smarter people were on that issue, there is probably something
>> I need to know before I set to change it?
>>
>> Appreciate your thoughts on that. Today, more and more machines come w/
>> more than 2GB ...
>>
>> Shai
>>
>
> Keep in mind that you need a lot more than 2GB of RAM to use a really large
> RAM buffer:
> http://search.lucidimagination.com/search/document/b18fbeaaf53ead6c/io_exception_during_merge_optimize
>
> 64 GB might cover it though ;)
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>

Re: Lucene RAM buffer size limit

Posted by Mark Miller <ma...@gmail.com>.
On 4/26/10 10:06 AM, Shai Erera wrote:
> Hi
>
> Today we limit the RAM buffer size to 2048 MB. I was wondering why, until
> Mark referred me to this issue
> (https://issues.apache.org/jira/browse/LUCENE-1995) where someone
> complained about it as well, but I couldn't find a reason why it's
> limited. I understand it's probably because an int is used somewhere,
> but all the issue says here
> (https://issues.apache.org/jira/browse/LUCENE-1995?focusedCommentId=12767757&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767757)
> is "Maybe someday Lucene will allow a larger RAM buffer than 2GB".
>
> Why "maybe"? :)
>
> I set up a benchmark run on a new machine I got, 4xQuad, 64GB RAM, and
> turns out I can't really take advantage of all that RAM ... :).
>
> So I don't mind to look at the code, but I guess since Mike and a bunch
> of other smarter people were on that issue, there is probably something
> I need to know before I set to change it?
>
> Appreciate your thoughts on that. Today, more and more machines come w/
> more than 2GB ...
>
> Shai

Keep in mind that you need a lot more than 2GB of RAM to use a really
large RAM buffer:
http://search.lucidimagination.com/search/document/b18fbeaaf53ead6c/io_exception_during_merge_optimize

64 GB might cover it though ;)

-- 
- Mark

http://www.lucidimagination.com
