Posted to java-user@lucene.apache.org by Zhibin Mai <zb...@yahoo.com> on 2008/11/16 14:36:36 UTC

how to estimate how much memory is required to support the large index search

Hello,

I am a beginner with Lucene. We developed an application to create and
search an index using Lucene 2.3.1, and we would like to know how to
estimate the memory required to search a given index.

Recently, the index has grown to about 200GB, with 197M documents and
223M terms. Our application has started hitting intermittent
"OutOfMemoryError: Java heap space" errors when we use it to search the
index. We used JProfiler to capture the following memory allocation
during a single keyword search:

char[]                              332MB
org.apache.lucene.index.TermInfo    194MB
java.lang.String                    146MB
org.apache.lucene.index.Term        99,823KB
org.apache.lucene.index.Term        24,956KB
org.apache.lucene.index.TermInfo[]  24,956KB

byte[]                              188MB
long[]                              49,912KB

The memory allocation for the first six object types does not change when
we change the search criteria. Could you please give me some advice on
which major factors affect the memory allocation, and precisely how they
affect memory usage during search? Is it possible to reduce the memory
usage of search?


Thank you,


Zhibin




Re: how to estimate how much memory is required to support the large index search

Posted by Michael McCandless <lu...@mikemccandless.com>.
BTW, upcoming changes in Lucene for flexible indexing should
substantially improve the RAM usage of the terms index:

     https://issues.apache.org/jira/browse/LUCENE-1458

In the current (first) iteration of that patch, TermInfo is no longer
used at all when loading the index. I think for a typical index this
will likely cut the RAM used by the terms index in half.

But... this won't be available for some time (it's still a work in
progress).

Mike

Chris Lu wrote:

> So it looks like you are not really doing much sorting? The index
> divisor affects reader.terms(), but doesn't do much for sorting.


Re: how to estimate how much memory is required to support the large index search

Posted by Zhibin Mai <zb...@yahoo.com>.
You are right.

Cheers,

Zhibin




________________________________
From: Chris Lu <ch...@gmail.com>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 11:13:44 PM
Subject: Re: how to estimate how much memory is required to support the large index search

So it looks like you are not really doing much sorting? The index divisor
affects reader.terms(), but doesn't do much for sorting.




Re: how to estimate how much memory is required to support the large index search

Posted by Chris Lu <ch...@gmail.com>.
So it looks like you are not really doing much sorting? The index divisor
affects reader.terms(), but doesn't do much for sorting.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!


On Mon, Nov 17, 2008 at 6:21 PM, Zhibin Mai <zb...@yahoo.com> wrote:

> It is a cache tuning setting in IndexReader. It can be set via the method
> setTermInfosIndexDivisor(int).

Re: how to estimate how much memory is required to support the large index search

Posted by Zhibin Mai <zb...@yahoo.com>.
It is a cache tuning setting in IndexReader. It can be set via the method
setTermInfosIndexDivisor(int).
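
As a rough sketch of how we set it (Lucene 2.4-era API; treat the exact
method placement and availability as an assumption, since it differs
across versions):

    import org.apache.lucene.index.IndexReader;

    public class OpenWithDivisor {
        public static void main(String[] args) throws Exception {
            // "/path/to/index" is a placeholder for the index directory.
            IndexReader reader = IndexReader.open("/path/to/index");

            // Must be set before the terms index is first used: the reader
            // then loads only every 4th entry of the on-disk terms index
            // into RAM, shrinking the in-memory terms index to about 1/4
            // of its default size at the cost of slower term lookups.
            reader.setTermInfosIndexDivisor(4);

            System.out.println("docs: " + reader.maxDoc());
            reader.close();
        }
    }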

Thanks,

Zhibin




________________________________
From: Chris Lu <ch...@gmail.com>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 7:07:21 PM
Subject: Re: how to estimate how much memory is required to support the large index search

Calculation looks right. But what's the "Index divisor" that you mentioned?


Re: how to estimate how much memory is required to support the large index search

Posted by Chris Lu <ch...@gmail.com>.
Calculation looks right. But what's the "Index divisor" that you mentioned?

-- 
Chris Lu

On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <zb...@yahoo.com> wrote:

> Aleksander,
>
> I figured out that most of the heap was consumed by the term cache. [...]
> We worked around the term cache issue by setting the index divisor of
> the IndexReader to a higher value. Actually, search performance is good
> even with an index divisor of 4.

Re: how to estimate how much memory is required to support the large index search

Posted by Zhibin Mai <zb...@yahoo.com>.
Aleksander,

I figured out that most of the heap was consumed by the term cache. In our
case, the index has 233 million terms, and 6.4 million of them were loaded
into the cache when we did the search. I did a rough calculation of how
much memory each term needs: about 16 bytes for the Term object + 32 bytes
for the TermInfo object + 24 bytes for the String object holding the term
text + 2 * length(char[]) for the term text itself.

In our case, the average term text length is 25 characters, which means
each term needs at least 16 + 32 + 24 + 2*25 = 122 bytes. The cache for
6.4 million terms therefore needs about 6.4M * 122 bytes = 780MB. Adding
roughly 200MB for caching norms, the RAM used by caches exceeds 980MB. We
worked around the term cache issue by setting the index divisor of the
IndexReader to a higher value. Actually, search performance is good even
with an index divisor of 4.
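
As a sanity check on that arithmetic, here is a small back-of-the-envelope
sketch (the per-object overheads are the rough, JVM-dependent estimates
from above, not exact values):

    public class TermCacheEstimate {
        // Rough per-term overheads in bytes (JVM-dependent estimates):
        static final int TERM = 16;       // org.apache.lucene.index.Term
        static final int TERM_INFO = 32;  // org.apache.lucene.index.TermInfo
        static final int STRING = 24;     // java.lang.String wrapper
        static final int PER_CHAR = 2;    // UTF-16 char in the backing char[]

        static long bytesPerTerm(int avgTermLength) {
            return TERM + TERM_INFO + STRING + (long) PER_CHAR * avgTermLength;
        }

        public static void main(String[] args) {
            long cachedTerms = 6400000L;        // terms loaded into the cache
            long perTerm = bytesPerTerm(25);    // 122 bytes at 25 chars/term
            long total = cachedTerms * perTerm; // 780,800,000 bytes
            System.out.printf("%d bytes/term, ~%.0fMB total%n",
                    perTerm, total / 1e6);
        }
    }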

Thanks,

Zhibin




________________________________
From: Aleksander M. Stensby <al...@integrasco.no>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 2:31:04 AM
Subject: Re: how to estimate how much memory is required to support the large index search

One major factor that can cause heap space problems is sorting at search
time. [...]

Re: how to estimate how much memory is required to support the large index search

Posted by "Aleksander M. Stensby" <al...@integrasco.no>.
One major factor that can cause heap space problems is sorting at search
time. Do you have any form of default sort in your application? Also, the
type of the field used for sorting is important with regard to memory
consumption.

This issue has been discussed before on the list. (You can search the  
archive for sorting and memory consumption.)
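
To make the sorting cost concrete, here is a minimal sketch of a sorted
search (Lucene 2.3-era API; the index path and field names are
hypothetical). The first sorted search on a field populates a FieldCache
array with one entry per document in the index, regardless of how many
documents the query matches, so an int sort costs roughly 4 bytes * maxDoc:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopFieldDocs;

    public class SortedSearchSketch {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index");
            IndexSearcher searcher = new IndexSearcher(reader);

            // Sorting by the hypothetical int field "price" builds a
            // FieldCache int[] sized to every document in the index
            // (~4 bytes * maxDoc) that lives as long as the reader does.
            Sort sort = new Sort(new SortField("price", SortField.INT));
            TopFieldDocs docs = searcher.search(
                    new TermQuery(new Term("body", "lucene")), null, 10, sort);
            System.out.println("hits: " + docs.totalHits);

            searcher.close();
            reader.close();
        }
    }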

- Aleksander

On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <zb...@yahoo.com> wrote:

> Hello,
>
> I am a beginner with Lucene. We developed an application to create and
> search an index using Lucene 2.3.1, and we would like to know how to
> estimate the memory required to search a given index. [...]


-- 
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org