Posted to java-user@lucene.apache.org by James Dunn <ja...@yahoo.com> on 2004/05/26 21:02:04 UTC

Memory usage

Hello,

I was wondering if anyone has had problems with memory
usage and MultiSearcher.

My index is composed of two sub-indexes that I search
with a MultiSearcher.  The total size of the index is
about 3.7GB with the larger sub-index being 3.6GB and
the smaller being 117MB.

I am using Lucene 1.3 Final with the compound file
format.

Also I search across about 50 fields but I don't use
wildcard or range queries. 

Doing repeated searches in this way seems to
eventually chew up about 500MB of memory, which seems
excessive to me.

Does anyone have any ideas where I could look to
reduce the memory my queries consume?

Thanks,

Jim


	
		
__________________________________
Do you Yahoo!?
Friends.  Fun.  Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/ 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Memory usage

Posted by James Dunn <ja...@yahoo.com>.

Otis,

My app does run within Tomcat.  But when I started
getting these OutOfMemoryErrors, I wrote a little unit
test to watch the memory usage without Tomcat in the
middle, and I still see the same memory usage.

Thanks,

Jim
--- Otis Gospodnetic <ot...@yahoo.com> wrote:
> Sorry if I'm stating the obvious.  Is this happening in some
> stand-alone unit tests, or are you running things from some application
> and in some environment, like Tomcat, Jetty or in some non-web app?
>
> Your queries are pretty big (although I recall some people using even
> bigger ones... but it all depends on the hardware they had), but are
> you sure running out of memory is due to Lucene, or could it be a leak
> in the app from which you are running queries?
>
> Otis

Re: Memory usage

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Sorry if I'm stating the obvious.  Is this happening in some
stand-alone unit tests, or are you running things from some application
and in some environment, like Tomcat, Jetty or in some non-web app?

Your queries are pretty big (although I recall some people using even
bigger ones... but it all depends on the hardware they had), but are
you sure running out of memory is due to Lucene, or could it be a leak
in the app from which you are running queries?

Otis


--- James Dunn <ja...@yahoo.com> wrote:
> Doug,
> 
> We only search on analyzed text fields.  There are a
> couple of additional fields in the index like
> OBJECT_ID that are keywords but we don't search
> against those, we only use them once we get a result
> back to find the thing that document represents.
> 
> Thanks,
> 
> Jim

Re: Memory usage

Posted by James Dunn <ja...@yahoo.com>.

Doug,

We only search on analyzed text fields.  There are a
couple of additional fields in the index like
OBJECT_ID that are keywords, but we don't search
against those; we only use them once we get a result
back to find the thing that document represents.

Thanks,

Jim

--- Doug Cutting <cu...@apache.org> wrote:
> It is cached by the IndexReader and lives until the index reader is
> garbage collected.  50-70 searchable fields is a *lot*.  How many are
> analyzed text, and how many are simply keywords?
> 
> Doug

Re: Memory usage

Posted by Doug Cutting <cu...@apache.org>.

It is cached by the IndexReader and lives until the index reader is 
garbage collected.  50-70 searchable fields is a *lot*.  How many are 
analyzed text, and how many are simply keywords?
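
Since the norms are held per IndexReader, every additional open reader
over the same index would carry its own copy.  One way to avoid that,
sketched here under the assumption that the application can share a
single long-lived searcher across requests, is to open it once and hand
out the same instance.  SharedResource is a hypothetical helper, not a
Lucene class, and the Object in main merely stands in for the real
searcher:

```java
import java.util.function.Supplier;

/** Hypothetical helper: opens a resource once and returns the same instance afterwards. */
public class SharedResource<T> {
    private final Supplier<T> opener;
    private T instance;

    public SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, running the opener only on the first call. */
    public synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        // The Object below is a placeholder for whatever searcher the application uses.
        SharedResource<Object> searcher = new SharedResource<>(() -> {
            System.out.println("opening searcher");
            return new Object();
        });
        Object first = searcher.get();
        Object second = searcher.get();
        System.out.println("same instance: " + (first == second));
    }
}
```

A real version would also need to close and replace the instance when
the index changes.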

Doug

James Dunn wrote:
> Doug,
> 
> Thanks!  
> 
> I just asked a question regarding how to calculate the
> memory requirements for a search.  Does this memory
> get used only during the search operation itself,
> or is it referenced by the Hits object or anything
> else after the actual search completes?
> 
> Thanks again,
> 
> Jim

Re: Memory usage

Posted by James Dunn <ja...@yahoo.com>.

Doug,

Thanks!  

I just asked a question regarding how to calculate the
memory requirements for a search.  Does this memory
get used only during the search operation itself,
or is it referenced by the Hits object or anything
else after the actual search completes?

Thanks again,

Jim


--- Doug Cutting <cu...@apache.org> wrote:
> James Dunn wrote:
> > Also I search across about 50 fields but I don't use
> > wildcard or range queries.
>
> Lucene uses one byte of RAM per document per searched field, to hold the
> normalization values.  So if you search a 10M document collection with
> 50 fields, then you'll end up using 500MB of RAM.
>
> If you're using unanalyzed fields, then an easy workaround to reduce the
> number of fields is to combine many in a single field.  So, instead of,
> e.g., using an "f1" field with value "abc", and an "f2" field with value
> "efg", use a single field named "f" with values "1_abc" and "2_efg".
>
> We could optimize this in Lucene.  If no values of an indexed field are
> analyzed, then we could store no norms for the field and hence read none
> into memory.  This wouldn't be too hard to implement...
>
> Doug

Re: Memory usage

Posted by Doug Cutting <cu...@apache.org>.

James Dunn wrote:
> Also I search across about 50 fields but I don't use
> wildcard or range queries. 

Lucene uses one byte of RAM per document per searched field, to hold the 
normalization values.  So if you search a 10M document collection with 
50 fields, then you'll end up using 500MB of RAM.
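
As a rough sketch of that arithmetic (the class and method names below
are made up for illustration and are not part of Lucene), the estimate
is just documents times searched fields, at one byte each:

```java
/** Illustrative only: estimates norm memory at one byte per document per searched field. */
public class NormsMemoryEstimate {

    /** Returns the estimated number of bytes held for norms. */
    public static long estimateNormBytes(long documentCount, int searchedFieldCount) {
        if (documentCount < 0 || searchedFieldCount < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // multiplyExact throws instead of silently overflowing on absurd inputs.
        return Math.multiplyExact(documentCount, (long) searchedFieldCount);
    }

    public static void main(String[] args) {
        // The example from the text: 10M documents, 50 searched fields.
        long bytes = estimateNormBytes(10_000_000L, 50);
        System.out.println(bytes + " bytes, about " + (bytes / 1_000_000) + " MB");
    }
}
```

This only covers the norms; it says nothing about any other memory a
reader or a query may need.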

If you're using unanalyzed fields, then an easy workaround to reduce the 
number of fields is to combine many in a single field.  So, instead of, 
e.g., using an "f1" field with value "abc", and an "f2" field with value 
"efg", use a single field named "f" with values "1_abc" and "2_efg".
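
A minimal sketch of that renaming, in plain Java with no Lucene calls.
The helper names (combine, termFor) and the assumption that every
original field name is the shared prefix plus a short tag are
hypothetical, chosen to match the "f1"/"f2" example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: folds several unanalyzed fields into values for one combined field. */
public class CombinedFieldValues {

    /** Turns {"f1": "abc", "f2": "efg"} with prefix "f" into ["1_abc", "2_efg"]. */
    public static List<String> combine(String prefix, Map<String, String> fields) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String name = entry.getKey();
            if (!name.startsWith(prefix) || name.length() == prefix.length()) {
                throw new IllegalArgumentException("unexpected field name: " + name);
            }
            // The part after the prefix ("1", "2", ...) becomes the tag in the value.
            String tag = name.substring(prefix.length());
            values.add(termFor(tag, entry.getValue()));
        }
        return values;
    }

    /** Builds the value to index, and later to search for, in the combined field. */
    public static String termFor(String tag, String value) {
        return tag + "_" + value;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("f1", "abc");
        fields.put("f2", "efg");
        System.out.println(combine("f", fields));
    }
}
```

At search time the same tag would go on the query term, e.g.
termFor("2", "efg") for what used to be a lookup of "efg" in "f2".  This
assumes tags never contain an underscore; otherwise two different
tag/value pairs could produce the same combined value.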

We could optimize this in Lucene.  If no values of an indexed field are 
analyzed, then we could store no norms for the field and hence read none 
into memory.  This wouldn't be too hard to implement...

Doug
