You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2008/11/03 19:43:50 UTC

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

: I'm having a similar problem with my application, although we are using
: lucene 2.3.2. The problem we have is that we are required to sort on most of
: the fields (20 at least). Is there any way of changing the cache being used?

there is a patch in Jira that takes a completley different approach
towards dealing with sorting by using the *stored* values of the documents
that match...

https://issues.apache.org/jira/browse/LUCENE-769

...in theory it's "better" for use cases where:

  1) you are confident you are going to get back a "small" result set in
     proportion to the total size of your index.
  2) you need to support sorting on "lots" of fields and don't have enough
     ram for all of the FieldCaches of those fields

So far the only person to test it out (and post comments) is the patch
submitter, but if other people report success it might be worth adding as
a contrib.

(Disclaimer: it looks like the patch was updated after my last 
review/comments.  i have not read, nor claim any opinion about the 
currently attached patch)




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemory Problems Lucene 2.4 / Tomcat

Posted by Pablo Saavedra <pa...@gmail.com>.
I hope that helps, if you find anything interesting do post it somewhere.
I'm afraid I'm a little bit far away from New Orleans at the moment.

Regards.

2008/11/4 Todd Benge <to...@gmail.com>

> Thanks Pablo.
>
> I'll be flying to New Orleans tomorrow for ApacheCon and would love
> the opportunity to talk with others about architectures others are
> using.
>
> Todd
>
>
>
> On 11/4/08, PabloS <pa...@gmail.com> wrote:
> >
> > Sure Todd,
> >
> > the idea basically consist in the following:
> >
> > - Subclassing FIeldSortedHitQueue and calling support with an empty
> > SortField array: this disables caching because the comparators are
> retrieved
> > during construction
> > - Creating a new SortComparatorSource that creates the sort comparators
> > similarly to FieldCacheImpl but without performing any caching
> > - Creating a new HitCollector that behaves just like TopFieldDocCollector
> > but uses my custom FieldSortedHitQueue. The tricky part was creating a
> > TopFieldDoc instance (which has default visibility constructor. I managed
> to
> > do it creating an empty TopFieldDocCollector, asking the TopFieldDocs and
> > modifiying that instance.
> > - Overriding the Searcher's search(Weight, Filter, int, Sort) so it uses
> my
> > HitCollector (I use the Hits class, which internally calls that method)
> >
> > That way, you completely override field caching and you'll suffer the
> > performance consequences. What I did was making the SortComparatorSource
> a
> > spring bean and adding a caching interceptor to it. You an plug your
> > favorite cache library in, in my case ehcache.
> >
> > I hope you find this useful.
> > Regards.
> >
> >
> > tbenge wrote:
> >>
> >> Pablo,
> >>
> >> Would you mind adding a little more detail about how you're working
> >> around the problem?
> >>
> >> I'm still evaluating our different options so am interested in what you
> >> did.
> >>
> >> Todd
> >>
> >> On Mon, Nov 3, 2008 at 2:37 PM, PabloS <pa...@gmail.com>
> wrote:
> >>>
> >>> Thanks hossman, but I've already 'solved' the problem without the need
> to
> >>> patch lucene. I had to code a bit around Lucene's visibility
> restrictions
> >>> but I've managed to completely skip the field caching mechanism and add
> >>> ehcache to it.
> >>>
> >>> At the moment it seems to be working quite well, although not as fast
> as
> >>> it
> >>> was when lucene performed the caching.
> >>>
> >>> Thanks a lot for the info anyway.
> >>> Regards.
> >>>
> >>> hossman wrote:
> >>>>
> >>>>
> >>>> : I'm having a similar problem with my application, although we are
> >>>> using
> >>>> : lucene 2.3.2. The problem we have is that we are required to sort
> on
> >>>> most of
> >>>> : the fields (20 at least). Is there any way of changing the cache
> being
> >>>> used?
> >>>>
> >>>> there is a patch in Jira that takes a completley different approach
> >>>> towards dealing with sorting by using the *stored* values of the
> >>>> documents
> >>>> that match...
> >>>>
> >>>> https://issues.apache.org/jira/browse/LUCENE-769
> >>>>
> >>>> ...in theory it's "better" for use cases where:
> >>>>
> >>>>   1) you are confident you are going to get back a "small" result set
> in
> >>>>      proportion to the total size of your index.
> >>>>   2) you need to support sorting on "lots" of fields and don't have
> >>>> enough
> >>>>      ram for all of the FieldCaches of those fields
> >>>>
> >>>> So far the only person to test it out (and post comments) is the patch
> >>>> submitter, but if other people report success it might be worth adding
> >>>> as
> >>>> a contrib.
> >>>>
> >>>> (Disclaimer: it looks like the patch was updated after my last
> >>>> review/comments.  i have not read, nor claim any opinion about the
> >>>> currently attached patch)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -Hoss
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> View this message in context:
> >>>
> http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20311386.html
> >>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >>
> >
> > --
> > View this message in context:
> >
> http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20326783.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> --
> Sent from Gmail for mobile | mobile.google.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

Posted by Todd Benge <to...@gmail.com>.
Thanks Pablo.

I'll be flying to New Orleans tomorrow for ApacheCon and would love
the opportunity to talk with others about architectures others are
using.

Todd



On 11/4/08, PabloS <pa...@gmail.com> wrote:
>
> Sure Todd,
>
> the idea basically consist in the following:
>
> - Subclassing FIeldSortedHitQueue and calling support with an empty
> SortField array: this disables caching because the comparators are retrieved
> during construction
> - Creating a new SortComparatorSource that creates the sort comparators
> similarly to FieldCacheImpl but without performing any caching
> - Creating a new HitCollector that behaves just like TopFieldDocCollector
> but uses my custom FieldSortedHitQueue. The tricky part was creating a
> TopFieldDoc instance (which has default visibility constructor. I managed to
> do it creating an empty TopFieldDocCollector, asking the TopFieldDocs and
> modifiying that instance.
> - Overriding the Searcher's search(Weight, Filter, int, Sort) so it uses my
> HitCollector (I use the Hits class, which internally calls that method)
>
> That way, you completely override field caching and you'll suffer the
> performance consequences. What I did was making the SortComparatorSource a
> spring bean and adding a caching interceptor to it. You an plug your
> favorite cache library in, in my case ehcache.
>
> I hope you find this useful.
> Regards.
>
>
> tbenge wrote:
>>
>> Pablo,
>>
>> Would you mind adding a little more detail about how you're working
>> around the problem?
>>
>> I'm still evaluating our different options so am interested in what you
>> did.
>>
>> Todd
>>
>> On Mon, Nov 3, 2008 at 2:37 PM, PabloS <pa...@gmail.com> wrote:
>>>
>>> Thanks hossman, but I've already 'solved' the problem without the need to
>>> patch lucene. I had to code a bit around Lucene's visibility restrictions
>>> but I've managed to completely skip the field caching mechanism and add
>>> ehcache to it.
>>>
>>> At the moment it seems to be working quite well, although not as fast as
>>> it
>>> was when lucene performed the caching.
>>>
>>> Thanks a lot for the info anyway.
>>> Regards.
>>>
>>> hossman wrote:
>>>>
>>>>
>>>> : I'm having a similar problem with my application, although we are
>>>> using
>>>> : lucene 2.3.2. The problem we have is that we are required to sort on
>>>> most of
>>>> : the fields (20 at least). Is there any way of changing the cache being
>>>> used?
>>>>
>>>> there is a patch in Jira that takes a completley different approach
>>>> towards dealing with sorting by using the *stored* values of the
>>>> documents
>>>> that match...
>>>>
>>>> https://issues.apache.org/jira/browse/LUCENE-769
>>>>
>>>> ...in theory it's "better" for use cases where:
>>>>
>>>>   1) you are confident you are going to get back a "small" result set in
>>>>      proportion to the total size of your index.
>>>>   2) you need to support sorting on "lots" of fields and don't have
>>>> enough
>>>>      ram for all of the FieldCaches of those fields
>>>>
>>>> So far the only person to test it out (and post comments) is the patch
>>>> submitter, but if other people report success it might be worth adding
>>>> as
>>>> a contrib.
>>>>
>>>> (Disclaimer: it looks like the patch was updated after my last
>>>> review/comments.  i have not read, nor claim any opinion about the
>>>> currently attached patch)
>>>>
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20311386.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20326783.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

-- 
Sent from Gmail for mobile | mobile.google.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemory Problems Lucene 2.4 / Tomcat

Posted by PabloS <pa...@gmail.com>.
Sure Todd,

the idea basically consist in the following:

- Subclassing FIeldSortedHitQueue and calling support with an empty
SortField array: this disables caching because the comparators are retrieved
during construction
- Creating a new SortComparatorSource that creates the sort comparators
similarly to FieldCacheImpl but without performing any caching
- Creating a new HitCollector that behaves just like TopFieldDocCollector
but uses my custom FieldSortedHitQueue. The tricky part was creating a
TopFieldDoc instance (which has default visibility constructor. I managed to
do it creating an empty TopFieldDocCollector, asking the TopFieldDocs and
modifiying that instance.
- Overriding the Searcher's search(Weight, Filter, int, Sort) so it uses my
HitCollector (I use the Hits class, which internally calls that method)

That way, you completely override field caching and you'll suffer the
performance consequences. What I did was making the SortComparatorSource a
spring bean and adding a caching interceptor to it. You an plug your
favorite cache library in, in my case ehcache.

I hope you find this useful.
Regards.


tbenge wrote:
> 
> Pablo,
> 
> Would you mind adding a little more detail about how you're working
> around the problem?
> 
> I'm still evaluating our different options so am interested in what you
> did.
> 
> Todd
> 
> On Mon, Nov 3, 2008 at 2:37 PM, PabloS <pa...@gmail.com> wrote:
>>
>> Thanks hossman, but I've already 'solved' the problem without the need to
>> patch lucene. I had to code a bit around Lucene's visibility restrictions
>> but I've managed to completely skip the field caching mechanism and add
>> ehcache to it.
>>
>> At the moment it seems to be working quite well, although not as fast as
>> it
>> was when lucene performed the caching.
>>
>> Thanks a lot for the info anyway.
>> Regards.
>>
>> hossman wrote:
>>>
>>>
>>> : I'm having a similar problem with my application, although we are
>>> using
>>> : lucene 2.3.2. The problem we have is that we are required to sort on
>>> most of
>>> : the fields (20 at least). Is there any way of changing the cache being
>>> used?
>>>
>>> there is a patch in Jira that takes a completley different approach
>>> towards dealing with sorting by using the *stored* values of the
>>> documents
>>> that match...
>>>
>>> https://issues.apache.org/jira/browse/LUCENE-769
>>>
>>> ...in theory it's "better" for use cases where:
>>>
>>>   1) you are confident you are going to get back a "small" result set in
>>>      proportion to the total size of your index.
>>>   2) you need to support sorting on "lots" of fields and don't have
>>> enough
>>>      ram for all of the FieldCaches of those fields
>>>
>>> So far the only person to test it out (and post comments) is the patch
>>> submitter, but if other people report success it might be worth adding
>>> as
>>> a contrib.
>>>
>>> (Disclaimer: it looks like the patch was updated after my last
>>> review/comments.  i have not read, nor claim any opinion about the
>>> currently attached patch)
>>>
>>>
>>>
>>>
>>> -Hoss
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20311386.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20326783.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemory Problems Lucene 2.4 / Tomcat

Posted by Todd Benge <to...@gmail.com>.
Pablo,

Would you mind adding a little more detail about how you're working
around the problem?

I'm still evaluating our different options so am interested in what you did.

Todd

On Mon, Nov 3, 2008 at 2:37 PM, PabloS <pa...@gmail.com> wrote:
>
> Thanks hossman, but I've already 'solved' the problem without the need to
> patch lucene. I had to code a bit around Lucene's visibility restrictions
> but I've managed to completely skip the field caching mechanism and add
> ehcache to it.
>
> At the moment it seems to be working quite well, although not as fast as it
> was when lucene performed the caching.
>
> Thanks a lot for the info anyway.
> Regards.
>
> hossman wrote:
>>
>>
>> : I'm having a similar problem with my application, although we are using
>> : lucene 2.3.2. The problem we have is that we are required to sort on
>> most of
>> : the fields (20 at least). Is there any way of changing the cache being
>> used?
>>
>> there is a patch in Jira that takes a completley different approach
>> towards dealing with sorting by using the *stored* values of the documents
>> that match...
>>
>> https://issues.apache.org/jira/browse/LUCENE-769
>>
>> ...in theory it's "better" for use cases where:
>>
>>   1) you are confident you are going to get back a "small" result set in
>>      proportion to the total size of your index.
>>   2) you need to support sorting on "lots" of fields and don't have enough
>>      ram for all of the FieldCaches of those fields
>>
>> So far the only person to test it out (and post comments) is the patch
>> submitter, but if other people report success it might be worth adding as
>> a contrib.
>>
>> (Disclaimer: it looks like the patch was updated after my last
>> review/comments.  i have not read, nor claim any opinion about the
>> currently attached patch)
>>
>>
>>
>>
>> -Hoss
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20311386.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemory Problems Lucene 2.4 / Tomcat

Posted by PabloS <pa...@gmail.com>.
Thanks hossman, but I've already 'solved' the problem without the need to
patch lucene. I had to code a bit around Lucene's visibility restrictions
but I've managed to completely skip the field caching mechanism and add
ehcache to it.

At the moment it seems to be working quite well, although not as fast as it
was when lucene performed the caching.

Thanks a lot for the info anyway.
Regards.

hossman wrote:
> 
> 
> : I'm having a similar problem with my application, although we are using
> : lucene 2.3.2. The problem we have is that we are required to sort on
> most of
> : the fields (20 at least). Is there any way of changing the cache being
> used?
> 
> there is a patch in Jira that takes a completley different approach
> towards dealing with sorting by using the *stored* values of the documents
> that match...
> 
> https://issues.apache.org/jira/browse/LUCENE-769
> 
> ...in theory it's "better" for use cases where:
> 
>   1) you are confident you are going to get back a "small" result set in
>      proportion to the total size of your index.
>   2) you need to support sorting on "lots" of fields and don't have enough
>      ram for all of the FieldCaches of those fields
> 
> So far the only person to test it out (and post comments) is the patch
> submitter, but if other people report success it might be worth adding as
> a contrib.
> 
> (Disclaimer: it looks like the patch was updated after my last 
> review/comments.  i have not read, nor claim any opinion about the 
> currently attached patch)
> 
> 
> 
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/OutOfMemory-Problems-Lucene-2.4---Tomcat-tp20236834p20311386.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org