You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Aleksey Serba <as...@gmail.com> on 2006/07/17 15:54:01 UTC

Re: question about custom sort method

Hi!

Peter, I have exactly the same situation described below.
- I have DistanceComparatorSource to sort results by distance from
specified spatial coordinates point. Point is different for each
query.
- I do not close Searcher after each query.
- I get "java.lang.OutOfMemoryError: Java heap space" after several
request. Using www.yourkit.com's profiler and research lucene source
code i've found sort comparators caching and there's no any options.

Peter, did you find the solution do not cache these sorting values?


Yonik, thank you for your suggestion, we use solr codebase already :)
To tell the truth, first time i thought this is solr caching problem (
i've modify SolrIndexSearcher to get lucene Searcher and search
directly without any solr caching )

I can't figure out how to use FunctionQuery - is there any wiki pages
/ examples or something?

Thanks

Alex

On 5/18/06, Peter Keegan <pe...@gmail.com> wrote:
> Suppose I have a custom sorting 'DocScoreComparator' for computing distances
> on each search hit from a specified coordinate (similar to the
> DistanceComparatorSource example in LIA). Assume that the 'specified
> coordinate' is different for each query. This means a new custom comparator
> must be created for each query, which is ok. However, Lucene caches the
> comparator even though it will never be reused. This could result in heavy
> memory usage if many queries are performed before the IndexReader is
> updated.
>
> Is there any way to avoid having lucene cache the custom sorting objects?
>
>

On 5/12/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Yes, it does compute these distances for all the terms for the field
> specified, but only once (per IndexReader).  This is where the
> techniques Solr employs comes in real handy... warming up caches by
> running searches and sorts before putting a index into service.


On 5/12/06, Urvashi Gadi <ug...@emory.edu> wrote:
> I am looking at DistanceComparatorSource class (for csutom sorting) and
> looks like it calculates the distance for each record in the index and
> not just the records returned from search, making the system very slow.
>
> Is my observation correct? Are there ways to optimize this process?
>
> Thanks,
> Urvashi

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: question about custom sort method

Posted by Aleksey Serba <as...@gmail.com>.
Erik,

You can reproduce OutOfMemory easily. I've attach test files - this is
altered DistanceSortingTest example from LIA book. Also you can
profile it and see caching of distances arrays.

I'll try to investigate the problem, make patch to trunk version
(probably non caching option) and get back to you later.

Thanks


On 7/17/06, Aleksey Serba <as...@gmail.com> wrote:
> Erik,
>
> I think Brian have the problem with continuous caching the same
> sorting values, i.e. he has a few points to calc distance from.
> In such case you can implement equals and hashCode methods based on
> point value and you'll have one cached comparator per different center
> point value.
>
>
> On 7/17/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> > There is a known issue with the DistanceComparatorSource in the
> > "Lucene in Action" source code:
> >
> >         <http://www.lucenebook.com/blog/errata/2006/03/01/memory_leak.html>
> >
> > Maybe this advice will help fix the issue you're having?
> >
> >         Erik
> >
> >
> > On Jul 17, 2006, at 9:54 AM, Aleksey Serba wrote:
> >
> > > Hi!
> > >
> > > Peter, I have exactly the same situation described below.
> > > - I have DistanceComparatorSource to sort results by distance from
> > > specified spatial coordinates point. Point is different for each
> > > query.
> > > - I do not close Searcher after each query.
> > > - I get "java.lang.OutOfMemoryError: Java heap space" after several
> > > request. Using www.yourkit.com's profiler and research lucene source
> > > code i've found sort comparators caching and there's no any options.
> > >
> > > Peter, did you find the solution do not cache these sorting values?
> > >
> > >
> > > Yonik, thank you for your suggestion, we use solr codebase already :)
> > > To tell the truth, first time i thought this is solr caching problem (
> > > i've modify SolrIndexSearcher to get lucene Searcher and search
> > > directly without any solr caching )
> > >
> > > I can't figure out how to use FunctionQuery - is there any wiki pages
> > > / examples or something?
> > >
> > > Thanks
> > >
> > > Alex
> > >
> > > On 5/18/06, Peter Keegan <pe...@gmail.com> wrote:
> > >> Suppose I have a custom sorting 'DocScoreComparator' for computing
> > >> distances
> > >> on each search hit from a specified coordinate (similar to the
> > >> DistanceComparatorSource example in LIA). Assume that the 'specified
> > >> coordinate' is different for each query. This means a new custom
> > >> comparator
> > >> must be created for each query, which is ok. However, Lucene
> > >> caches the
> > >> comparator even though it will never be reused. This could result
> > >> in heavy
> > >> memory usage if many queries are performed before the IndexReader is
> > >> updated.
> > >>
> > >> Is there any way to avoid having lucene cache the custom sorting
> > >> objects?
> > >>
> > >>
> > >
> > > On 5/12/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> > >> Yes, it does compute these distances for all the terms for the field
> > >> specified, but only once (per IndexReader).  This is where the
> > >> techniques Solr employs comes in real handy... warming up caches by
> > >> running searches and sorts before putting a index into service.
> > >
> > >
> > > On 5/12/06, Urvashi Gadi <ug...@emory.edu> wrote:
> > >> I am looking at DistanceComparatorSource class (for csutom
> > >> sorting) and
> > >> looks like it calculates the distance for each record in the index
> > >> and
> > >> not just the records returned from search, making the system very
> > >> slow.
> > >>
> > >> Is my observation correct? Are there ways to optimize this process?
> > >>
> > >> Thanks,
> > >> Urvashi
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>


Re: question about custom sort method

Posted by Aleksey Serba <as...@gmail.com>.
Erik,

I think Brian have the problem with continuous caching the same
sorting values, i.e. he has a few points to calc distance from.
In such case you can implement equals and hashCode methods based on
point value and you'll have one cached comparator per different center
point value.


On 7/17/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> There is a known issue with the DistanceComparatorSource in the
> "Lucene in Action" source code:
>
>         <http://www.lucenebook.com/blog/errata/2006/03/01/memory_leak.html>
>
> Maybe this advice will help fix the issue you're having?
>
>         Erik
>
>
> On Jul 17, 2006, at 9:54 AM, Aleksey Serba wrote:
>
> > Hi!
> >
> > Peter, I have exactly the same situation described below.
> > - I have DistanceComparatorSource to sort results by distance from
> > specified spatial coordinates point. Point is different for each
> > query.
> > - I do not close Searcher after each query.
> > - I get "java.lang.OutOfMemoryError: Java heap space" after several
> > request. Using www.yourkit.com's profiler and research lucene source
> > code i've found sort comparators caching and there's no any options.
> >
> > Peter, did you find the solution do not cache these sorting values?
> >
> >
> > Yonik, thank you for your suggestion, we use solr codebase already :)
> > To tell the truth, first time i thought this is solr caching problem (
> > i've modify SolrIndexSearcher to get lucene Searcher and search
> > directly without any solr caching )
> >
> > I can't figure out how to use FunctionQuery - is there any wiki pages
> > / examples or something?
> >
> > Thanks
> >
> > Alex
> >
> > On 5/18/06, Peter Keegan <pe...@gmail.com> wrote:
> >> Suppose I have a custom sorting 'DocScoreComparator' for computing
> >> distances
> >> on each search hit from a specified coordinate (similar to the
> >> DistanceComparatorSource example in LIA). Assume that the 'specified
> >> coordinate' is different for each query. This means a new custom
> >> comparator
> >> must be created for each query, which is ok. However, Lucene
> >> caches the
> >> comparator even though it will never be reused. This could result
> >> in heavy
> >> memory usage if many queries are performed before the IndexReader is
> >> updated.
> >>
> >> Is there any way to avoid having lucene cache the custom sorting
> >> objects?
> >>
> >>
> >
> > On 5/12/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> >> Yes, it does compute these distances for all the terms for the field
> >> specified, but only once (per IndexReader).  This is where the
> >> techniques Solr employs comes in real handy... warming up caches by
> >> running searches and sorts before putting a index into service.
> >
> >
> > On 5/12/06, Urvashi Gadi <ug...@emory.edu> wrote:
> >> I am looking at DistanceComparatorSource class (for csutom
> >> sorting) and
> >> looks like it calculates the distance for each record in the index
> >> and
> >> not just the records returned from search, making the system very
> >> slow.
> >>
> >> Is my observation correct? Are there ways to optimize this process?
> >>
> >> Thanks,
> >> Urvashi
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: question about custom sort method

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
There is a known issue with the DistanceComparatorSource in the  
"Lucene in Action" source code:

	<http://www.lucenebook.com/blog/errata/2006/03/01/memory_leak.html>

Maybe this advice will help fix the issue you're having?

	Erik


On Jul 17, 2006, at 9:54 AM, Aleksey Serba wrote:

> Hi!
>
> Peter, I have exactly the same situation described below.
> - I have DistanceComparatorSource to sort results by distance from
> specified spatial coordinates point. Point is different for each
> query.
> - I do not close Searcher after each query.
> - I get "java.lang.OutOfMemoryError: Java heap space" after several
> request. Using www.yourkit.com's profiler and research lucene source
> code i've found sort comparators caching and there's no any options.
>
> Peter, did you find the solution do not cache these sorting values?
>
>
> Yonik, thank you for your suggestion, we use solr codebase already :)
> To tell the truth, first time i thought this is solr caching problem (
> i've modify SolrIndexSearcher to get lucene Searcher and search
> directly without any solr caching )
>
> I can't figure out how to use FunctionQuery - is there any wiki pages
> / examples or something?
>
> Thanks
>
> Alex
>
> On 5/18/06, Peter Keegan <pe...@gmail.com> wrote:
>> Suppose I have a custom sorting 'DocScoreComparator' for computing  
>> distances
>> on each search hit from a specified coordinate (similar to the
>> DistanceComparatorSource example in LIA). Assume that the 'specified
>> coordinate' is different for each query. This means a new custom  
>> comparator
>> must be created for each query, which is ok. However, Lucene  
>> caches the
>> comparator even though it will never be reused. This could result  
>> in heavy
>> memory usage if many queries are performed before the IndexReader is
>> updated.
>>
>> Is there any way to avoid having lucene cache the custom sorting  
>> objects?
>>
>>
>
> On 5/12/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> Yes, it does compute these distances for all the terms for the field
>> specified, but only once (per IndexReader).  This is where the
>> techniques Solr employs comes in real handy... warming up caches by
>> running searches and sorts before putting a index into service.
>
>
> On 5/12/06, Urvashi Gadi <ug...@emory.edu> wrote:
>> I am looking at DistanceComparatorSource class (for csutom  
>> sorting) and
>> looks like it calculates the distance for each record in the index  
>> and
>> not just the records returned from search, making the system very  
>> slow.
>>
>> Is my observation correct? Are there ways to optimize this process?
>>
>> Thanks,
>> Urvashi
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: question about custom sort method

Posted by Yonik Seeley <ys...@gmail.com>.
On 7/17/06, Aleksey Serba <as...@gmail.com> wrote:
> Yonik, thank you for your suggestion, we use solr codebase already :)
> To tell the truth, first time i thought this is solr caching problem (
> i've modify SolrIndexSearcher to get lucene Searcher and search
> directly without any solr caching )
>
> I can't figure out how to use FunctionQuery - is there any wiki pages
> / examples or something?

For using it in Java code, there is the JavaDoc
http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/FunctionQuery.html

You would need to create your own ValueSource that would calculate the
distance between other ValueSources.

There is also a hack in SolrQueryParser that I never documented,
because it is a bit of a hack.  It can create FunctionQuerys for
built-in functions when it sees the magic field name _val_

So _val_:"myfield"  evaluates to the numeric value of myfield.
and you can do things like
_val_:"recip(rord(mydatefield),1000,1000,1)" to boost more recent
documents by a date field.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org