Posted to java-user@lucene.apache.org by Timo Nentwig <lu...@nitwit.de> on 2008/04/13 16:45:37 UTC

Sorting consumes hundreds of MBytes RAM

Hi!

I found that sorting search results can, depending on the amount of data
in the field being sorted on, easily cause FieldCacheImpl to allocate
hundreds of MBytes of RAM.

How does this work internally? It seems as if all data for this field found in 
the entire index is read into memory (?).

And question #2: what can I do about it? Index sharding?

Thanks,
Timo

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Sorting consumes hundreds of MBytes RAM

Posted by Eran Sevi <er...@gmail.com>.

If you read the payloads in sequence, they are not arranged by their
original position, whereas when you use a stored field you get the terms
in the correct order.
If you need to sort the values, it doesn't matter, of course.

On Fri, Apr 25, 2008 at 5:42 PM, Nadav Har'El <ny...@math.technion.ac.il>
wrote:

> On Mon, Apr 14, 2008, Chris Hostetter wrote about "Re: Sorting consumes
> hundreds of MBytes RAM":
> > : And question #2: what can I do about it? Index sharding?
> >
> > The only suggestion I can offer is to take a look at LUCENE-769 ... it
> > takes a completely different approach of using a FieldSelector to access
> > the *stored* field and sort on it ... the memory usage of FieldCache is
> > eliminated at the expense of longer search times ... in cases where you
> > expect queries to match on a very small subset of the total index, it
> > could be worth using.
>
> Instead of using a stored field, I would recommend using *payloads*.
> If you store the field's value as a payload on a custom term, you basically
> get a posting-list of the field value, which can be (theoretically, at
> least) efficiently skipped on one hand - and read in sequence on the
> other hand.
>
> --
> Nadav Har'El                        |       Friday, Apr 25 2008, 20 Nisan 5768
> IBM Haifa Research Lab              |-----------------------------------------
>                                     |Business jargon is the art of saying
> http://nadav.harel.org.il           |nothing while appearing to say a lot.

Re: Sorting consumes hundreds of MBytes RAM

Posted by Nadav Har'El <ny...@math.technion.ac.il>.

On Mon, Apr 14, 2008, Chris Hostetter wrote about "Re: Sorting consumes hundreds of MBytes RAM":
> : And question #2: what can I do about it? Index sharding?
>
> The only suggestion I can offer is to take a look at LUCENE-769 ... it
> takes a completely different approach of using a FieldSelector to access
> the *stored* field and sort on it ... the memory usage of FieldCache is
> eliminated at the expense of longer search times ... in cases where you
> expect queries to match on a very small subset of the total index, it
> could be worth using.

Instead of using a stored field, I would recommend using *payloads*.
If you store the field's value as a payload on a custom term, you basically
get a posting-list of the field value, which can be (theoretically, at least)
efficiently skipped on one hand - and read in sequence on the other hand.
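
A minimal sketch of that idea, in plain Java rather than Lucene's payload
API: a docId-ordered posting list that carries one value per document and
can either be skipped forward or read in sequence. The class name
PayloadPostingSketch, the long-valued payloads, the binary-search skip and
the Long.MIN_VALUE sentinel are all assumptions made for illustration; a
real posting list would use on-disk skip data instead.

```java
import java.util.Arrays;

// Conceptual sketch only: this does not use Lucene's payload classes.
class PayloadPostingSketch {
    private final int[] docIds;     // ascending docIds that carry the custom term
    private final long[] payloads;  // sort value stored alongside each posting
    private int pos = -1;           // current position, -1 before the first posting

    PayloadPostingSketch(int[] docIds, long[] payloads) {
        this.docIds = docIds;
        this.payloads = payloads;
    }

    // Advance to the first posting whose docId is >= target (forward only).
    boolean skipTo(int target) {
        int from = Math.max(pos, 0);
        int idx = Arrays.binarySearch(docIds, from, docIds.length, target);
        pos = idx >= 0 ? idx : -idx - 1;
        return pos < docIds.length;
    }

    int doc() { return docIds[pos]; }

    long payload() { return payloads[pos]; }

    // Fetch the sort values for the hits in one forward pass over the postings.
    // Hits must be in ascending docId order; missing docs get Long.MIN_VALUE.
    static long[] valuesFor(PayloadPostingSketch postings, int[] ascendingHits) {
        long[] out = new long[ascendingHits.length];
        for (int i = 0; i < ascendingHits.length; i++) {
            int hit = ascendingHits[i];
            boolean found = postings.skipTo(hit) && postings.doc() == hit;
            out[i] = found ? postings.payload() : Long.MIN_VALUE;
        }
        return out;
    }

    public static void main(String[] args) {
        PayloadPostingSketch postings = new PayloadPostingSketch(
                new int[] {0, 1, 2, 3, 4}, new long[] {50, 10, 40, 20, 30});
        System.out.println(Arrays.toString(valuesFor(postings, new int[] {1, 3, 4})));
    }
}
```

Only the values of the matching documents are touched, and nothing has to
stay resident in memory between searches.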

-- 
Nadav Har'El                        |       Friday, Apr 25 2008, 20 Nisan 5768
IBM Haifa Research Lab              |-----------------------------------------
                                    |Business jargon is the art of saying
http://nadav.harel.org.il           |nothing while appearing to say a lot.


Re: Sorting consumes hundreds of MBytes RAM

Posted by Chris Hostetter <ho...@fucit.org>.

: How does this work internally? It seems as if all data for this field found in 
: the entire index is read into memory (?).

You can think of it as an "inverted-inverted index". Lucene needs a data
structure it can use for fast lookups where the key is the docId and the
value is something "comparable" for sorting the documents.
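
To make the "key is the docId" idea concrete, here is a minimal sketch in
plain Java. It is not FieldCacheImpl or any Lucene API; the class name
DocValueCacheSketch and the use of String values are assumptions made for
illustration. The point is that the array has one slot per document in the
index, so its size depends on the index and the field values, not on how
many documents a query matches.

```java
import java.util.Arrays;
import java.util.Comparator;

// Conceptual sketch only: NOT Lucene's FieldCacheImpl, just the idea behind it.
class DocValueCacheSketch {

    // One slot per document in the index, whether or not a query matches it.
    private final String[] valueByDocId;

    DocValueCacheSketch(String[] valueByDocId) {
        this.valueByDocId = valueByDocId;
    }

    // Sort matching docIds by their cached value; each lookup is an array read.
    int[] sortHits(int[] matchingDocIds) {
        Integer[] boxed = Arrays.stream(matchingDocIds).boxed().toArray(Integer[]::new);
        Arrays.sort(boxed, Comparator.comparing((Integer docId) -> valueByDocId[docId]));
        return Arrays.stream(boxed).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        DocValueCacheSketch cache =
                new DocValueCacheSketch(new String[] {"pear", "apple", "fig", "banana"});
        // Documents 0, 2 and 3 matched the query; order them by field value.
        System.out.println(Arrays.toString(cache.sortHits(new int[] {0, 2, 3})));
    }
}
```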

: And question #2: what can I do about it? Index sharding?

The only suggestion I can offer is to take a look at LUCENE-769 ... it
takes a completely different approach of using a FieldSelector to access
the *stored* field and sort on it ... the memory usage of FieldCache is
eliminated at the expense of longer search times ... in cases where you
expect queries to match on a very small subset of the total index, it
could be worth using.
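
The trade-off can be sketched in plain Java. This is not the LUCENE-769
patch and it does not use FieldSelector; the class name
StoredFieldSortSketch and the IntFunction loader standing in for a
stored-field read are assumptions made for illustration. The sort value is
loaded once per matching document and nothing is kept for the rest of the
index, so the cost scales with the number of hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

// Conceptual sketch only: not the LUCENE-769 patch and not Lucene's API.
class StoredFieldSortSketch {

    // Counts simulated stored-field reads, i.e. the extra per-search work.
    int loads = 0;

    // Hypothetical stand-in for reading one document's stored field from disk.
    private final IntFunction<String> storedFieldLoader;

    StoredFieldSortSketch(IntFunction<String> storedFieldLoader) {
        this.storedFieldLoader = storedFieldLoader;
    }

    private static final class Hit {
        final int docId;
        final String value;

        Hit(int docId, String value) {
            this.docId = docId;
            this.value = value;
        }
    }

    // Load the sort value for each hit only, then sort the hits by that value.
    int[] sortHits(int[] matchingDocIds) {
        List<Hit> hits = new ArrayList<>();
        for (int docId : matchingDocIds) {
            loads++;
            hits.add(new Hit(docId, storedFieldLoader.apply(docId)));
        }
        hits.sort(Comparator.comparing((Hit h) -> h.value));
        int[] sorted = new int[hits.size()];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = hits.get(i).docId;
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] stored = {"e", "d", "c", "b", "a"};
        StoredFieldSortSketch sorter = new StoredFieldSortSketch(docId -> stored[docId]);
        int[] sorted = sorter.sortHits(new int[] {1, 4});
        System.out.println(sorted.length + " hits sorted with " + sorter.loads + " loads");
    }
}
```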

If people try out the patch and like it and report back success with it,
it's more likely to get committed at some point.  (Although at this point,
I'm starting to suspect "column stride fields" is the wave of the future
for stuff like this ... see LUCENE-1231 for more details, but at this
point it's totally theoretical.)



-Hoss

