Posted to java-user@lucene.apache.org by Ganesh <em...@yahoo.co.in> on 2009/07/17 11:12:43 UTC

Sorting field containing NULL values consumes field cache memory

I am sorting on a DateTime field with minute resolution. I have 90 million records, and sorting consumes nearly 500 MB. About 30% of the records are not part of the primary result set and have no sort field. Yet field cache memory (4 * IndexReader.maxDoc() * (# of different fields actually used to sort)) is consumed for them as well, even though those 30% of records take no part in the sort.

I want to keep those 30% of records from being loaded into the field cache. How can I achieve this? Any ideas are greatly appreciated.
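
As a rough check of the formula above, here is a minimal sketch of the arithmetic. It assumes 4-byte values and a single sort field; the class and method names are made up for illustration and are not part of Lucene.

```java
public class FieldCacheEstimate {
    /** Bytes used by a dense per-document array: one slot for every doc ID below maxDoc. */
    static long denseBytes(long maxDoc, int bytesPerValue, int sortFields) {
        return maxDoc * bytesPerValue * sortFields;
    }

    public static void main(String[] args) {
        long maxDoc = 90_000_000L;              // record count from the question
        long withoutField = maxDoc * 30 / 100;  // the 30% of records that have no sort field

        long total = denseBytes(maxDoc, 4, 1);
        long unused = denseBytes(withoutField, 4, 1);

        System.out.println("dense array:          " + total / 1_000_000 + " MB");
        System.out.println("slots for empty docs: " + unused / 1_000_000 + " MB");
    }
}
```

Under these assumptions the dense array takes 360 MB per sort field, of which 108 MB belongs to documents that never take part in the sort.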

Regards
Ganesh 
Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Sorting field containing NULL values consumes field cache memory

Posted by Shai Erera <se...@gmail.com>.
FWIW, I had implemented a sort-by-payload feature which performs quite well.
It has a very small memory footprint (actually close to 0), and reads values
from a payload. Payloads, at least from my experience, perform better than
stored fields.

In a comparison I once made, the sort-by-payload feature performed better
than the FieldCache solution for the first search. The reason is that
FieldCache reads the values from the stored fields, which is slower than
reading payloads. However, subsequent sorts performed better using the
FieldCache. For very large indices, though, FieldCache is not an option.

If it's interesting enough, I can do some work to contribute it to Lucene.
It's not a very big package, but not a small one either. I also think that
if this feature goes into Lucene, we can improve FieldCache to read
values from the payload rather than stored fields.

Shai

On Tue, Jul 21, 2009 at 8:17 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Right now, you can't really do anything about it. In the future, with the
> : new FieldCache API that may go in, you could plug in a custom implementation
> : that makes tradeoffs for a sparse array of some kind. The docid is currently
> : the index into the array, but with a custom impl you may be able to use a
> : sparse array object.
> : That's a ways off though.
>
> I have no idea if this patch still applies...
>
> https://issues.apache.org/jira/browse/LUCENE-769
>
> ...but this thread jogged my memory of it.  The last time I looked at it,
> it still needed some documentation improvements, but it seemed to have
> some potential value for people who have too much data and too little RAM
> to build up a FieldCache for sorting, and were willing to take the
> time/space tradeoff for sorting using stored fields.
>
>
>
> -Hoss
>
>

Re: Sorting field containing NULL values consumes field cache memory

Posted by Chris Hostetter <ho...@fucit.org>.
: Right now, you can't really do anything about it. In the future, with the
: new FieldCache API that may go in, you could plug in a custom implementation
: that makes tradeoffs for a sparse array of some kind. The docid is currently
: the index into the array, but with a custom impl you may be able to use a
: sparse array object.
: That's a ways off though.

I have no idea if this patch still applies...

https://issues.apache.org/jira/browse/LUCENE-769

...but this thread jogged my memory of it.  The last time I looked at it,
it still needed some documentation improvements, but it seemed to have
some potential value for people who have too much data and too little RAM
to build up a FieldCache for sorting, and were willing to take the
time/space tradeoff for sorting using stored fields.
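
The time/space tradeoff described above can be sketched roughly as follows. This is not the code from the LUCENE-769 patch; it is a self-contained illustration in which `loadValue` is a hypothetical stand-in for reading a stored field from disk. Values are fetched only for the documents that matched, so memory grows with the number of hits rather than with maxDoc, at the cost of one lookup per hit on every search.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntToLongFunction;

public class SortHitsOnDemand {
    /**
     * Sorts matching doc IDs by a value fetched per hit.
     * loadValue is a hypothetical stand-in for a stored-field read.
     */
    static int[] sortHits(int[] hits, IntToLongFunction loadValue) {
        // Fetch each hit's value exactly once and pair it with its doc ID.
        long[][] pairs = new long[hits.length][];
        for (int i = 0; i < hits.length; i++) {
            pairs[i] = new long[] { loadValue.applyAsLong(hits[i]), hits[i] };
        }
        // Order by value, breaking ties by doc ID so the result is deterministic.
        Arrays.sort(pairs, Comparator.<long[]>comparingLong(p -> p[0])
                                     .thenComparingLong(p -> p[1]));
        int[] sorted = new int[hits.length];
        for (int i = 0; i < hits.length; i++) {
            sorted[i] = (int) pairs[i][1];
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Pretend values: doc 5 -> 300, doc 2 -> 100, everything else -> 200.
        int[] sorted = sortHits(new int[] {5, 2, 9},
                doc -> doc == 5 ? 300L : doc == 2 ? 100L : 200L);
        System.out.println(Arrays.toString(sorted));
    }
}
```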



-Hoss




Re: Sorting field containing NULL values consumes field cache memory

Posted by Ganesh <em...@yahoo.co.in>.
Thanks. Could this feature be expected in 2.9 or 3.0?

This is a good feature. Users mostly store sparse data; not all records have data for all fields. This would reduce memory consumption to a large extent. In my case, almost 30% of records just store reference file pointers and are not part of the initial search.

Regards
Ganesh

----- Original Message ----- 
From: "Mark Miller" <ma...@gmail.com>
To: <ja...@lucene.apache.org>
Sent: Monday, July 20, 2009 10:21 PM
Subject: Re: Sorting field containing NULL values consumes field cache memory


> Right now, you can't really do anything about it. In the future, with the
> new FieldCache API that may go in, you could plug in a custom implementation
> that makes tradeoffs for a sparse array of some kind. The docid is currently
> the index into the array, but with a custom impl you may be able to use a
> sparse array object.
> That's a ways off though.
> 
> - Mark
> 


Re: Sorting field containing NULL values consumes field cache memory

Posted by Mark Miller <ma...@gmail.com>.
Right now, you can't really do anything about it. In the future, with the
new FieldCache API that may go in, you could plug in a custom implementation
that makes tradeoffs for a sparse array of some kind. The docid is currently
the index into the array, but with a custom impl you may be able to use a
sparse array object.
That's a ways off though.
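
To illustrate the idea of replacing the docid-indexed array with a sparse structure, here is a minimal sketch. It is not a Lucene API; `SparseIntValues` and its methods are hypothetical names. It keeps one bit per document to record whether a value exists, a per-word running count, and a packed array holding values only for the documents that have one. For simplicity it is built from an existing dense array, which a real implementation would want to avoid.

```java
/** Hypothetical sparse per-document int store; not part of Lucene. */
public class SparseIntValues {
    private final long[] bits;   // one bit per doc ID, set if the doc has a value
    private final int[] rank;    // number of set bits before each 64-bit word
    private final int[] values;  // values of the docs that have one, in doc ID order

    /** dense[doc] holds the value; hasValue[doc] says whether the doc has the field. */
    public SparseIntValues(int[] dense, boolean[] hasValue) {
        int maxDoc = dense.length;
        int words = (maxDoc + 63) >>> 6;
        bits = new long[words];
        rank = new int[words];
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (hasValue[doc]) {
                bits[doc >>> 6] |= 1L << (doc & 63);
                count++;
            }
        }
        // Running count of set bits, so a lookup never has to scan earlier words.
        int before = 0;
        for (int w = 0; w < words; w++) {
            rank[w] = before;
            before += Long.bitCount(bits[w]);
        }
        values = new int[count];
        int next = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (hasValue[doc]) {
                values[next++] = dense[doc];
            }
        }
    }

    /** True if the document has a value. */
    public boolean has(int doc) {
        return (bits[doc >>> 6] & (1L << (doc & 63))) != 0;
    }

    /** Returns the document's value, or missing if it has none. */
    public int get(int doc, int missing) {
        if (!has(doc)) {
            return missing;
        }
        // Count the set bits below this doc inside its word to find its packed slot.
        long below = bits[doc >>> 6] & ((1L << (doc & 63)) - 1);
        return values[rank[doc >>> 6] + Long.bitCount(below)];
    }

    public static void main(String[] args) {
        int[] dense = {10, 0, 30, 0, 50};
        boolean[] hasValue = {true, false, true, false, true};
        SparseIntValues sparse = new SparseIntValues(dense, hasValue);
        for (int doc = 0; doc < dense.length; doc++) {
            System.out.println(doc + " -> " + sparse.get(doc, -1));
        }
    }
}
```

Assuming 4-byte values, 90 million documents, and 70% of them carrying a value, this layout needs about 252 MB for the values plus roughly 17 MB for the bits and counts, compared with 360 MB for the dense array. Each lookup costs a bit test and a popcount instead of a plain array read.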

- Mark

On Mon, Jul 20, 2009 at 8:38 AM, Ganesh <em...@yahoo.co.in> wrote:

> Any ideas on this??
>
> Regards
> Ganesh
>


-- 
- Mark

http://www.lucidimagination.com

Re: Sorting field containing NULL values consumes field cache memory

Posted by Ganesh <em...@yahoo.co.in>.
Any ideas on this??

Regards
Ganesh
