Posted to java-user@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/02/02 19:03:41 UTC

Storing an ID alongside a document

I'm curious if there's a new way (using flex or term states) to store
IDs alongside a document and retrieve the IDs of the top N results?
The goal would be to minimize HD seeks, and not use field caches
(because they consume too much heap space) or the doc stores (which
require two seeks).  One possible way using the pre-flex system is to
place the IDs into a payload posting that would match all documents,
and then [somehow] retrieve the payload only when needed.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Storing an ID alongside a document

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Feb 2, 2011 at 9:23 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> Is it?  I thought it would load the values into heap RAM like the
> field cache and in addition save the values to disk? Does it also
> read the values directly from disk?
>

Loading into memory is a separate, optional step (i.e. loading a fieldcache
entry), and it should use the APIs that read directly from the index.

-Yonik
http://lucidimagination.com

Re: Storing an ID alongside a document

Posted by Jason Rutherglen <ja...@gmail.com>.

> there is an entire RAM-resident part and an Iterator API that reads /
> streams data directly from disk.
> look at DocValuesEnum vs. Source

Nice, thanks!

On Thu, Feb 3, 2011 at 12:20 AM, Simon Willnauer
<si...@googlemail.com> wrote:
> On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>> Is it?  I thought it would load the values into heap RAM like the
>> field cache and in addition save the values to disk?  Does it also
>> read the values directly from disk?
>
> there is an entire RAM-resident part and an Iterator API that reads /
> streams data directly from disk.
> look at DocValuesEnum vs. Source
>
> simon
>>
>> On Wed, Feb 2, 2011 at 2:00 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>>> That's exactly what the CSF feature is for, right?  (docvalues branch)
>>>
>>> -Yonik
>>> http://lucidimagination.com
>>>
>>>
>>> On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>>
>>>> I'm curious if there's a new way (using flex or term states) to store
>>>> IDs alongside a document and retrieve the IDs of the top N results?
>>>> The goal would be to minimize HD seeks, and not use field caches
>>>> (because they consume too much heap space) or the doc stores (which
>>>> require two seeks).  One possible way using the pre-flex system is to
>>>> place the IDs into a payload posting that would match all documents,
>>>> and then [somehow] retrieve the payload only when needed.

Re: Storing an ID alongside a document

Posted by Simon Willnauer <si...@googlemail.com>.

On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen
<ja...@gmail.com> wrote:
> Is it?  I thought it would load the values into heap RAM like the
> field cache and in addition save the values to disk?  Does it also
> read the values directly from disk?

there is an entire RAM-resident part and an Iterator API that reads /
streams data directly from disk.
look at DocValuesEnum vs. Source

simon
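
To make the distinction between the two access modes concrete, here is a
minimal conceptual sketch in plain Java. It is NOT the docvalues branch API:
the class ColumnStrideIds and its methods write, readFromDisk and
loadIntoHeap are hypothetical names invented for illustration, and the
fixed-width one-long-per-document file layout is an assumption. It only
shows the idea discussed above: values can be read straight from disk by
document number, and a RAM-resident copy is an optional extra built on the
same disk reader.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Conceptual sketch only; hypothetical names, not the Lucene docvalues API. */
public class ColumnStrideIds {

    /** Writes one fixed-width long per document; the array index is the doc number. */
    static void write(Path file, long[] idsByDoc) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(idsByDoc.length * Long.BYTES);
        for (long id : idsByDoc) {
            buf.putLong(id);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    /** Disk-backed lookup: one positioned read per document, nothing cached on the heap. */
    static long readFromDisk(FileChannel ch, int doc) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        long start = (long) doc * Long.BYTES; // fixed width makes the offset computable
        while (buf.hasRemaining()) {
            int n = ch.read(buf, start + buf.position());
            if (n < 0) {
                throw new EOFException("no value stored for doc " + doc);
            }
        }
        buf.flip();
        return buf.getLong();
    }

    /** Optional RAM-resident copy, built on top of the same disk reader. */
    static long[] loadIntoHeap(FileChannel ch, int maxDoc) throws IOException {
        long[] values = new long[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = readFromDisk(ch, doc);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ids", ".bin");
        try {
            write(file, new long[] {1001L, 1002L, 1003L, 1004L});
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                int[] topDocs = {3, 0}; // pretend these are the top N hits
                for (int doc : topDocs) {
                    System.out.println("doc " + doc + " -> id " + readFromDisk(ch, doc));
                }
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With this layout, fetching the IDs of the top N hits costs N small
positioned reads and no per-document heap, while loadIntoHeap trades
8 bytes of heap per document for seek-free lookups.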
>
> On Wed, Feb 2, 2011 at 2:00 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>> That's exactly what the CSF feature is for, right?  (docvalues branch)
>>
>> -Yonik
>> http://lucidimagination.com
>>
>>
>> On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>
>>> I'm curious if there's a new way (using flex or term states) to store
>>> IDs alongside a document and retrieve the IDs of the top N results?
>>> The goal would be to minimize HD seeks, and not use field caches
>>> (because they consume too much heap space) or the doc stores (which
>>> require two seeks).  One possible way using the pre-flex system is to
>>> place the IDs into a payload posting that would match all documents,
>>> and then [somehow] retrieve the payload only when needed.

Re: Storing an ID alongside a document

Posted by Jason Rutherglen <ja...@gmail.com>.

Is it?  I thought it would load the values into heap RAM like the
field cache and in addition save the values to disk?  Does it also
read the values directly from disk?

On Wed, Feb 2, 2011 at 2:00 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> That's exactly what the CSF feature is for, right?  (docvalues branch)
>
> -Yonik
> http://lucidimagination.com
>
>
> On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>
>> I'm curious if there's a new way (using flex or term states) to store
>> IDs alongside a document and retrieve the IDs of the top N results?
>> The goal would be to minimize HD seeks, and not use field caches
>> (because they consume too much heap space) or the doc stores (which
>> require two seeks).  One possible way using the pre-flex system is to
>> place the IDs into a payload posting that would match all documents,
>> and then [somehow] retrieve the payload only when needed.

Re: Storing an ID alongside a document

Posted by Yonik Seeley <yo...@lucidimagination.com>.

That's exactly what the CSF feature is for, right?  (docvalues branch)

-Yonik
http://lucidimagination.com


On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> I'm curious if there's a new way (using flex or term states) to store
> IDs alongside a document and retrieve the IDs of the top N results?
> The goal would be to minimize HD seeks, and not use field caches
> (because they consume too much heap space) or the doc stores (which
> require two seeks).  One possible way using the pre-flex system is to
> place the IDs into a payload posting that would match all documents,
> and then [somehow] retrieve the payload only when needed.