Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/09/23 15:40:51 UTC

Modifying payloads

Has anyone done any work on modifying payloads "inline" in the index?   
The idea being that if you know the length of the payload isn't  
changing, you can modify it w/o reindexing.  Some concerns that come  
to mind are thread-safety, etc.
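
To make the idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class FixedLengthPayloadStore, its slot layout, and the use of synchronized for thread-safety are all assumptions made for illustration. It only shows why a fixed length matters: slot i always lives at byte offset i * payloadLength, so an overwrite never shifts neighbouring data.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not a Lucene API.
final class FixedLengthPayloadStore {
    private final ByteBuffer data;
    private final int payloadLength;

    FixedLengthPayloadStore(int slots, int payloadLength) {
        this.payloadLength = payloadLength;
        this.data = ByteBuffer.allocate(slots * payloadLength);
    }

    // Overwrite one slot in place. Only legal when the length is unchanged,
    // otherwise every later slot would have to move.
    synchronized void update(int slot, byte[] payload) {
        if (payload.length != payloadLength) {
            throw new IllegalArgumentException("payload length must stay " + payloadLength);
        }
        ByteBuffer view = data.duplicate(); // shares content, has its own position
        view.position(slot * payloadLength);
        view.put(payload);
    }

    // Copy one slot out; synchronized so a reader never sees a half-written payload.
    synchronized byte[] read(int slot) {
        byte[] out = new byte[payloadLength];
        ByteBuffer view = data.duplicate();
        view.position(slot * payloadLength);
        view.get(out);
        return out;
    }
}
```

Coarse locking like this is the simplest answer to the thread-safety concern; whether it would be acceptable for real query traffic is a separate question.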

-Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Modifying payloads

Posted by Jason Rutherglen <ja...@gmail.com>.
> read of such a CSF is no longer sequential, because we
> need to seek to each of those files in case of updates.

I think here we'd keep the newly updated blocks in RAM (kind of
like LUCENE-1313 NRT), with ParallelIndexWriter.commit flushing
the blocks to disk as needed or at a given RAM usage threshold
(similar to IW.maxBufferSize).
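
A rough sketch of that buffering scheme, in plain Java. Everything here is an assumption for illustration: the class BufferedUpdates is hypothetical and is not ParallelIndexWriter, an array stands in for the on-disk data, and the threshold counts buffered entries rather than RAM bytes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not ParallelIndexWriter or any Lucene API.
final class BufferedUpdates {
    private long[] committed;                                   // stands in for data on disk
    private final Map<Integer, Long> buffer = new HashMap<>();  // pending updates held in RAM
    private final int maxBufferedUpdates;                       // stand-in for a RAM threshold
    private int flushCount = 0;

    BufferedUpdates(long[] initial, int maxBufferedUpdates) {
        this.committed = initial.clone();
        this.maxBufferedUpdates = maxBufferedUpdates;
    }

    // Record an update in RAM and flush once the threshold is reached.
    synchronized void update(int docId, long value) {
        buffer.put(docId, value);
        if (buffer.size() >= maxBufferedUpdates) {
            commit();
        }
    }

    // Reads see buffered values first, then fall back to committed data.
    synchronized long get(int docId) {
        Long pending = buffer.get(docId);
        return pending != null ? pending : committed[docId];
    }

    // Write a new copy instead of touching the old one, in the spirit of write-once files.
    synchronized void commit() {
        if (buffer.isEmpty()) return;
        long[] next = committed.clone();
        for (Map.Entry<Integer, Long> e : buffer.entrySet()) {
            next[e.getKey()] = e.getValue();
        }
        committed = next;
        buffer.clear();
        flushCount++;
    }

    synchronized int flushCount() { return flushCount; }
    synchronized int pendingCount() { return buffer.size(); }
}
```

Repeated updates to the same document collapse into one buffered entry, so only the latest value is ever flushed.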

Also, now that we're adding PFOR as a postings format, it seemed
when I looked at the Kamikaze implementation, that PFOR uses
blocks in a way that's easily replaceable in RAM compared to
delta encoding. So I think as flex indexing gets further along,
writing the parallel NRT index becomes somewhat clearer. Then we
can more easily support payload updates and parallel updateable
untokenized indexes.

On Wed, Sep 23, 2009 at 7:17 AM, Michael Busch <bu...@gmail.com> wrote:
> I guess that's the big wish we all have! :)
>
> One other big problem with doing inline updates is that we have a
> write-once model in Lucene, so we never touch a file after it was written.
>
> My goal is to be able to achieve what you have in mind with CSF and parallel
> incremental indexing. When we have CSFs we will be able to define a
> fixed-length CSF.  With parallel indexing you will be able to create a
> parallel index containing only that field and you can write a new version of
> it containing the update. This works very efficiently if you want to change
> a large number of field values. However, if you often change only a single
> value, then we should work on a solution optimized for that use case,
> something like Yonik pointed out recently on java-dev.
>
> The other possible performance problem is that when you end up with multiple
> files (parallel indexes) on disk, each containing only a few updates, a
> read of such a CSF is no longer sequential, because we need to seek
> to each of those files in case of updates.
>
> With two-dimensional merge policies (mentioned on the parallel indexing wiki
> page) we should be able to balance merge and read overhead by controlling
> the number of parallel indexes we have, similar to what is done with
> "normal" segment merges today.
>
> Also the perf problems of frequent small updates can probably be improved by
> having a buffer and writing the new parallel index in the background, while
> serving read requests from the buffer (for NRT readers).
>
>  Michael
>
> On Wed, Sep 23, 2009 at 3:40 PM, Grant Ingersoll <gs...@apache.org>
> wrote:
>>
>> Has anyone done any work on modifying payloads "inline" in the index?  The
>> idea being that if you know the length of the payload isn't changing, you
>> can modify it w/o reindexing.  Some concerns that come to mind are
>> thread-safety, etc.
>>
>> -Grant



Re: Modifying payloads

Posted by Michael Busch <bu...@gmail.com>.
I guess that's the big wish we all have! :)

One other big problem with doing inline updates is that we have a
write-once model in Lucene, so we never touch a file after it was written.

My goal is to be able to achieve what you have in mind with CSF and parallel
incremental indexing. When we have CSFs we will be able to define a
fixed-length CSF.  With parallel indexing you will be able to create a
parallel index containing only that field and you can write a new version of
it containing the update. This works very efficiently if you want to change
a large number of field values. However, if you often change only a single
value, then we should work on a solution optimized for that use case,
something like Yonik pointed out recently on java-dev.

The other possible performance problem is that when you end up with multiple
files (parallel indexes) on disk, each containing only a few updates, a
read of such a CSF is no longer sequential, because we need to seek
to each of those files in case of updates.

With two-dimensional merge policies (mentioned on the parallel indexing wiki
page) we should be able to balance merge and read overhead by controlling
the number of parallel indexes we have, similar to what is done with
"normal" segment merges today.
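
The trade-off between read and merge overhead can be sketched in plain Java. This is an illustration under assumptions, not the CSF or parallel indexing design: the class LayeredColumn is hypothetical, an array stands in for the base fixed-length field, and each small update generation is modelled as a map from document id to new value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the CSF or parallel index implementation.
final class LayeredColumn {
    private long[] base;                                                   // original field values
    private final List<Map<Integer, Long>> generations = new ArrayList<>(); // oldest first

    LayeredColumn(long[] base) {
        this.base = base.clone();
    }

    // Each batch of updates becomes its own write-once generation.
    void addGeneration(Map<Integer, Long> updates) {
        generations.add(new HashMap<>(updates));
    }

    // A read probes generations newest to oldest: one lookup per generation,
    // which is the non-sequential read cost described above.
    long get(int docId) {
        for (int g = generations.size() - 1; g >= 0; g--) {
            Long v = generations.get(g).get(docId);
            if (v != null) return v;
        }
        return base[docId];
    }

    int generationCount() {
        return generations.size();
    }

    // Merging folds all generations into a new base, oldest first so the newest
    // value wins. Reads get cheaper at the price of rewriting the whole column.
    void merge() {
        long[] merged = base.clone();
        for (Map<Integer, Long> gen : generations) {
            for (Map.Entry<Integer, Long> e : gen.entrySet()) {
                merged[e.getKey()] = e.getValue();
            }
        }
        base = merged;
        generations.clear();
    }
}
```

A merge policy would then decide when generationCount() has grown large enough that a merge is worth its cost.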

Also the perf problems of frequent small updates can probably be improved by
having a buffer and writing the new parallel index in the background, while
serving read requests from the buffer (for NRT readers).

 Michael

On Wed, Sep 23, 2009 at 3:40 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Has anyone done any work on modifying payloads "inline" in the index?  The
> idea being that if you know the length of the payload isn't changing, you
> can modify it w/o reindexing.  Some concerns that come to mind are
> thread-safety, etc.
>
> -Grant