Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2009/07/10 07:02:14 UTC

[jira] Created: (LUCENE-1738) Expand IndexWriter to allow for replicating segments in near realtime

Expand IndexWriter to allow for replicating segments in near realtime
---------------------------------------------------------------------

                 Key: LUCENE-1738
                 URL: https://issues.apache.org/jira/browse/LUCENE-1738
             Project: Lucene - Java
          Issue Type: Improvement
    Affects Versions: 2.4.1
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: 3.0


When LUCENE-1313 is completed, it would be good to have a way to
replicate segments from one IndexWriter to another.

* Callback on successful flush (maybe for other events as well?)

* Ability to access the files for a segment (which would presumably
be read from the IW ramdir), then copy them to a temporary
serialized ramdir (or an equivalent, since ramdir allocates extra
space in blocks, whereas we will already know the size of the files
before we write them).

* On the receiving end, we may be able to use
addIndexesNoOptimize(Directory[]); however, this would entail
each directory having an extraneous segments_N file for each
replicated update (so we may want another format).

* It will rely on having a new public version of SegmentInfo. 
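
The first two points above can be sketched roughly as follows. This is only an illustration under stated assumptions: FlushListener, ToyWriter, packedSize and pack are hypothetical names, not Lucene APIs, and the length-prefixed layout is one possible "equivalent" to a serialized ramdir, chosen because it lets the buffer be allocated at its exact final size.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Lucene.
class FlushReplicationSketch {

    // Assumed callback, invoked once a segment has been flushed successfully.
    interface FlushListener {
        void onFlush(String segmentName, Map<String, byte[]> files);
    }

    // Stand-in for the sending IndexWriter; it only tracks listeners.
    static class ToyWriter {
        private final List<FlushListener> listeners = new ArrayList<>();

        void addFlushListener(FlushListener listener) {
            listeners.add(listener);
        }

        // Pretends that a flush succeeded and notifies every listener.
        void flush(String segmentName, Map<String, byte[]> files) {
            for (FlushListener listener : listeners) {
                listener.onFlush(segmentName, files);
            }
        }
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = utf8(s);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Exact size of the packed form, computable before any byte is written.
    static int packedSize(String segmentName, Map<String, byte[]> files) {
        int size = 4 + utf8(segmentName).length + 4; // name length, name, file count
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            size += 4 + utf8(entry.getKey()).length + 4 + entry.getValue().length;
        }
        return size;
    }

    // Packs the segment into one length-prefixed buffer allocated at its final size.
    static byte[] pack(String segmentName, Map<String, byte[]> files) {
        ByteArrayOutputStream buffer =
                new ByteArrayOutputStream(packedSize(segmentName, files));
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeString(out, segmentName);
            out.writeInt(files.size());
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                writeString(out, entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("_0.fdt", new byte[] {1, 2, 3});
        files.put("_0.tis", new byte[] {4, 5, 6, 7, 8});

        List<byte[]> outbox = new ArrayList<>(); // stands in for a network send
        ToyWriter writer = new ToyWriter();
        writer.addFlushListener((name, segmentFiles) -> outbox.add(pack(name, segmentFiles)));
        writer.flush("_0", files);

        System.out.println("packed bytes: " + outbox.get(0).length);
    }
}
```

Because the size is known up front, the buffer never has to grow, which is the advantage over block-allocated storage that the description hints at. How the receiving side would unpack and register the segment is left open here, as it is in the issue.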

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1738) IndexWriter.addIndexes without syncing

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1738:
-------------------------------------

    Attachment: LUCENE-1738.patch

Very basic start at the patch.  The "not IW.dir" check is removed from the DirectoryReader ctor called by readerPool.  This conflicts with the way addIndexes currently works.  Perhaps we could add a parameter to SegmentInfo indicating that it is OK to include that SegmentInfo in getReader?

* Added IW.addIndexesNoSync, which does not stop indexing while it runs and does not copy the indexes over synchronously.  The new indexes are scheduled as merges.

* commit and close call resolveExternalSegments.

* I think we'll want a boolean parameter that synchronously copies the indexes over but does not start any merging.  This is for copying from a filesystem index.  In the replication use case, we're adding ramDirs so we don't need to immediately merge/copy them over.

* Needs more unit tests
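
To make the intended behavior concrete, here is a toy model of the points above. It is not the attached patch and not Lucene's IndexWriter: the class names, the Segment type and the method signatures are assumptions made for illustration, and copying or merging is reduced to relabeling a segment's directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; not Lucene's IndexWriter and not the attached patch.
class NoSyncAddSketch {

    // A segment plus the name of the directory that currently holds it.
    static final class Segment {
        final String name;
        String dir;

        Segment(String name, String dir) {
            this.name = name;
            this.dir = dir;
        }
    }

    static class ToyWriter {
        private final String dir;
        private final List<Segment> segments = new ArrayList<>();

        ToyWriter(String dir) {
            this.dir = dir;
        }

        // Registers foreign segments; brings them over right away only if copy is true.
        synchronized void addIndexesNoSync(boolean copy, List<Segment> incoming) {
            segments.addAll(incoming);
            if (copy) {
                resolveExternalSegments();
            }
        }

        // Moves every segment that lives elsewhere into this writer's directory.
        synchronized void resolveExternalSegments() {
            for (Segment segment : segments) {
                if (!segment.dir.equals(dir)) {
                    segment.dir = dir; // stands in for a real copy or merge
                }
            }
        }

        // Number of registered segments still held by another directory.
        synchronized int externalCount() {
            int count = 0;
            for (Segment segment : segments) {
                if (!segment.dir.equals(dir)) {
                    count++;
                }
            }
            return count;
        }

        // commit (and close, in the patch notes) resolves anything still external.
        synchronized void commit() {
            resolveExternalSegments();
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter("main");
        writer.addIndexesNoSync(false, Arrays.asList(new Segment("_r0", "ram1")));
        System.out.println("external before commit: " + writer.externalCount());
        writer.commit();
        System.out.println("external after commit: " + writer.externalCount());
    }
}
```

With copy set to false the call returns immediately and the segment stays external until commit, which matches the replication case where ramDirs are added; with copy set to true the segments are brought over before the call returns, which matches the filesystem case.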

> IndexWriter.addIndexes without syncing
> --------------------------------------
>
>                 Key: LUCENE-1738
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1738
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1738.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (LUCENE-1738) IndexWriter.addIndexes without syncing

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1738:
-------------------------------------

    Summary: IndexWriter.addIndexes without syncing  (was: Expand IndexWriter to allow for replicating segments in near realtime)

Changed because we can (hopefully) use addIndexes for replication.



[jira] Updated: (LUCENE-1738) Expand IndexWriter to allow for replicating segments in near realtime

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1738:
---------------------------------------

    Fix Version/s:     (was: 3.0)
                   3.1

Moving fix version to 3.1... 3.0 will be just a mechanical release (no new features), quickly following 2.9.



[jira] Updated: (LUCENE-1738) IndexWriter.addIndexes without syncing

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1738:
-------------------------------------

    Attachment: LUCENE-1738.patch

* Added a copy parameter that triggers a call to resolveExternalSegments

* Next up is a test case showing the merge exception problem

> IndexWriter.addIndexes without syncing
> --------------------------------------
>
>                 Key: LUCENE-1738
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1738
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1738.patch, LUCENE-1738.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
