Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2009/07/10 07:02:14 UTC
[jira] Created: (LUCENE-1738) Expand IndexWriter to allow for
replicating segments in near realtime
Expand IndexWriter to allow for replicating segments in near realtime
---------------------------------------------------------------------
Key: LUCENE-1738
URL: https://issues.apache.org/jira/browse/LUCENE-1738
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 3.0
When LUCENE-1313 is completed, it would be good to have a way to
replicate segments from one IndexWriter to another.
* Callback on successful flush (maybe for other events as well?)
* Ability to access files for a segment (which would presumably
be read from the IW ramdir), then copy them to a temporary
serialized ramdir (or equivalent as ramdir uses extra space in
blocks, whereas we'll already know the size of the files before
we write them).
* On the receiving end, we may be able to use
addIndexesNoOptimize(Directory[]); however, this would entail
each directory having an extraneous segments_N file for each
replicated update (so we may want another format).
* It will rely on having a new public version of SegmentInfo.
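
The flush callback and the size-aware copy described above can be sketched as follows. This is only a toy model under stated assumptions: FlushListener, pack, and unpack are hypothetical names that do not exist in Lucene, segment files are modeled as plain byte arrays, and the file names are illustrative. It shows why knowing the file sizes up front lets the sender allocate one exactly sized buffer rather than a block-based ramdir.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the proposal; none of these types are part of Lucene. */
public class SegmentReplicationSketch {

    /** Hypothetical callback fired after a segment has been flushed successfully. */
    interface FlushListener {
        void onFlush(String segmentName, Map<String, byte[]> files);
    }

    /** Packs files as: count, then (nameLength, name, dataLength, data) per file. */
    static byte[] pack(Map<String, byte[]> files) {
        // All sizes are known before writing, so compute the exact total first.
        int size = Integer.BYTES; // file count
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            size += Integer.BYTES + e.getKey().getBytes(StandardCharsets.UTF_8).length
                  + Integer.BYTES + e.getValue().length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // single allocation, no slack
        buf.putInt(files.size());
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            buf.putInt(name.length).put(name);
            buf.putInt(e.getValue().length).put(e.getValue());
        }
        return buf.array();
    }

    /** Receiving side: restores the file map from the packed form. */
    static Map<String, byte[]> unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        int count = buf.getInt();
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            byte[] name = new byte[buf.getInt()];
            buf.get(name);
            byte[] data = new byte[buf.getInt()];
            buf.get(data);
            files.put(new String(name, StandardCharsets.UTF_8), data);
        }
        return files;
    }

    public static void main(String[] args) {
        List<byte[]> wire = new ArrayList<>(); // stands in for the network
        FlushListener listener = (segment, files) -> wire.add(pack(files));

        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("_0.fdt", new byte[] {1, 2, 3});
        files.put("_0.tis", new byte[] {4, 5});
        listener.onFlush("_0", files); // the writer would call this after a flush

        System.out.println(unpack(wire.get(0)).keySet());
    }
}
```

Note that the packed form carries no segments_N file; how the receiver would register the segment is left open, as in the description.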
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
[jira] Updated: (LUCENE-1738) IndexWriter.addIndexes without
syncing
Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1738:
-------------------------------------
Attachment: LUCENE-1738.patch
Very basic start at the patch. The check for segments not in the IndexWriter's directory (the "not IW.dir" check) is removed in the DirectoryReader constructor called by readerPool. This conflicts with the way addIndexes currently works. Perhaps we could add a parameter to SegmentInfo indicating that it is OK to include that SegmentInfo in getReader?
* Added IW.addIndexesNoSync which doesn't stop indexing during the method, nor does it synchronously copy the indexes over. The new indexes are scheduled as merges.
* commit and close call resolveExternalSegments.
* I think we'll want a boolean parameter that synchronously copies the indexes over but does not start any merging. This is for copying from a filesystem index. In the replication use case, we're adding ramDirs so we don't need to immediately merge/copy them over.
* Needs more unit tests
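
A rough model of the behaviour listed above, for readers who have not seen the patch. This is not the patch itself: the class, its fields, and the string-based segments are assumptions made for illustration, and the actual copy or merge is reduced to flipping a flag. It only shows the intended control flow: addIndexesNoSync registers external segments and returns, the work is deferred, and commit/close (or the copy flag) resolves them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of deferred addIndexes; not the real IndexWriter. */
public class AddIndexesNoSyncSketch {

    static final class Segment {
        final String name;
        boolean external; // true while the segment still lives in a foreign directory
        Segment(String name, boolean external) {
            this.name = name;
            this.external = external;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private final List<Segment> pendingExternal = new ArrayList<>();

    /** Registers the segments and returns at once; copyNow models the proposed boolean. */
    synchronized void addIndexesNoSync(List<String> names, boolean copyNow) {
        for (String name : names) {
            Segment s = new Segment(name, true);
            segments.add(s);        // visible to the writer immediately
            pendingExternal.add(s); // copy/merge is only scheduled here
        }
        if (copyNow) {
            resolveExternalSegments(); // synchronous copy, e.g. from a filesystem index
        }
    }

    /** Stands in for copying every pending external segment into the writer's directory. */
    synchronized void resolveExternalSegments() {
        for (Segment s : pendingExternal) {
            s.external = false;
        }
        pendingExternal.clear();
    }

    /** commit and close both resolve external segments first. */
    synchronized void commit() {
        resolveExternalSegments();
    }

    synchronized void close() {
        resolveExternalSegments();
    }

    synchronized int externalCount() {
        int n = 0;
        for (Segment s : segments) {
            if (s.external) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        AddIndexesNoSyncSketch writer = new AddIndexesNoSyncSketch();
        writer.addIndexesNoSync(Arrays.asList("ram_0", "ram_1"), false);
        System.out.println("external before commit: " + writer.externalCount());
        writer.commit();
        System.out.println("external after commit: " + writer.externalCount());
    }
}
```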
> IndexWriter.addIndexes without syncing
> --------------------------------------
>
> Key: LUCENE-1738
> URL: https://issues.apache.org/jira/browse/LUCENE-1738
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1738.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
[jira] Updated: (LUCENE-1738) IndexWriter.addIndexes without
syncing
Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1738:
-------------------------------------
Summary: IndexWriter.addIndexes without syncing (was: Expand IndexWriter to allow for replicating segments in near realtime)
Summary changed because we can (hopefully) use addIndexes for replication.
[jira] Updated: (LUCENE-1738) Expand IndexWriter to allow for
replicating segments in near realtime
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1738:
---------------------------------------
Fix Version/s: 3.1 (was: 3.0)
Moving fix version to 3.1... 3.0 will be just a mechanical release (no new features), quickly following 2.9.
[jira] Updated: (LUCENE-1738) IndexWriter.addIndexes without
syncing
Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1738:
-------------------------------------
Attachment: LUCENE-1738.patch
* Added copy parameter that calls resolveExternalSegments
* Next up is a test case showing the merge exception problem