Posted to dev@lucene.apache.org by "Steven Rowe (JIRA)" <ji...@apache.org> on 2010/12/17 17:50:01 UTC

[jira] Issue Comment Edited: (LUCENE-2814) stop writing shared doc stores across segments

    [ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972543#action_12972543 ] 

Steven Rowe edited comment on LUCENE-2814 at 12/17/10 11:49 AM:
----------------------------------------------------------------

bq. Steven, is that on a wiki page?

I don't know, I never put it anywhere, just discussed on dev@l.a.o.  Feel free to do so if you like.

bq. The usage seems a little slim? http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2010-12-17;raw=on

Yeah, it's very rarely used.  

Several Lucene people who use #lucene are strongly against logging, so I set up #lucene-dev as a place to host on-the-record Lucene conversations.  I mentioned it because this is what you want.

      was (Author: steve_rowe):
    bq. Steven, is that on a wiki page?

bq. The usage seems a little slim? http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2010-12-17;raw=on

Yeah, it's very rarely used.  

Several Lucene people who use #lucene are strongly against logging, so I set up #lucene-dev as a place to host on-the-record Lucene conversations.  I mentioned it because this is what you want.
  
> stop writing shared doc stores across segments
> ----------------------------------------------
>
>                 Key: LUCENE-2814
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2814
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch
>
>
> Shared doc stores enable the files for stored fields and term vectors to be shared across multiple segments.  We've had this optimization since 2.1, I think.
> It works best against a new index, where you open an IW, add lots of docs, and then close it.  In that case all of the written segments will reference slices of a single shared doc store segment.
> This was a good optimization because it means we never need to merge these files.  But, when you open another IW on that index, it writes a new set of doc stores, and then whenever merges take place across doc stores, they must now be merged.
> However, since we switched to shared doc stores, there have been two optimizations for merging the stores.  First, we now bulk-copy the bytes in these files if the field name/number assignment is "congruent".  Second, we now force congruent field name/number mapping in IndexWriter.  This means the shared doc stores optimization is much less potent than it used to be.
> Furthermore, the optimization adds *a lot* of hair to IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over time, and causes odd behavior like a merge possibly forcing a flush when it starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent flushing, we can no longer share doc stores.
> So, I think we should turn off the write-side of shared doc stores to pave the path for DWPT to land on trunk and simplify IW/DW.  We still must support reading them (until 5.0), but the read side is far less hairy.
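
To make the "congruent" condition in the description above concrete, here is a minimal sketch of the kind of check that would decide whether stored fields and term vectors can be bulk-copied during a merge. It is an illustration only, not Lucene's actual code: the class FieldCongruence, the method isCongruent, and the use of a list index as the field number are all assumptions made for this example.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
// A field mapping is modeled as a list where the index is the field
// number and the element is the field name.
final class FieldCongruence {

    // Returns true when every field number used by the segment refers to
    // the same field name in the merged mapping. Under that assumption the
    // raw bytes can be copied as-is, since no field numbers need rewriting.
    static boolean isCongruent(List<String> segmentFields, List<String> mergedFields) {
        if (segmentFields.size() > mergedFields.size()) {
            return false; // the segment uses numbers the merged mapping lacks
        }
        for (int number = 0; number < segmentFields.size(); number++) {
            if (!segmentFields.get(number).equals(mergedFields.get(number))) {
                return false; // same number, different field name
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> merged = List.of("id", "title", "body");
        System.out.println(isCongruent(List.of("id", "title"), merged)); // true
        System.out.println(isCongruent(List.of("title", "id"), merged)); // false
    }
}
```

If IndexWriter always assigns the same number to the same field name, a check like this passes for every segment it wrote, which is why the merge cost that shared doc stores used to avoid is now much smaller.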
