Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2009/07/19 07:50:14 UTC

[jira] Commented: (LUCENE-1750) Create a MergePolicy that limits the maximum size of its segments

    [ https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732974#action_12732974 ] 

Shai Erera commented on LUCENE-1750:
------------------------------------

What happens after several such large segments are created? Wouldn't you want them to be merged into an even larger segment? Otherwise, you'll have many such segments and search performance will degrade.

I guess I never thought this was a problem. If I have enough disk space, and my index size reaches 600 GB (which is a huge index) split across 10 different segments of 60 GB each, I guess I'd want them merged into one larger 600 GB segment. It will take eons until I accumulate another such 600 GB segment, no?

Maybe we can have two merge factors: 1) for small segments, up to a set size threshold, we do the merges regularly; 2) for really large segments, the merge factor is different. For example, we can say that up to 1GB the merge factor is 10, and beyond that it is 20. That would postpone the large IO merges until enough such segments accumulate.
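
A minimal sketch of that selection rule (this is not an existing Lucene API; the 1GB threshold and the factors 10/20 are just the example numbers above):

    // Sketch only: pick a merge factor from a segment's size.
    class TwoTierMergeFactor {
        static final long THRESHOLD_BYTES = 1L << 30; // 1GB, example threshold

        static int mergeFactorFor(long segmentSizeBytes) {
            // Small segments merge eagerly (factor 10); really large ones
            // wait until more of them accumulate (factor 20).
            return segmentSizeBytes <= THRESHOLD_BYTES ? 10 : 20;
        }
    }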

Also, w/ the current proposal, how will optimize work? Will it skip the very large segments, or will they be included too?

> Create a MergePolicy that limits the maximum size of its segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1750
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1750
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1750.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Basically I'm trying to create largish 2-4GB shards using
> LogByteSizeMergePolicy, however I've found, in the attached unit
> test, segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2GB, then for all
> merging into that segment to stop, and for another 2GB segment to
> be created. This helps when replicating in Solr, where if a single
> optimized 60GB segment is created, the machine stops working due
> to IO and CPU starvation.
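
For reference, capping merge size with the 2.4.x LogByteSizeMergePolicy API looks roughly like this (a sketch; the ~2GB value is just the target from the description above):

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.LogByteSizeMergePolicy;

    class CapSegmentSize {
        // Keep the policy from selecting already-large segments for merging.
        static void apply(IndexWriter writer) {
            LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
            policy.setMaxMergeMB(2048.0); // illustrative ~2GB cap
            writer.setMergePolicy(policy);
        }
    }

If I understand it right, maxMergeMB only restricts which existing segments may be selected for a merge; the merged result can still exceed it, which would explain the oversized segments the unit test sees.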

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

