Posted to java-user@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2016/06/08 20:43:19 UTC

Re: SortingMergePolicy in Lucene 6

Sorry for the slow response: this one almost fell past the event horizon of
my todo list ;)

Do you mean you are using document blocks (IW.addDocuments) and block
grouping (BlockGroupingCollector)?

Any merge policy is fine with that (merging cannot break up document
blocks), but with index-time sorting you'll need to sort primarily by X
(where X is indexed with the same value in parent and child documents), and
secondarily by "blockID", where blockID is a long doc value that is unique
per block and indexed with that same value on each document in the block.
That should preserve your blocks.
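
To see why the two-level sort keeps blocks together, here is a minimal
plain-Java simulation. It is only a sketch and does not use any Lucene API:
the Doc class, the sample values, and the helper names are hypothetical, and
it assumes that documents with equal sort values keep their original relative
order (a stable sort), which is what keeps children ahead of their parent
inside each block.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlockSortSketch {

    /** Hypothetical stand-in for an indexed document: a label plus the two sort keys. */
    static final class Doc {
        final String label;
        final long x;        // same value on every document of a block
        final long blockId;  // unique per block, shared by all documents of that block

        Doc(String label, long x, long blockId) {
            this.label = label;
            this.x = x;
            this.blockId = blockId;
        }
    }

    /** Sorts by x, then by blockId. List.sort is stable, so ties keep their input order. */
    static List<Doc> sortByXThenBlock(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingLong((Doc d) -> d.x)
                .thenComparingLong(d -> d.blockId));
        return sorted;
    }

    /** Returns true if every blockId occupies exactly one contiguous run. */
    static boolean blocksContiguous(List<Doc> docs) {
        Set<Long> finished = new HashSet<>();
        Long current = null;
        for (Doc d : docs) {
            if (current == null || d.blockId != current) {
                if (current != null) {
                    finished.add(current);
                }
                if (finished.contains(d.blockId)) {
                    return false; // this block was already closed earlier
                }
                current = d.blockId;
            }
        }
        return true;
    }

    static List<String> labels(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.label);
        }
        return out;
    }

    /** Three blocks in indexing order; children come first, the parent last. */
    static List<Doc> sample() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("a1", 5, 1));
        docs.add(new Doc("a2", 5, 1));
        docs.add(new Doc("A", 5, 1));
        docs.add(new Doc("b1", 3, 2));
        docs.add(new Doc("B", 3, 2));
        docs.add(new Doc("c1", 5, 3)); // same x as block 1, different blockId
        docs.add(new Doc("C", 5, 3));
        return docs;
    }

    public static void main(String[] args) {
        List<Doc> sorted = sortByXThenBlock(sample());
        System.out.println(labels(sorted));           // [b1, B, a1, a2, A, c1, C]
        System.out.println(blocksContiguous(sorted)); // true
    }
}
```

Blocks 1 and 3 share the same X, so the blockID tie-break is what decides
their relative order explicitly instead of leaving it to however ties on X
happen to be resolved.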

Mike McCandless

http://blog.mikemccandless.com

On Wed, May 25, 2016 at 8:26 PM, Sheng <sh...@gmail.com> wrote:

> Michael - That is a great article to read - thank you for the detailed
> explanation of the situation! We are not in production yet, so I am ok to
> wait a bit until 6 is in a more mature shape. For now, I am going to use a
> LogMergePolicy instead. That does bring up another question. We are using
> Lucene-group and, as its documentation suggests, we index each group of
> documents in a single commit so that they end up in the same segment. Does
> that place any limitation on the merge policy we can use, since merging
> segments might break up the groups? Thanks again!
>
> On Wednesday, May 25, 2016, Michael McCandless <lu...@mikemccandless.com>
> wrote:
>
>> Sorry, yes, dimensional points and SlowCompositeReaderWrapper are not
>> compatible.
>>
>> This class (SlowCompositeReaderWrapper) is a terrible class that we have
>> been gradually (past 7 years) phasing out of Lucene.  It's a leaky
>> abstraction (
>> http://www.joelonsoftware.com/articles/LeakyAbstractions.html) that
>> pretended your index has one segment when it doesn't, and it limited our
>> freedoms when developing new features.
>>
>> Finally just today, for 7.0 anyways, we succeeded:
>> https://issues.apache.org/jira/browse/LUCENE-7283
>>
>> That said, we have also fixed index time sorting to no longer use
>> SlowCompositeReaderWrapper:
>> https://issues.apache.org/jira/browse/LUCENE-6766
>>
>> Right now this is a 7.0 (master) only change but I plan to backport for
>> 6.2 once we get 6.1 released.  Maybe you could test Lucene's current master
>> and confirm points and index-time sorting work correctly for you?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, May 25, 2016 at 1:10 PM, Sheng <sh...@gmail.com> wrote:
>>
>>> It makes a call to SlowCompositeReaderWrapper in line 103, which checks
>>> if the field hasPointValues in line 68. If yes, it throws an exception
>>> "cannot wrap points". Does this essentially mean SortingMergePolicy
>>> cannot be used for an index that has point values? If yes, what is the
>>> rationale behind it?
>>>
>>
>>

Re: SortingMergePolicy in Lucene 6

Posted by Sheng <sh...@gmail.com>.
Thanks a lot for the explanation. That's all I want to know :)

Cheers,

On Wednesday, June 8, 2016, Michael McCandless <lu...@mikemccandless.com>
wrote:

> Sorry for the slow response: this one almost fell past the event horizon
> of my todo list ;)
>
> Do you mean you are using document blocks (IW.addDocuments) and block
> grouping (BlockGroupingCollector)?
>
> Any merge policy is fine with that (merging cannot break up document
> blocks), but with index time sorting, you'll need to sort primarily by X
> (where X is indexed with the same value in parent and child documents), and
> secondarily by "blockID" where blockID is a unique long doc value indexed
> on each document in the block.  That should preserve your blocks?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, May 25, 2016 at 8:26 PM, Sheng <shengcer@gmail.com> wrote:
>
>> Michael - That is a great article to read - thank you for the detailed
>> explanation of the situation! We are not in production yet, so I am ok to
>> wait a bit until 6 is in a more mature shape. For now, I am going to use a
>> LogMergePolicy instead. That does bring up another question. As we are
>> using Lucene-group, and as document suggests, index each group of documents
>> in a single commit, so they can be in the same segment. Does it place any
>> limitation on the merge policy we can use, as it is merging the segments
>> and might break the deal ? Thanks again!
>>
>> On Wednesday, May 25, 2016, Michael McCandless <lucene@mikemccandless.com>
>> wrote:
>>
>>> Sorry, yes, dimensional points and SlowCompositeReaderWrapper are not
>>> compatible.
>>>
>>> This class (SlowCompositeReaderWrapper) is a terrible class that we have
>>> been gradually (past 7 years) phasing out of Lucene.  It's a leaky
>>> abstraction (
>>> http://www.joelonsoftware.com/articles/LeakyAbstractions.html) that
>>> pretended your index has one segment when it doesn't, and it limited our
>>> freedoms when developing new features.
>>>
>>> Finally just today, for 7.0 anyways, we succeeded:
>>> https://issues.apache.org/jira/browse/LUCENE-7283
>>>
>>> That said, we have also fixed index time sorting to no longer use
>>> SlowCompositeReaderWrapper:
>>> https://issues.apache.org/jira/browse/LUCENE-6766
>>>
>>> Right now this is a 7.0 (master) only change but I plan to backport for
>>> 6.2 once we get 6.1 released.  Maybe you could test Lucene's current master
>>> and confirm points and index-time sorting work correctly for you?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, May 25, 2016 at 1:10 PM, Sheng <sh...@gmail.com> wrote:
>>>
>>>> It makes a call to SlowCompositeReaderWrapper in line 103, which checks
>>>> if the field hasPointValues in line 68. If yes, it throws an exception
>>>> "cannot wrap points". Does this essentially mean SortingMergePolicy
>>>> cannot be used for an index that has point values? If yes, what is the
>>>> rationale behind it?
>>>>
>>>
>>>
>