You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by s1monw <gi...@git.apache.org> on 2018/12/09 18:20:02 UTC

[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

GitHub user s1monw opened a pull request:

    https://github.com/apache/lucene-solr/pull/521

    LUCENE-8598: Improve field updates packed values

    DocValuesFieldUpdats are using compact settings for packet ints that causes
    dramatic slowdowns when the updates are finished and sorted. Moving to the default
    accepted overhead ratio yields up to 4x improvements in applying updates. This change
    also improves the packing of numeric values since we know the value range in advance and
    can choose a different packing scheme in such a case.
    Overall this change yields a good performance improvement since 99% of the times of applying
    DV field updates are spend in the sort method which essentially makes applying the updates
    4x faster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/s1monw/lucene-solr improve_packing

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/521.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #521
    
----
commit e6aecc1994f4116f92d51a3841f2cda01e73e70b
Author: Simon Willnauer <si...@...>
Date:   2018-12-09T18:13:20Z

    LUCENE-8598: Improve field updates packed values
    
    DocValuesFieldUpdats are using compact settings for packet ints that causes
    dramatic slowdowns when the updates are finished and sorted. Moving to the default
    accepted overhead ratio yields up to 4x improvements in applying updates. This change
    also improves the packing of numeric values since we know the value range in advance and
    can choose a different packing scheme in such a case.
    Overall this change yields a good performance improvement since 99% of the times of applying
    DV field updates are spend in the sort method which essentially makes applying the updates
    4x faster.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

Posted by jpountz <gi...@git.apache.org>.
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/521#discussion_r240294342
  
    --- Diff: lucene/CHANGES.txt ---
    @@ -268,6 +268,9 @@ Optimizations
       can safe up to 80% of the heap used compared to the previous implementation and uses non-object
       based datastructures. (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grant)
     
    +* LUCENE-8598: Moved to the default accepted overhead ratio for packet ints in DocValuesFieldUpdats
    +  yields an up-to 4x performance improvement when applying doc values updates. (Simon Willnauer, Adrien Grant)
    --- End diff --
    
    typo in my name


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

Posted by jpountz <gi...@git.apache.org>.
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/521#discussion_r240152341
  
    --- Diff: lucene/core/src/java/org/apache/lucene/index/NumericDocValuesFieldUpdates.java ---
    @@ -53,14 +55,28 @@ BytesRef binaryValue() {
     
         @Override
         protected void set(long idx) {
    -      value = values.get(idx);
    +      value = values.get(idx) + minValue;
         }
       }
    -  private PagedGrowableWriter values;
    +  private AbstractPagedMutable values;
    +  private final long minValue;
    +
    +  NumericDocValuesFieldUpdates(long delGen, String field, int maxDoc) {
    +   this(delGen, field, Long.MIN_VALUE, Long.MAX_VALUE, maxDoc);
    +  }
     
    -  public NumericDocValuesFieldUpdates(long delGen, String field, int maxDoc) {
    +  NumericDocValuesFieldUpdates(long delGen, String field, long minValue, long maxValue, int maxDoc) {
         super(maxDoc, delGen, field, DocValuesType.NUMERIC);
    -    values = new PagedGrowableWriter(1, PAGE_SIZE, 1, PackedInts.FAST);
    +    assert minValue <= maxValue : "minValue must be <= maxValue [" + minValue + " > " + maxValue + "]";
    +    long max = maxValue - minValue;
    +    if (max < 0) { // pretty large range - fall back to growable writer
    +      values = new PagedGrowableWriter(1, PAGE_SIZE, 1, PackedInts.DEFAULT);
    +      this.minValue = 0;
    +    } else {
    +      int bitsPerValue = PackedInts.bitsRequired(max);
    +      values = new PagedMutable(1, PAGE_SIZE, bitsPerValue, PackedInts.DEFAULT);
    --- End diff --
    
    since we are sorting in the end, all pages will likely have the same number of bits per value in the end anyway, should we move to PackedInts.getMutable instead?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/lucene-solr/pull/521


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/521#discussion_r240234021
  
    --- Diff: lucene/core/src/java/org/apache/lucene/index/NumericDocValuesFieldUpdates.java ---
    @@ -53,14 +55,28 @@ BytesRef binaryValue() {
     
         @Override
         protected void set(long idx) {
    -      value = values.get(idx);
    +      value = values.get(idx) + minValue;
         }
       }
    -  private PagedGrowableWriter values;
    +  private AbstractPagedMutable values;
    +  private final long minValue;
    +
    +  NumericDocValuesFieldUpdates(long delGen, String field, int maxDoc) {
    +   this(delGen, field, Long.MIN_VALUE, Long.MAX_VALUE, maxDoc);
    +  }
     
    -  public NumericDocValuesFieldUpdates(long delGen, String field, int maxDoc) {
    +  NumericDocValuesFieldUpdates(long delGen, String field, long minValue, long maxValue, int maxDoc) {
         super(maxDoc, delGen, field, DocValuesType.NUMERIC);
    -    values = new PagedGrowableWriter(1, PAGE_SIZE, 1, PackedInts.FAST);
    +    assert minValue <= maxValue : "minValue must be <= maxValue [" + minValue + " > " + maxValue + "]";
    +    long max = maxValue - minValue;
    +    if (max < 0) { // pretty large range - fall back to growable writer
    +      values = new PagedGrowableWriter(1, PAGE_SIZE, 1, PackedInts.DEFAULT);
    +      this.minValue = 0;
    +    } else {
    +      int bitsPerValue = PackedInts.bitsRequired(max);
    +      values = new PagedMutable(1, PAGE_SIZE, bitsPerValue, PackedInts.DEFAULT);
    --- End diff --
    
    not sure we can since we don't know how many values we actually have?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

Posted by jpountz <gi...@git.apache.org>.
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/521#discussion_r240152127
  
    --- Diff: lucene/core/src/java/org/apache/lucene/index/NumericDocValuesFieldUpdates.java ---
    @@ -53,14 +55,28 @@ BytesRef binaryValue() {
     
         @Override
         protected void set(long idx) {
    -      value = values.get(idx);
    +      value = values.get(idx) + minValue;
         }
       }
    -  private PagedGrowableWriter values;
    +  private AbstractPagedMutable values;
    +  private final long minValue;
    +
    +  NumericDocValuesFieldUpdates(long delGen, String field, int maxDoc) {
    +   this(delGen, field, Long.MIN_VALUE, Long.MAX_VALUE, maxDoc);
    +  }
     
    -  public NumericDocValuesFieldUpdates(long delGen, String field, int maxDoc) {
    +  NumericDocValuesFieldUpdates(long delGen, String field, long minValue, long maxValue, int maxDoc) {
         super(maxDoc, delGen, field, DocValuesType.NUMERIC);
    -    values = new PagedGrowableWriter(1, PAGE_SIZE, 1, PackedInts.FAST);
    +    assert minValue <= maxValue : "minValue must be <= maxValue [" + minValue + " > " + maxValue + "]";
    +    long max = maxValue - minValue;
    +    if (max < 0) { // pretty large range - fall back to growable writer
    --- End diff --
    
    It feels a bit weird to optimize when using 64 bits per value but not 63, or 62? Not sure if you are aware but `PackedInts.unsignedBitsRequired` can help avoid special-casing when max-min overflows.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #521: LUCENE-8598: Improve field updates packed val...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/521#discussion_r240047328
  
    --- Diff: lucene/core/src/test/org/apache/lucene/index/Benchmark.java ---
    @@ -0,0 +1,94 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.lucene.index;
    +
    +import java.io.IOException;
    +import java.nio.file.Files;
    +import java.nio.file.Path;
    +import java.nio.file.Paths;
    +import java.util.Random;
    +import java.util.concurrent.TimeUnit;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.document.Field;
    +import org.apache.lucene.document.NumericDocValuesField;
    +import org.apache.lucene.document.StringField;
    +import org.apache.lucene.store.Directory;
    +import org.apache.lucene.store.FSDirectory;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.lucene.util.NullInfoStream;
    +import org.apache.lucene.util.TestUtil;
    +
    +public class Benchmark {
    --- End diff --
    
    NOTE: I included this benchmark in the initial version to show what I benchmarked


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #521: LUCENE-8598: Improve field updates packed values

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on the issue:

    https://github.com/apache/lucene-solr/pull/521
  
    > t's interesting that you found out that most of the time is spent with packed ints when sorting, likely because of swaps: the sorting impl that is being used (InPlaceMergeSorter) is the impl that performs the higher number of swaps due to the constraint to run in-place. Maybe we should look into spending a bit more memory on sorting to avoid having so many swaps, eg. with TimSorter (see ArrayTimSorter for an example).
    
    regarding TimSorter, it looks pretty complex and I wonder how well it would work with the subclasses and what needs to be implemented. Maybe we can explore this in a separate change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org