You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/08 06:04:56 UTC

[GitHub] [druid] imply-cheddar commented on a diff in pull request #12846: specialized FixedIndexed implementations for java value types

imply-cheddar commented on code in PR #12846:
URL: https://github.com/apache/druid/pull/12846#discussion_r965528921


##########
processing/src/main/java/org/apache/druid/segment/data/FixedIndexedDoubles.java:
##########
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.google.common.base.Preconditions;
+import it.unimi.dsi.fastutil.doubles.DoubleComparator;
+import it.unimi.dsi.fastutil.doubles.DoubleComparators;
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.monomorphicprocessing.HotLoopCallee;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Iterator;
+
+/**
+ * Specialized implementation for {@link FixedIndexed<Double>} which does not contain any null values, allowing it to
+ * deal in java double value types instead of {@link Double} objects, and utilize specialized {@link ByteBuffer} methods
+ * to more efficiently read data.
+ */
+public final class FixedIndexedDoubles implements Indexed<Double>, HotLoopCallee
+{
+  public static FixedIndexedDoubles read(ByteBuffer bb, ByteOrder byteOrder)
+  {
+    final ByteBuffer buffer = bb.asReadOnlyBuffer().order(byteOrder);
+    final byte version = buffer.get();
+    Preconditions.checkState(version == 0, "Unknown version [%s]", version);
+    final byte flags = buffer.get();
+    final boolean hasNull = (flags & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE ? true : false;
+    final boolean isSorted = (flags & FixedIndexed.IS_SORTED_MASK) == FixedIndexed.IS_SORTED_MASK ? true : false;
+    Preconditions.checkState(!hasNull, "Cannot use FixedIndexedInts for FixedIndex with null values");
+    Preconditions.checkState(!(hasNull && !isSorted), "cannot have null values if not sorted");

Review Comment:
   I'm pretty sure that this precondition will always be true if the precondition of `!hasNull` from the line above is true.



##########
processing/src/main/java/org/apache/druid/segment/data/FixedIndexedInts.java:
##########
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.google.common.base.Preconditions;
+import it.unimi.dsi.fastutil.ints.IntComparator;
+import it.unimi.dsi.fastutil.ints.IntComparators;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.monomorphicprocessing.HotLoopCallee;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Iterator;
+
+/**
+ * Specialized implementation for {@link FixedIndexed<Integer>} which does not contain any null values, allowing it to
+ * deal in java int value types instead of {@link Integer} objects, and utilize specialized {@link ByteBuffer} methods
+ * to more efficiently read data.
+ */
+public final class FixedIndexedInts implements Indexed<Integer>, HotLoopCallee
+{
+  public static FixedIndexedInts read(ByteBuffer bb, ByteOrder byteOrder)
+  {
+    final ByteBuffer buffer = bb.asReadOnlyBuffer().order(byteOrder);
+    final byte version = buffer.get();
+    Preconditions.checkState(version == 0, "Unknown version [%s]", version);
+    final byte flags = buffer.get();
+    final boolean hasNull = (flags & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE ? true : false;
+    final boolean isSorted = (flags & FixedIndexed.IS_SORTED_MASK) == FixedIndexed.IS_SORTED_MASK ? true : false;
+    Preconditions.checkState(!hasNull, "Cannot use FixedIndexedInts for FixedIndex with null values");
+    Preconditions.checkState(!(hasNull && !isSorted), "cannot have null values if not sorted");
+    final int size = buffer.getInt() + (hasNull ? 1 : 0);
+    final int valuesOffset = buffer.position();
+    final FixedIndexedInts fixedIndexed = new FixedIndexedInts(
+        buffer,
+        isSorted,
+        size,
+        valuesOffset
+    );
+    bb.position(buffer.position() + (Integer.BYTES * size));
+    return fixedIndexed;
+  }
+
+  private final ByteBuffer buffer;
+  private final int size;
+  private final int valuesOffset;
+  private final boolean isSorted;
+  private final IntComparator comparator;
+
+  private FixedIndexedInts(
+      ByteBuffer buffer,
+      boolean isSorted,
+      int size,
+      int valuesOffset
+  )
+  {
+    this.buffer = buffer;
+    this.size = size;
+    this.valuesOffset = valuesOffset;
+    this.isSorted = isSorted;
+    this.comparator = IntComparators.NATURAL_COMPARATOR;
+  }
+
+  @Override
+  public int size()
+  {
+    return size;
+  }
+
+  @Nullable
+  @Override
+  public Integer get(int index)
+  {
+    return getInt(index);
+  }
+
+  @Override
+  public int indexOf(@Nullable Integer value)
+  {
+    if (value == null) {
+      return -1;
+    }
+    return indexOf(value.intValue());
+  }
+
+  public int getInt(int index)
+  {
+    return buffer.getInt(valuesOffset + (index * Integer.BYTES));
+  }
+
+  public int indexOf(int value)
+  {
+    if (!isSorted) {
+      throw new UnsupportedOperationException("Reverse lookup not allowed.");
+    }
+    int minIndex = 0;
+    int maxIndex = size - 1;
+    while (minIndex <= maxIndex) {
+      int currIndex = (minIndex + maxIndex) >>> 1;
+
+      int currValue = getInt(currIndex);
+      int comparison = comparator.compare(currValue, value);
+      if (comparison == 0) {
+        return currIndex;
+      }
+
+      if (comparison < 0) {
+        minIndex = currIndex + 1;
+      } else {
+        maxIndex = currIndex - 1;
+      }
+    }
+
+    return -(minIndex + 1);
+  }
+
+  public IntIterator intIterator()
+  {
+    final ByteBuffer copy = buffer.asReadOnlyBuffer().order(buffer.order());
+    copy.position(valuesOffset);
+    copy.limit(valuesOffset + (size * Integer.BYTES));
+    return new IntIterator()
+    {
+      @Override
+      public int nextInt()
+      {
+        return copy.getInt();
+      }
+
+      @Override
+      public boolean hasNext()
+      {
+        return copy.hasRemaining();
+      }
+    };
+  }
+
+  @Override
+  public Iterator<Integer> iterator()

Review Comment:
   It looks like `IntIterator` already implements `Iterator<Integer>` so you don't need to wrap it again.



##########
processing/src/main/java/org/apache/druid/segment/data/FixedIndexedDoubleWriter.java:
##########
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import org.apache.druid.io.Channels;
+import org.apache.druid.java.util.common.io.smoosh.FileSmoosher;
+import org.apache.druid.segment.serde.Serializer;
+import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
+import org.apache.druid.segment.writeout.WriteOutBytes;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.WritableByteChannel;
+
+/**
+ * Specialized version of {@link FixedIndexedWriter} for writing double value types, with no support for null values,
+ * and no verification that data is actually sorted. The resulting data can be read into either
+ * {@link FixedIndexedDoubles} or a {@link FixedIndexed<Double>}, since the format is identical.
+ *
+ * Callers should be certain that the data written is in fact sorted if specifying it as such. If null values need
+ * to be stored then the generic {@link FixedIndexedWriter} should be used instead.
+ */
+public class FixedIndexedDoubleWriter implements Serializer
+{
+  private static final int PAGE_SIZE = 4096;
+  private final SegmentWriteOutMedium segmentWriteOutMedium;
+  private final ByteBuffer scratch;
+  private int numWritten;
+  @Nullable
+  private WriteOutBytes valuesOut = null;
+
+  private final boolean isSorted;
+
+  public FixedIndexedDoubleWriter(SegmentWriteOutMedium segmentWriteOutMedium, boolean sorted)
+  {
+    this.segmentWriteOutMedium = segmentWriteOutMedium;
+    // this is a matter of faith, nothing checks
+    this.isSorted = sorted;

Review Comment:
   Do we need to know if it's sorted at this level?  If we are being told that it is supposed to be sorted, then I think that we should verify that it's sorted.  We can do that relatively easily by just having the most-recent value, checking that the next one is greater and then storing that.
   
   Or, if we are trusting something external from us to know that it is already sorted, then we shouldn't even bother asking it to tell us if we are sorted or not and instead just expect the external code to do the right thing.



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplier.java:
##########
@@ -622,7 +630,7 @@ public Iterable<ImmutableBitmap> getBitmapIterable()
             final DruidLongPredicate longPredicate = matcherFactory.makeLongPredicate();
 
             // in the future, this could use an int iterator
-            final Iterator<Integer> iterator = dictionary.iterator();
+            final Iterator<Integer> iterator = dictionary.intIterator();

Review Comment:
   you switched to an intIterator, but are still using an `Iterator<Integer>`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org