You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/08 09:12:00 UTC

[GitHub] [druid] clintropolis opened a new pull request #11888: add 'TypeStrategy' to types

clintropolis opened a new pull request #11888:
URL: https://github.com/apache/druid/pull/11888


   ### Description
   Following up on discussion in #11853, this PR introduces a new `TypeStrategy` interface which defines binary serde and comparison for values of any `TypeSignature`.
   
   The binary serialization methods were consolidated from the static methods in `Types`, and the code I think is overall much more simple.
   
   `TypeStrategy` deals in objects, so is not optimal for primitive values like longs and doubles, so the static methods for reading/writing these values with nulls remain but now live in `TypeStrategies`, along with all of the built-in implementations (string, long, double, float, array).
   
   The comparator implementations match the orderings defined in `druid-processing`... we may wish to consider bringing some of them into `druid-core`, merging `druid-core` and `druid-processing` :trollface: , or something else to try to consolidate things a bit better. I'm also unsure if the array comparator is good, but nothing should actually be using it yet so maybe it isn't the end of the world.
   
   `ObjectByteStrategy` is moved back into `ObjectStrategy`, it's registry replaced with one of `TypeStrategy`, and instead `ComplexMetricsSerde` has a new default implementation of a method called `getTypeStrategy` which wraps the `ObjectStrategy` methods, but could be overridden for more optimal implementations.
   
   I think the utility of being able to get binary serde to use for things like buffer aggregators is on full display in the `ExprEval` serialization methods, so I think overall i'm in favor of this change, since basically anywhere you have a ` ColumnInspector` (which includes `RowSignature`) you have the means to read, write, and compare values.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `TypeStrategy`
    * `TypeStrategies`
    * `Types`
    * `ComplexMetricsSerde`
    * `ColumnType`
    * `ColumnTypeFactory`
    * `ExpressionType`
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r764346431



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient

Review comment:
       added to javadocs to better explain usage




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r747911389



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Implementations of this mechanism support reading and writing ONLY non-null values. To read and write nullable
+ * values and you have enough memory to burn a full byte for every value you want to store, consider using the
+ * {@link TypeStrategies#readNullableType} and {@link TypeStrategies#writeNullableType} family of
+ * methods, which will store values with a leading byte containing either {@link NullHandling#IS_NULL_BYTE} or
+ * {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate. If you have a lot of values to write and a lot of nulls,
+ * consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * The size in bytes that writing this value to memory would require, useful for constraining the values maximum size
+   *
+   * This does not include the null byte, use {@link #estimateSizeBytesNullable(Object)} instead.
+   */
+  int estimateSizeBytes(@Nullable T value);
+
+  /**
+   * The size in bytes that writing this value to memory would require, including the null byte, useful for constraining
+   * the values maximum size. If the value is null, the size will be {@link Byte#BYTES}, otherwise it will be
+   * {@link Byte#BYTES} + {@link #estimateSizeBytes(Object)}
+   */
+  default int estimateSizeBytesNullable(@Nullable T value)
+  {
+    if (value == null) {
+      return Byte.BYTES;
+    }
+    return Byte.BYTES + estimateSizeBytes(value);
+  }
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written.
+   *
+   * Callers should ensure the {@link ByteBuffer} has adequate capacity before writing values, use
+   * {@link #estimateSizeBytes(Object)} to determine the required size of a value before writing if the size
+   * is unknown.
+   */
+  void write(ByteBuffer buffer, T value);
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the requested position. This will not permanently move the
+   * underlying {@link ByteBuffer#position()}.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  default T read(ByteBuffer buffer, int offset)
+  {
+    final int oldPosition = buffer.position();
+    buffer.position(offset);
+    T value = read(buffer);
+    buffer.position(oldPosition);

Review comment:
       Consider two changes:
   
   - Using try/finally so the buffer position is still valid if `read` throws an exception
   - Updating the javadoc to say that even though the buffer position is unchanged upon method exit, it may be changed temporarily while the method is running, so it isn't concurrency-safe.

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written, and returns the number of bytes written.
+   *
+   * If writing the value would take more than 'maxSizeBytes' (or the buffer limit, whichever is smaller), this method
+   * will return a negative value indicating the number of additional bytes that would be required to fully write the
+   * value. Partial results may be written to the buffer when in this state, and the position may be left at whatever
+   * point the implementation ran out of space while writing the value.

Review comment:
       Does this mean the caller is expected to save the position? Or is the caller safe to assume it can roll back by `-retval`?
   
   i.e., is this code safe?
   
   ```
   int written = write(buf, value, maxBytes);
   if (written < 0) {
     buf.position(buf.position() + written); // roll back buf, undoing the write
   }
   ```
   
   or do callers need to do this?
   
   ```
   int oldPosition = buf.position();
   int written = write(buf, value, maxBytes);
   if (written < 0) {
     buf.position(oldPosition); // roll back buf, undoing the write
   }
   ```

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Implementations of this mechanism support reading and writing ONLY non-null values. To read and write nullable
+ * values and you have enough memory to burn a full byte for every value you want to store, consider using the
+ * {@link TypeStrategies#readNullableType} and {@link TypeStrategies#writeNullableType} family of
+ * methods, which will store values with a leading byte containing either {@link NullHandling#IS_NULL_BYTE} or
+ * {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate. If you have a lot of values to write and a lot of nulls,
+ * consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * The size in bytes that writing this value to memory would require, useful for constraining the values maximum size
+   *
+   * This does not include the null byte, use {@link #estimateSizeBytesNullable(Object)} instead.
+   */
+  int estimateSizeBytes(@Nullable T value);
+
+  /**
+   * The size in bytes that writing this value to memory would require, including the null byte, useful for constraining
+   * the values maximum size. If the value is null, the size will be {@link Byte#BYTES}, otherwise it will be
+   * {@link Byte#BYTES} + {@link #estimateSizeBytes(Object)}
+   */
+  default int estimateSizeBytesNullable(@Nullable T value)
+  {
+    if (value == null) {
+      return Byte.BYTES;
+    }
+    return Byte.BYTES + estimateSizeBytes(value);
+  }
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written.
+   *
+   * Callers should ensure the {@link ByteBuffer} has adequate capacity before writing values, use
+   * {@link #estimateSizeBytes(Object)} to determine the required size of a value before writing if the size
+   * is unknown.
+   */
+  void write(ByteBuffer buffer, T value);
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the requested position. This will not permanently move the
+   * underlying {@link ByteBuffer#position()}.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  default T read(ByteBuffer buffer, int offset)
+  {
+    final int oldPosition = buffer.position();
+    buffer.position(offset);
+    T value = read(buffer);
+    buffer.position(oldPosition);
+    return value;
+  }
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at the requested position. This will not permanently move the
+   * underlying {@link ByteBuffer#position()}, and returns the number of bytes written.
+   *
+   * Callers should ensure the {@link ByteBuffer} has adequate capacity before writing values, use
+   * {@link #estimateSizeBytes(Object)} to determine the required size of a value before writing if the size
+   * is unknown.
+   */
+  default int write(ByteBuffer buffer, int offset, T value)
+  {
+    final int oldPosition = buffer.position();
+    buffer.position(offset);
+    write(buffer, value);
+    final int size = buffer.position() - offset;
+    buffer.position(oldPosition);

Review comment:
       Consider two changes:
   
   - Using try/finally so the buffer position is still valid if `write` throws an exception
   - Updating the javadoc to say that even though the buffer position is unchanged upon method exit, it may be changed temporarily while the method is running, so it isn't concurrency-safe.

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move

Review comment:
       "non-null" isn't true for the NullableTypeStrategy impl, which is still a TypeStrategy, so this method contract should be adjusted.

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient

Review comment:
       I'd include something here about how this estimate is expected to be used. (Like an example)
   
   That'll help people understand when they should use this method, and also understand how to implement it. I'm suggesting this because you gotta be careful with "estimate" methods: people have wildly differing ideas of how accurate they need to be, so it pays to be specific in the doc comments.

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>

Review comment:
       Will stuff serialized with this interface be persisted to disk, sent between servers that might be running different versions of code, used in caches, etc? Or will it always be in-memory, single-server, & ephemeral?
   
   Implementers are going to want to know the answer, so they can design their formats accordingly. (If persistence is required then they might want to add a version byte, avoid using native endianness, etc.)

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written, and returns the number of bytes written.
+   *
+   * If writing the value would take more than 'maxSizeBytes' (or the buffer limit, whichever is smaller), this method

Review comment:
       Should we spec this to be an error if `maxSizeBytes` is greater than `buffer.remaining()`? Seems like a weird thing for a caller to do. Or is there a reason a caller might do that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r779376508



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategies.java
##########
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Ordering;
+import com.google.common.primitives.Floats;
+import com.google.common.primitives.Longs;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class TypeStrategies
+{
+  public static final int VALUE_OFFSET = Byte.BYTES;
+  public static final int NULLABLE_LONG_SIZE = Byte.BYTES + Long.BYTES;
+  public static final int NULLABLE_DOUBLE_SIZE = Byte.BYTES + Double.BYTES;
+  public static final int NULLABLE_FLOAT_SIZE = Byte.BYTES + Float.BYTES;
+
+  public static final LongTypeStrategy LONG = new LongTypeStrategy();
+  public static final FloatTypeStrategy FLOAT = new FloatTypeStrategy();
+  public static final DoubleTypeStrategy DOUBLE = new DoubleTypeStrategy();
+  public static final StringTypeStrategy STRING = new StringTypeStrategy();
+  public static final ConcurrentHashMap<String, TypeStrategy<?>> COMPLEX_STRATEGIES = new ConcurrentHashMap<>();
+
+  /**
+   * Get an {@link TypeStrategy} registered to some {@link TypeSignature#getComplexTypeName()}.
+   */
+  @Nullable
+  public static TypeStrategy<?> getComplex(String typeName)
+  {
+    return COMPLEX_STRATEGIES.get(typeName);
+  }
+
+  /**
+   * hmm... this might look familiar... (see ComplexMetrics)
+   *
+   * Register a complex type name -> {@link TypeStrategy} mapping.
+   *
+   * If the specified type name is already used and the supplied {@link TypeStrategy} is not of the
+   * same type as the existing value in the map for said key, an {@link ISE} is thrown.
+   *
+   * @param strategy The {@link TypeStrategy} object to be associated with the 'type' in the map.
+   */
+  public static void registerComplex(String typeName, TypeStrategy<?> strategy)
+  {
+    Preconditions.checkNotNull(typeName);
+    COMPLEX_STRATEGIES.compute(typeName, (key, value) -> {
+      if (value == null) {
+        return strategy;
+      } else {
+        if (!value.getClass().getName().equals(strategy.getClass().getName())) {
+          throw new ISE(
+              "Incompatible strategy for type[%s] already exists. Expected [%s], found [%s].",
+              key,
+              strategy.getClass().getName(),
+              value.getClass().getName()
+          );
+        } else {
+          return value;
+        }
+      }
+    });
+  }
+
+  /**
+   * Clear and set the 'null' byte of a nullable value to {@link NullHandling#IS_NULL_BYTE} to a {@link ByteBuffer} at
+   * the supplied position. This method does not change the buffer position, limit, or mark, because it does not expect
+   * to own the buffer given to it (i.e. buffer aggs)
+   *
+   * Nullable types are stored with a leading byte to indicate if the value is null, followed by the value bytes
+   * (if not null)
+   *
+   * layout: | null (byte) | value |
+   *
+   * @return number of bytes written (always 1)
+   */
+  public static int writeNull(ByteBuffer buffer, int offset)
+  {
+    buffer.put(offset, NullHandling.IS_NULL_BYTE);
+    return 1;
+  }
+
+  /**
+   * Checks if a 'nullable' value's null byte is set to {@link NullHandling#IS_NULL_BYTE}. This method will mask the
+   * value of the null byte to only check if the null bit is set, meaning that the higher bits can be utilized for
+   * flags as necessary (e.g. using high bits to indicate if the value has been set or not for aggregators).
+   *
+   * Note that writing nullable values with the methods of {@link Types} will always clear and set the null byte to
+   * either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE}, losing any flag bits.
+   *
+   * layout: | null (byte) | value |
+   */
+  public static boolean isNullableNull(ByteBuffer buffer, int offset)
+  {
+    // use & so that callers can use the high bits of the null byte to pack additional information if necessary
+    return (buffer.get(offset) & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE;
+  }
+
+  /**
+   * Write a non-null long value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the long value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableLong(ByteBuffer buffer, int offset, long value)

Review comment:
       changed to `readNotNullNullable...`/`writeNotNullNullable...` to reduce confusion




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r760137051



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategies.java
##########
@@ -0,0 +1,463 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Ordering;
+import com.google.common.primitives.Floats;
+import com.google.common.primitives.Longs;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class TypeStrategies
+{
+  public static final int VALUE_OFFSET = Byte.BYTES;
+  public static final int NULLABLE_LONG_SIZE = Byte.BYTES + Long.BYTES;
+  public static final int NULLABLE_DOUBLE_SIZE = Byte.BYTES + Double.BYTES;
+  public static final int NULLABLE_FLOAT_SIZE = Byte.BYTES + Float.BYTES;
+
+  public static final LongTypeStrategy LONG = new LongTypeStrategy();
+  public static final FloatTypeStrategy FLOAT = new FloatTypeStrategy();
+  public static final DoubleTypeStrategy DOUBLE = new DoubleTypeStrategy();
+  public static final StringTypeStrategy STRING = new StringTypeStrategy();
+  public static final ConcurrentHashMap<String, TypeStrategy<?>> COMPLEX_STRATEGIES = new ConcurrentHashMap<>();
+
+  /**
+   * Get an {@link TypeStrategy} registered to some {@link TypeSignature#getComplexTypeName()}.
+   */
+  @Nullable
+  public static TypeStrategy<?> getComplex(String typeName)
+  {
+    return COMPLEX_STRATEGIES.get(typeName);
+  }
+
+  /**
+   * hmm... this might look familiar... (see ComplexMetrics)
+   *
+   * Register a complex type name -> {@link TypeStrategy} mapping.
+   *
+   * If the specified type name is already used and the supplied {@link TypeStrategy} is not of the
+   * same type as the existing value in the map for said key, an {@link ISE} is thrown.
+   *
+   * @param strategy The {@link TypeStrategy} object to be associated with the 'type' in the map.
+   */
+  public static void registerComplex(String typeName, TypeStrategy<?> strategy)
+  {
+    Preconditions.checkNotNull(typeName);
+    COMPLEX_STRATEGIES.compute(typeName, (key, value) -> {
+      if (value == null) {
+        return strategy;
+      } else {
+        if (!value.getClass().getName().equals(strategy.getClass().getName())) {
+          throw new ISE(
+              "Incompatible strategy for type[%s] already exists. Expected [%s], found [%s].",
+              key,
+              strategy.getClass().getName(),
+              value.getClass().getName()
+          );
+        } else {
+          return value;
+        }
+      }
+    });
+  }
+
+  /**
+   * Clear and set the 'null' byte of a nullable value to {@link NullHandling#IS_NULL_BYTE} to a {@link ByteBuffer} at
+   * the supplied position. This method does not change the buffer position, limit, or mark, because it does not expect
+   * to own the buffer given to it (i.e. buffer aggs)
+   *
+   * Nullable types are stored with a leading byte to indicate if the value is null, followed by the value bytes
+   * (if not null)
+   *
+   * layout: | null (byte) | value |
+   *
+   * @return number of bytes written (always 1)
+   */
+  public static int writeNull(ByteBuffer buffer, int offset)
+  {
+    buffer.put(offset, NullHandling.IS_NULL_BYTE);
+    return 1;
+  }
+
+  /**
+   * Checks if a 'nullable' value's null byte is set to {@link NullHandling#IS_NULL_BYTE}. This method will mask the
+   * value of the null byte to only check if the null bit is set, meaning that the higher bits can be utilized for
+   * flags as necessary (e.g. using high bits to indicate if the value has been set or not for aggregators).
+   *
+   * Note that writing nullable values with the methods of {@link Types} will always clear and set the null byte to
+   * either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE}, losing any flag bits.
+   *
+   * layout: | null (byte) | value |
+   */
+  public static boolean isNullableNull(ByteBuffer buffer, int offset)
+  {
+    // use & so that callers can use the high bits of the null byte to pack additional information if necessary
+    return (buffer.get(offset) & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE;
+  }
+
+  /**
+   * Write a non-null long value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the long value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableLong(ByteBuffer buffer, int offset, long value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putLong(offset, value);
+    return NULLABLE_LONG_SIZE;
+  }
+
+  /**
+   * Reads a non-null long value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect  to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static long readNullableLong(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getLong(offset + VALUE_OFFSET);
+  }
+
+  /**
+   * Write a non-null double value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the double value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | double |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableDouble(ByteBuffer buffer, int offset, double value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putDouble(offset, value);
+    return NULLABLE_DOUBLE_SIZE;
+  }
+
+  /**
+   * Reads a non-null double value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | double |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static double readNullableDouble(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getDouble(offset + VALUE_OFFSET);
+  }
+
+  /**
+   * Write a non-null float value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the float value is written in the next 4 bytes.
+   *
+   * layout: | null (byte) | float |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 5)
+   */
+  public static int writeNullableFloat(ByteBuffer buffer, int offset, float value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putFloat(offset, value);
+    return NULLABLE_FLOAT_SIZE;
+  }
+
+  /**
+   * Reads a non-null float value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | float |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static float readNullableFloat(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getFloat(offset + VALUE_OFFSET);
+  }
+
+
+  public static final class LongTypeStrategy implements TypeStrategy<Long>
+  {
+    private static final Comparator<Long> COMPARATOR = Comparator.nullsFirst(Longs::compare);
+
+    @Override
+    public int estimateSizeBytes(Long value)
+    {
+      return Long.BYTES;
+    }
+
+    @Override
+    public Long read(ByteBuffer buffer)
+    {
+      return buffer.getLong();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Long value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Long.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putLong(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Long o1, Long o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class FloatTypeStrategy implements TypeStrategy<Float>
+  {
+    private static final Comparator<Float> COMPARATOR = Comparator.nullsFirst(Floats::compare);
+
+    @Override
+    public int estimateSizeBytes(Float value)
+    {
+      return Float.BYTES;
+    }
+
+    @Override
+    public Float read(ByteBuffer buffer)
+    {
+      return buffer.getFloat();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Float value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Float.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putFloat(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Float o1, Float o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class DoubleTypeStrategy implements TypeStrategy<Double>
+  {
+    private static final Comparator<Double> COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+    @Override
+    public int estimateSizeBytes(Double value)
+    {
+      return Double.BYTES;
+    }
+
+    @Override
+    public Double read(ByteBuffer buffer)
+    {
+      return buffer.getDouble();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Double value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Double.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putDouble(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Double o1, Double o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class StringTypeStrategy implements TypeStrategy<String>
+  {
+    // copy of lexicographical comparator
+    private static final Ordering<String> ORDERING = Ordering.from(String::compareTo).nullsFirst();
+
+    @Override
+    public int estimateSizeBytes(String value)
+    {
+      if (value == null) {
+        return 0;
+      }
+      return Integer.BYTES + StringUtils.toUtf8(value).length;
+    }
+
+    @Override
+    public String read(ByteBuffer buffer)
+    {
+      // | length (int) | bytes |
+      final int length = buffer.getInt();
+      final byte[] blob = new byte[length];
+      buffer.get(blob, 0, length);
+      return StringUtils.fromUtf8(blob);
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, String value, int maxSizeBytes)
+    {
+      final byte[] bytes = StringUtils.toUtf8(value);
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Integer.BYTES + bytes.length;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putInt(bytes.length);
+        buffer.put(bytes, 0, bytes.length);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(String s, String s2)
+    {
+      // copy of lexicographical string comparator in druid processing
+      // Avoid comparisons for equal references
+      // Assuming we mostly compare different strings, checking s.equals(s2) will only make the comparison slower.
+      //noinspection StringEquality
+      if (s == s2) {
+        return 0;
+      }
+
+      return ORDERING.compare(s, s2);
+    }
+  }
+
+  public static final class ArrayTypeStrategy implements TypeStrategy<Object[]>
+  {
+    private final Comparator<Object> elementComparator;
+    private final TypeStrategy elementStrategy;
+
+    public ArrayTypeStrategy(TypeSignature<?> type)
+    {
+      this.elementStrategy = type.getElementType().getNullableStrategy();
+      this.elementComparator = Comparator.nullsFirst(elementStrategy);
+    }
+
+    @Override
+    public int estimateSizeBytes(Object[] value)
+    {
+      if (value == null) {
+        return 0;
+      }
+      return Integer.BYTES + Arrays.stream(value).mapToInt(elementStrategy::estimateSizeBytes).sum();
+    }
+
+    @Override
+    public Object[] read(ByteBuffer buffer)
+    {
+      final int arrayLength = buffer.getInt();
+      final Object[] array = new Object[arrayLength];
+      for (int i = 0; i < arrayLength; i++) {
+        array[i] = elementStrategy.read(buffer);
+      }
+      return array;
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Object[] value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      int sizeBytes = Integer.BYTES;
+      int remaining = max - sizeBytes;
+      if (remaining < 0) {
+        return remaining;
+      }
+      int extraNeeded = 0;
+
+      buffer.putInt(value.length);
+      for (Object o : value) {
+        int written = elementStrategy.write(buffer, o, remaining);
+        if (written < 0) {
+          extraNeeded += written;
+          remaining = 0;
+        } else {
+          sizeBytes += written;
+          remaining -= sizeBytes;
+        }
+      }
+      return extraNeeded < 0 ? extraNeeded : sizeBytes;
+    }
+
+    @Override
+    public int compare(Object[] o1, Object[] o2)
+    {
+      final int iter = Math.max(o1.length, o2.length);

Review comment:
       should this be math.min as might have a scenario we can get indexOutofBoundException no ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r760552467



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategies.java
##########
@@ -0,0 +1,463 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Ordering;
+import com.google.common.primitives.Floats;
+import com.google.common.primitives.Longs;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class TypeStrategies
+{
+  public static final int VALUE_OFFSET = Byte.BYTES;
+  public static final int NULLABLE_LONG_SIZE = Byte.BYTES + Long.BYTES;
+  public static final int NULLABLE_DOUBLE_SIZE = Byte.BYTES + Double.BYTES;
+  public static final int NULLABLE_FLOAT_SIZE = Byte.BYTES + Float.BYTES;
+
+  public static final LongTypeStrategy LONG = new LongTypeStrategy();
+  public static final FloatTypeStrategy FLOAT = new FloatTypeStrategy();
+  public static final DoubleTypeStrategy DOUBLE = new DoubleTypeStrategy();
+  public static final StringTypeStrategy STRING = new StringTypeStrategy();
+  public static final ConcurrentHashMap<String, TypeStrategy<?>> COMPLEX_STRATEGIES = new ConcurrentHashMap<>();
+
+  /**
+   * Get an {@link TypeStrategy} registered to some {@link TypeSignature#getComplexTypeName()}.
+   */
+  @Nullable
+  public static TypeStrategy<?> getComplex(String typeName)
+  {
+    return COMPLEX_STRATEGIES.get(typeName);
+  }
+
+  /**
+   * hmm... this might look familiar... (see ComplexMetrics)
+   *
+   * Register a complex type name -> {@link TypeStrategy} mapping.
+   *
+   * If the specified type name is already used and the supplied {@link TypeStrategy} is not of the
+   * same type as the existing value in the map for said key, an {@link ISE} is thrown.
+   *
+   * @param strategy The {@link TypeStrategy} object to be associated with the 'type' in the map.
+   */
+  public static void registerComplex(String typeName, TypeStrategy<?> strategy)
+  {
+    Preconditions.checkNotNull(typeName);
+    COMPLEX_STRATEGIES.compute(typeName, (key, value) -> {
+      if (value == null) {
+        return strategy;
+      } else {
+        if (!value.getClass().getName().equals(strategy.getClass().getName())) {
+          throw new ISE(
+              "Incompatible strategy for type[%s] already exists. Expected [%s], found [%s].",
+              key,
+              strategy.getClass().getName(),
+              value.getClass().getName()
+          );
+        } else {
+          return value;
+        }
+      }
+    });
+  }
+
+  /**
+   * Clear and set the 'null' byte of a nullable value to {@link NullHandling#IS_NULL_BYTE} to a {@link ByteBuffer} at
+   * the supplied position. This method does not change the buffer position, limit, or mark, because it does not expect
+   * to own the buffer given to it (i.e. buffer aggs)
+   *
+   * Nullable types are stored with a leading byte to indicate if the value is null, followed by the value bytes
+   * (if not null)
+   *
+   * layout: | null (byte) | value |
+   *
+   * @return number of bytes written (always 1)
+   */
+  public static int writeNull(ByteBuffer buffer, int offset)
+  {
+    buffer.put(offset, NullHandling.IS_NULL_BYTE);
+    return 1;
+  }
+
+  /**
+   * Checks if a 'nullable' value's null byte is set to {@link NullHandling#IS_NULL_BYTE}. This method will mask the
+   * value of the null byte to only check if the null bit is set, meaning that the higher bits can be utilized for
+   * flags as necessary (e.g. using high bits to indicate if the value has been set or not for aggregators).
+   *
+   * Note that writing nullable values with the methods of {@link Types} will always clear and set the null byte to
+   * either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE}, losing any flag bits.
+   *
+   * layout: | null (byte) | value |
+   */
+  public static boolean isNullableNull(ByteBuffer buffer, int offset)
+  {
+    // use & so that callers can use the high bits of the null byte to pack additional information if necessary
+    return (buffer.get(offset) & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE;
+  }
+
+  /**
+   * Write a non-null long value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the long value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableLong(ByteBuffer buffer, int offset, long value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putLong(offset, value);
+    return NULLABLE_LONG_SIZE;
+  }
+
+  /**
+   * Reads a non-null long value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect  to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static long readNullableLong(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getLong(offset + VALUE_OFFSET);
+  }
+
+  /**
+   * Write a non-null double value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the double value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | double |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableDouble(ByteBuffer buffer, int offset, double value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putDouble(offset, value);
+    return NULLABLE_DOUBLE_SIZE;
+  }
+
+  /**
+   * Reads a non-null double value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | double |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static double readNullableDouble(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getDouble(offset + VALUE_OFFSET);
+  }
+
+  /**
+   * Write a non-null float value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the float value is written in the next 4 bytes.
+   *
+   * layout: | null (byte) | float |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 5)
+   */
+  public static int writeNullableFloat(ByteBuffer buffer, int offset, float value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putFloat(offset, value);
+    return NULLABLE_FLOAT_SIZE;
+  }
+
+  /**
+   * Reads a non-null float value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | float |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static float readNullableFloat(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getFloat(offset + VALUE_OFFSET);
+  }
+
+
+  public static final class LongTypeStrategy implements TypeStrategy<Long>
+  {
+    private static final Comparator<Long> COMPARATOR = Comparator.nullsFirst(Longs::compare);
+
+    @Override
+    public int estimateSizeBytes(Long value)
+    {
+      return Long.BYTES;
+    }
+
+    @Override
+    public Long read(ByteBuffer buffer)
+    {
+      return buffer.getLong();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Long value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Long.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putLong(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Long o1, Long o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class FloatTypeStrategy implements TypeStrategy<Float>
+  {
+    private static final Comparator<Float> COMPARATOR = Comparator.nullsFirst(Floats::compare);
+
+    @Override
+    public int estimateSizeBytes(Float value)
+    {
+      return Float.BYTES;
+    }
+
+    @Override
+    public Float read(ByteBuffer buffer)
+    {
+      return buffer.getFloat();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Float value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Float.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putFloat(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Float o1, Float o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class DoubleTypeStrategy implements TypeStrategy<Double>
+  {
+    private static final Comparator<Double> COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+    @Override
+    public int estimateSizeBytes(Double value)
+    {
+      return Double.BYTES;
+    }
+
+    @Override
+    public Double read(ByteBuffer buffer)
+    {
+      return buffer.getDouble();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Double value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Double.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putDouble(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Double o1, Double o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class StringTypeStrategy implements TypeStrategy<String>
+  {
+    // copy of lexicographical comparator
+    private static final Ordering<String> ORDERING = Ordering.from(String::compareTo).nullsFirst();
+
+    @Override
+    public int estimateSizeBytes(String value)
+    {
+      if (value == null) {
+        return 0;
+      }
+      return Integer.BYTES + StringUtils.toUtf8(value).length;
+    }
+
+    @Override
+    public String read(ByteBuffer buffer)
+    {
+      // | length (int) | bytes |
+      final int length = buffer.getInt();
+      final byte[] blob = new byte[length];
+      buffer.get(blob, 0, length);
+      return StringUtils.fromUtf8(blob);
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, String value, int maxSizeBytes)
+    {
+      final byte[] bytes = StringUtils.toUtf8(value);
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Integer.BYTES + bytes.length;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putInt(bytes.length);
+        buffer.put(bytes, 0, bytes.length);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(String s, String s2)
+    {
+      // copy of lexicographical string comparator in druid processing
+      // Avoid comparisons for equal references
+      // Assuming we mostly compare different strings, checking s.equals(s2) will only make the comparison slower.
+      //noinspection StringEquality
+      if (s == s2) {
+        return 0;
+      }
+
+      return ORDERING.compare(s, s2);
+    }
+  }
+
+  public static final class ArrayTypeStrategy implements TypeStrategy<Object[]>
+  {
+    private final Comparator<Object> elementComparator;
+    private final TypeStrategy elementStrategy;
+
+    public ArrayTypeStrategy(TypeSignature<?> type)
+    {
+      this.elementStrategy = type.getElementType().getNullableStrategy();
+      this.elementComparator = Comparator.nullsFirst(elementStrategy);
+    }
+
+    @Override
+    public int estimateSizeBytes(Object[] value)
+    {
+      if (value == null) {
+        return 0;
+      }
+      return Integer.BYTES + Arrays.stream(value).mapToInt(elementStrategy::estimateSizeBytes).sum();
+    }
+
+    @Override
+    public Object[] read(ByteBuffer buffer)
+    {
+      final int arrayLength = buffer.getInt();
+      final Object[] array = new Object[arrayLength];
+      for (int i = 0; i < arrayLength; i++) {
+        array[i] = elementStrategy.read(buffer);
+      }
+      return array;
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Object[] value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      int sizeBytes = Integer.BYTES;
+      int remaining = max - sizeBytes;
+      if (remaining < 0) {
+        return remaining;
+      }
+      int extraNeeded = 0;
+
+      buffer.putInt(value.length);
+      for (Object o : value) {
+        int written = elementStrategy.write(buffer, o, remaining);
+        if (written < 0) {
+          extraNeeded += written;
+          remaining = 0;
+        } else {
+          sizeBytes += written;
+          remaining -= sizeBytes;
+        }
+      }
+      return extraNeeded < 0 ? extraNeeded : sizeBytes;
+    }
+
+    @Override
+    public int compare(Object[] o1, Object[] o2)
+    {
+      final int iter = Math.max(o1.length, o2.length);

Review comment:
       oops - yes, good catch, will add some tests since there obviously aren't any 😅 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r779314885



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * ALSO IMPORTANT!!! This is primarily intended for writing ephemeral values within a single process, and is not
+ * especially well suited (by itself) for persistent storage of data or cross process transfer. The support typically
+ * necessary for such more persistent storage, such as tracking version of a format or endianness of the values, should
+ * be handled externally to support these use cases.
+ *
+ * All implementations of this mechanism support reading and writing ONLY non-null values. To handle nulls inline with
+ * your values, consider {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write
+ * nullable values, AND, you have enough memory to burn a full byte for every value you want to store. It will store
+ * values with a leading byte containing either {@link NullHandling#IS_NULL_BYTE} or
+ * {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate. If you have a lot of values to write and a lot of nulls,
+ * consider alternative approaches to tracking your nulls instead.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ *
+ * Implementations of this interface should be thread safe, but may not use {@link ByteBuffer} in a thread safe manner,
+ * potentially modifying positions and limits, either temporarily or permanently depending on which set of methods is
+ * called.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient.
+   *
+   * Example usage of this method is estimating heap memory usage for an aggregator or the amount of buffer which
+   * might need allocated to then {@link #write} a value
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written.
+   *
+   * This method returns the number of bytes written. If writing the value would take more than 'maxSizeBytes', this
+   * method will return a negative value indicating the number of additional bytes that would be required to fully
+   * write the value. Partial results may be written to the buffer when in this state, and the position may be left
+   * at whatever point the implementation ran out of space while writing the value. Callers should save the starting
+   * position if it is necessary to 'rewind' after a partial write.
+   *
+   * Callers MUST check that the return value is positive which indicates a successful write, while a negative response
+   * a partial write.
+   *
+   * @return number of bytes written
+   */
+  int write(ByteBuffer buffer, T value, int maxSizeBytes);

Review comment:
       Should we document in the javadoc that this method can explode when `maxSizeBytes > buffer.remaining()`?

##########
File path: processing/src/main/java/org/apache/druid/segment/column/ObjectStrategyComplexTypeStrategy.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.segment.data.ObjectStrategy;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * Default implementation of {@link TypeStrategy} for all {@link org.apache.druid.segment.serde.ComplexMetricSerde}
+ * implementations that just wraps the {@link ObjectStrategy} they are required to implement.
+ *
+ * This is not likely to be the most efficient way to do things, especially since writing must first produce a byte
+ * array before it can be written to the buffer, but it is cheap and should work correctly, which is important.
+ */
+public class ObjectStrategyComplexTypeStrategy<T> implements TypeStrategy<T>
+{
+  private final ObjectStrategy<T> objectStrategy;
+
+  public ObjectStrategyComplexTypeStrategy(ObjectStrategy<T> objectStrategy)
+  {
+    this.objectStrategy = objectStrategy;
+  }
+
+  @Override
+  public int estimateSizeBytes(@Nullable T value)
+  {
+    byte[] bytes = objectStrategy.toBytes(value);
+    return bytes == null ? 0 : bytes.length;
+  }
+
+  @Override
+  public T read(ByteBuffer buffer)
+  {
+    final int complexLength = buffer.getInt();
+    ByteBuffer dupe = buffer.duplicate();
+    dupe.limit(dupe.position() + complexLength);
+    return objectStrategy.fromByteBuffer(dupe, complexLength);
+  }
+
+  @Override
+  public int write(ByteBuffer buffer, T value, int maxSizeBytes)
+  {
+    byte[] bytes = objectStrategy.toBytes(value);

Review comment:
       Should this method also call `TypeStrategies.checkMaxSize()` first?

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategies.java
##########
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Ordering;
+import com.google.common.primitives.Floats;
+import com.google.common.primitives.Longs;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class TypeStrategies
+{
+  public static final int VALUE_OFFSET = Byte.BYTES;
+  public static final int NULLABLE_LONG_SIZE = Byte.BYTES + Long.BYTES;
+  public static final int NULLABLE_DOUBLE_SIZE = Byte.BYTES + Double.BYTES;
+  public static final int NULLABLE_FLOAT_SIZE = Byte.BYTES + Float.BYTES;
+
+  public static final LongTypeStrategy LONG = new LongTypeStrategy();
+  public static final FloatTypeStrategy FLOAT = new FloatTypeStrategy();
+  public static final DoubleTypeStrategy DOUBLE = new DoubleTypeStrategy();
+  public static final StringTypeStrategy STRING = new StringTypeStrategy();
+  public static final ConcurrentHashMap<String, TypeStrategy<?>> COMPLEX_STRATEGIES = new ConcurrentHashMap<>();
+
+  /**
+   * Get an {@link TypeStrategy} registered to some {@link TypeSignature#getComplexTypeName()}.
+   */
+  @Nullable
+  public static TypeStrategy<?> getComplex(String typeName)
+  {
+    return COMPLEX_STRATEGIES.get(typeName);
+  }
+
+  /**
+   * hmm... this might look familiar... (see ComplexMetrics)
+   *
+   * Register a complex type name -> {@link TypeStrategy} mapping.
+   *
+   * If the specified type name is already used and the supplied {@link TypeStrategy} is not of the
+   * same type as the existing value in the map for said key, an {@link ISE} is thrown.
+   *
+   * @param strategy The {@link TypeStrategy} object to be associated with the 'type' in the map.
+   */
+  public static void registerComplex(String typeName, TypeStrategy<?> strategy)
+  {
+    Preconditions.checkNotNull(typeName);
+    COMPLEX_STRATEGIES.compute(typeName, (key, value) -> {
+      if (value == null) {
+        return strategy;
+      } else {
+        if (!value.getClass().getName().equals(strategy.getClass().getName())) {
+          throw new ISE(
+              "Incompatible strategy for type[%s] already exists. Expected [%s], found [%s].",
+              key,
+              strategy.getClass().getName(),
+              value.getClass().getName()
+          );
+        } else {
+          return value;
+        }
+      }
+    });
+  }
+
+  /**
+   * Clear and set the 'null' byte of a nullable value to {@link NullHandling#IS_NULL_BYTE} to a {@link ByteBuffer} at
+   * the supplied position. This method does not change the buffer position, limit, or mark, because it does not expect
+   * to own the buffer given to it (i.e. buffer aggs)
+   *
+   * Nullable types are stored with a leading byte to indicate if the value is null, followed by the value bytes
+   * (if not null)
+   *
+   * layout: | null (byte) | value |
+   *
+   * @return number of bytes written (always 1)
+   */
+  public static int writeNull(ByteBuffer buffer, int offset)
+  {
+    buffer.put(offset, NullHandling.IS_NULL_BYTE);
+    return 1;
+  }
+
+  /**
+   * Checks if a 'nullable' value's null byte is set to {@link NullHandling#IS_NULL_BYTE}. This method will mask the
+   * value of the null byte to only check if the null bit is set, meaning that the higher bits can be utilized for
+   * flags as necessary (e.g. using high bits to indicate if the value has been set or not for aggregators).
+   *
+   * Note that writing nullable values with the methods of {@link Types} will always clear and set the null byte to
+   * either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE}, losing any flag bits.
+   *
+   * layout: | null (byte) | value |
+   */
+  public static boolean isNullableNull(ByteBuffer buffer, int offset)
+  {
+    // use & so that callers can use the high bits of the null byte to pack additional information if necessary
+    return (buffer.get(offset) & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE;
+  }
+
+  /**
+   * Write a non-null long value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the long value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableLong(ByteBuffer buffer, int offset, long value)

Review comment:
       nit: should it be called `writeLongWithNonNullMark` or something rather than `NullableLong`? It reads like the long parameter can be null.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r764347282



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move

Review comment:
       split `NullableTypeStrategy` out to be standalone so the contract isn't confusing, it wasn't really used outside of tests as a `TypeStrategy` so it doesn't seem like a loss and there is no longer ambiguity of whether or not nulls are handled so i think is an improvement




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r761079426



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written, and returns the number of bytes written.
+   *
+   * If writing the value would take more than 'maxSizeBytes' (or the buffer limit, whichever is smaller), this method
+   * will return a negative value indicating the number of additional bytes that would be required to fully write the
+   * value. Partial results may be written to the buffer when in this state, and the position may be left at whatever
+   * point the implementation ran out of space while writing the value.

Review comment:
       Does this mean the caller is expected to save the position? Or is the caller safe to assume it can roll back by `-retval`?
   
   i.e., should callers do this?
   
   ```
   int written = write(buf, value, maxBytes);
   if (written < 0) {
     buf.position(buf.position() + written); // roll back buf, undoing the write
   }
   ```
   
   or this?
   
   ```
   int oldPosition = buf.position();
   int written = write(buf, value, maxBytes);
   if (written < 0) {
     buf.position(oldPosition); // roll back buf, undoing the write
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
cryptoe commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r760137051



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategies.java
##########
@@ -0,0 +1,463 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Ordering;
+import com.google.common.primitives.Floats;
+import com.google.common.primitives.Longs;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class TypeStrategies
+{
+  public static final int VALUE_OFFSET = Byte.BYTES;
+  public static final int NULLABLE_LONG_SIZE = Byte.BYTES + Long.BYTES;
+  public static final int NULLABLE_DOUBLE_SIZE = Byte.BYTES + Double.BYTES;
+  public static final int NULLABLE_FLOAT_SIZE = Byte.BYTES + Float.BYTES;
+
+  public static final LongTypeStrategy LONG = new LongTypeStrategy();
+  public static final FloatTypeStrategy FLOAT = new FloatTypeStrategy();
+  public static final DoubleTypeStrategy DOUBLE = new DoubleTypeStrategy();
+  public static final StringTypeStrategy STRING = new StringTypeStrategy();
+  public static final ConcurrentHashMap<String, TypeStrategy<?>> COMPLEX_STRATEGIES = new ConcurrentHashMap<>();
+
+  /**
+   * Get an {@link TypeStrategy} registered to some {@link TypeSignature#getComplexTypeName()}.
+   */
+  @Nullable
+  public static TypeStrategy<?> getComplex(String typeName)
+  {
+    return COMPLEX_STRATEGIES.get(typeName);
+  }
+
+  /**
+   * hmm... this might look familiar... (see ComplexMetrics)
+   *
+   * Register a complex type name -> {@link TypeStrategy} mapping.
+   *
+   * If the specified type name is already used and the supplied {@link TypeStrategy} is not of the
+   * same type as the existing value in the map for said key, an {@link ISE} is thrown.
+   *
+   * @param strategy The {@link TypeStrategy} object to be associated with the 'type' in the map.
+   */
+  public static void registerComplex(String typeName, TypeStrategy<?> strategy)
+  {
+    Preconditions.checkNotNull(typeName);
+    COMPLEX_STRATEGIES.compute(typeName, (key, value) -> {
+      if (value == null) {
+        return strategy;
+      } else {
+        if (!value.getClass().getName().equals(strategy.getClass().getName())) {
+          throw new ISE(
+              "Incompatible strategy for type[%s] already exists. Expected [%s], found [%s].",
+              key,
+              strategy.getClass().getName(),
+              value.getClass().getName()
+          );
+        } else {
+          return value;
+        }
+      }
+    });
+  }
+
+  /**
+   * Clear and set the 'null' byte of a nullable value to {@link NullHandling#IS_NULL_BYTE} to a {@link ByteBuffer} at
+   * the supplied position. This method does not change the buffer position, limit, or mark, because it does not expect
+   * to own the buffer given to it (i.e. buffer aggs)
+   *
+   * Nullable types are stored with a leading byte to indicate if the value is null, followed by the value bytes
+   * (if not null)
+   *
+   * layout: | null (byte) | value |
+   *
+   * @return number of bytes written (always 1)
+   */
+  public static int writeNull(ByteBuffer buffer, int offset)
+  {
+    buffer.put(offset, NullHandling.IS_NULL_BYTE);
+    return 1;
+  }
+
+  /**
+   * Checks if a 'nullable' value's null byte is set to {@link NullHandling#IS_NULL_BYTE}. This method will mask the
+   * value of the null byte to only check if the null bit is set, meaning that the higher bits can be utilized for
+   * flags as necessary (e.g. using high bits to indicate if the value has been set or not for aggregators).
+   *
+   * Note that writing nullable values with the methods of {@link Types} will always clear and set the null byte to
+   * either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE}, losing any flag bits.
+   *
+   * layout: | null (byte) | value |
+   */
+  public static boolean isNullableNull(ByteBuffer buffer, int offset)
+  {
+    // use & so that callers can use the high bits of the null byte to pack additional information if necessary
+    return (buffer.get(offset) & NullHandling.IS_NULL_BYTE) == NullHandling.IS_NULL_BYTE;
+  }
+
+  /**
+   * Write a non-null long value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the long value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableLong(ByteBuffer buffer, int offset, long value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putLong(offset, value);
+    return NULLABLE_LONG_SIZE;
+  }
+
+  /**
+   * Reads a non-null long value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | long |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect  to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static long readNullableLong(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getLong(offset + VALUE_OFFSET);
+  }
+
+  /**
+   * Write a non-null double value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the double value is written in the next 8 bytes.
+   *
+   * layout: | null (byte) | double |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 9)
+   */
+  public static int writeNullableDouble(ByteBuffer buffer, int offset, double value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putDouble(offset, value);
+    return NULLABLE_DOUBLE_SIZE;
+  }
+
+  /**
+   * Reads a non-null double value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | double |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static double readNullableDouble(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getDouble(offset + VALUE_OFFSET);
+  }
+
+  /**
+   * Write a non-null float value to a {@link ByteBuffer} at the supplied offset. The first byte is always cleared and
+   * set to {@link NullHandling#IS_NOT_NULL_BYTE}, the float value is written in the next 4 bytes.
+   *
+   * layout: | null (byte) | float |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   *
+   * @return number of bytes written (always 5)
+   */
+  public static int writeNullableFloat(ByteBuffer buffer, int offset, float value)
+  {
+    buffer.put(offset++, NullHandling.IS_NOT_NULL_BYTE);
+    buffer.putFloat(offset, value);
+    return NULLABLE_FLOAT_SIZE;
+  }
+
+  /**
+   * Reads a non-null float value from a {@link ByteBuffer} at the supplied offset. This method should only be called
+   * if and only if {@link #isNullableNull} for the same offset returns false.
+   *
+   * layout: | null (byte) | float |
+   *
+   * This method does not change the buffer position, limit, or mark, because it does not expect to own the buffer
+   * given to it (i.e. buffer aggs)
+   */
+  public static float readNullableFloat(ByteBuffer buffer, int offset)
+  {
+    assert !isNullableNull(buffer, offset);
+    return buffer.getFloat(offset + VALUE_OFFSET);
+  }
+
+
+  public static final class LongTypeStrategy implements TypeStrategy<Long>
+  {
+    private static final Comparator<Long> COMPARATOR = Comparator.nullsFirst(Longs::compare);
+
+    @Override
+    public int estimateSizeBytes(Long value)
+    {
+      return Long.BYTES;
+    }
+
+    @Override
+    public Long read(ByteBuffer buffer)
+    {
+      return buffer.getLong();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Long value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Long.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putLong(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Long o1, Long o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class FloatTypeStrategy implements TypeStrategy<Float>
+  {
+    private static final Comparator<Float> COMPARATOR = Comparator.nullsFirst(Floats::compare);
+
+    @Override
+    public int estimateSizeBytes(Float value)
+    {
+      return Float.BYTES;
+    }
+
+    @Override
+    public Float read(ByteBuffer buffer)
+    {
+      return buffer.getFloat();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Float value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Float.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putFloat(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Float o1, Float o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class DoubleTypeStrategy implements TypeStrategy<Double>
+  {
+    private static final Comparator<Double> COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+    @Override
+    public int estimateSizeBytes(Double value)
+    {
+      return Double.BYTES;
+    }
+
+    @Override
+    public Double read(ByteBuffer buffer)
+    {
+      return buffer.getDouble();
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Double value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Double.BYTES;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putDouble(value);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(Double o1, Double o2)
+    {
+      return COMPARATOR.compare(o1, o2);
+    }
+  }
+
+  public static final class StringTypeStrategy implements TypeStrategy<String>
+  {
+    // copy of lexicographical comparator
+    private static final Ordering<String> ORDERING = Ordering.from(String::compareTo).nullsFirst();
+
+    @Override
+    public int estimateSizeBytes(String value)
+    {
+      if (value == null) {
+        return 0;
+      }
+      return Integer.BYTES + StringUtils.toUtf8(value).length;
+    }
+
+    @Override
+    public String read(ByteBuffer buffer)
+    {
+      // | length (int) | bytes |
+      final int length = buffer.getInt();
+      final byte[] blob = new byte[length];
+      buffer.get(blob, 0, length);
+      return StringUtils.fromUtf8(blob);
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, String value, int maxSizeBytes)
+    {
+      final byte[] bytes = StringUtils.toUtf8(value);
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      final int sizeBytes = Integer.BYTES + bytes.length;
+      final int remaining = max - sizeBytes;
+      if (remaining >= 0) {
+        buffer.putInt(bytes.length);
+        buffer.put(bytes, 0, bytes.length);
+        return sizeBytes;
+      }
+      return remaining;
+    }
+
+    @Override
+    public int compare(String s, String s2)
+    {
+      // copy of lexicographical string comparator in druid processing
+      // Avoid comparisons for equal references
+      // Assuming we mostly compare different strings, checking s.equals(s2) will only make the comparison slower.
+      //noinspection StringEquality
+      if (s == s2) {
+        return 0;
+      }
+
+      return ORDERING.compare(s, s2);
+    }
+  }
+
+  public static final class ArrayTypeStrategy implements TypeStrategy<Object[]>
+  {
+    private final Comparator<Object> elementComparator;
+    private final TypeStrategy elementStrategy;
+
+    public ArrayTypeStrategy(TypeSignature<?> type)
+    {
+      this.elementStrategy = type.getElementType().getNullableStrategy();
+      this.elementComparator = Comparator.nullsFirst(elementStrategy);
+    }
+
+    @Override
+    public int estimateSizeBytes(Object[] value)
+    {
+      if (value == null) {
+        return 0;
+      }
+      return Integer.BYTES + Arrays.stream(value).mapToInt(elementStrategy::estimateSizeBytes).sum();
+    }
+
+    @Override
+    public Object[] read(ByteBuffer buffer)
+    {
+      final int arrayLength = buffer.getInt();
+      final Object[] array = new Object[arrayLength];
+      for (int i = 0; i < arrayLength; i++) {
+        array[i] = elementStrategy.read(buffer);
+      }
+      return array;
+    }
+
+    @Override
+    public int write(ByteBuffer buffer, Object[] value, int maxSizeBytes)
+    {
+      final int max = Math.min(buffer.limit() - buffer.position(), maxSizeBytes);
+      int sizeBytes = Integer.BYTES;
+      int remaining = max - sizeBytes;
+      if (remaining < 0) {
+        return remaining;
+      }
+      int extraNeeded = 0;
+
+      buffer.putInt(value.length);
+      for (Object o : value) {
+        int written = elementStrategy.write(buffer, o, remaining);
+        if (written < 0) {
+          extraNeeded += written;
+          remaining = 0;
+        } else {
+          sizeBytes += written;
+          remaining -= sizeBytes;
+        }
+      }
+      return extraNeeded < 0 ? extraNeeded : sizeBytes;
+    }
+
+    @Override
+    public int compare(Object[] o1, Object[] o2)
+    {
+      final int iter = Math.max(o1.length, o2.length);

Review comment:
       umm should this be math.min as might have a scenario we can get indexOutofBoundException ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r764348554



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>

Review comment:
       added more javadocs indicating that this is primarily for transient stuffs, and if you want persistence should bring your own version and endian tracking and stuff




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r764348431



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written, and returns the number of bytes written.
+   *
+   * If writing the value would take more than 'maxSizeBytes' (or the buffer limit, whichever is smaller), this method

Review comment:
       i had it this way so that callers that don't know or care about size can pass in an arbitrary value like Integer.MAX_VALUE, but it didn't seem very useful so i've made it an exception now, the old behavior was a bit strange coupled with the negative size written return values for when write doesn't have enough room since the caller knows that they definitely need to add that many bytes to maxSizeBytes to have enough space to complete the write

##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * IMPORTANT!!! DO NOT USE THIS FOR WRITING COLUMNS, THERE ARE VERY LIKELY FAR BETTER WAYS TO DO THIS. However, if you
+ * need to store a single value or small number of values, continue reading.
+ *
+ * Most implementations of this mechanism, support reading and writing ONLY non-null values. The exception to this is
+ * {@link NullableTypeStrategy}, which might be acceptable to use if you need to read and write nullable values, AND,
+ * you have enough memory to burn a full byte for every value you want to store. It will store values with a leading
+ * byte containing either {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate.
+ * If you have a lot of values to write and a lot of nulls, consider alternative approaches to tracking your nulls.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object, int)} and {@link #estimateSizeBytes(Object)}, default implementations are provided
+ * to set and reset buffer positions as appropriate for the offset based methods, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * Estimate the size in bytes that writing this value to memory would require. This method is not required to be
+   * exactly correct, but many implementations might be. Implementations should err on the side of over-estimating if
+   * exact sizing is not efficient
+   */
+  int estimateSizeBytes(T value);
+
+
+  /**
+   * Read a non-null value from the {@link ByteBuffer} at the current {@link ByteBuffer#position()}. This will move
+   * the underlying position by the size of the value read.
+   *
+   * The contract of this method is that any value returned from this method MUST be completely detached from the
+   * underlying {@link ByteBuffer}, since it might outlive the memory location being allocated to hold the object.
+   * In other words, if an object is memory mapped, it must be copied on heap, or relocated to another memory location
+   * that is owned by the caller with {@link #write}.
+   */
+  T read(ByteBuffer buffer);
+
+  /**
+   * Write a non-null value to the {@link ByteBuffer} at position {@link ByteBuffer#position()}. This will move the
+   * underlying position by the size of the value written, and returns the number of bytes written.
+   *
+   * If writing the value would take more than 'maxSizeBytes' (or the buffer limit, whichever is smaller), this method
+   * will return a negative value indicating the number of additional bytes that would be required to fully write the
+   * value. Partial results may be written to the buffer when in this state, and the position may be left at whatever
+   * point the implementation ran out of space while writing the value.

Review comment:
       the latter, caller should save the position before writing to handle an incomplete write, added javadocs to clarify




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r758753145



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * Implementations of this mechanism support writing both null and non-null values. When using the 'nullable' family
+ * of the read and write methods, values are stored such that the leading byte contains either
+ * {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate. The default
+ * implementations of these methods use masking to check the null bit, so flags may be used in the upper bits of the
+ * null byte.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object)} and {@link #estimateSizeBytes(Object)}, the rest provide default implementations
+ * which set the null/not null byte, and reset buffer positions as appropriate, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * The size in bytes that writing this value to memory would require, useful for constraining the values maximum size
+   *
+   * This does not include the null byte, use {@link #estimateSizeBytesNullable(Object)} instead.
+   */
+  int estimateSizeBytes(@Nullable T value);
+
+  /**
+   * The size in bytes that writing this value to memory would require, including the null byte, useful for constraining
+   * the values maximum size. If the value is null, the size will be {@link Byte#BYTES}, otherwise it will be
+   * {@link Byte#BYTES} + {@link #estimateSizeBytes(Object)}
+   */
+  default int estimateSizeBytesNullable(@Nullable T value)

Review comment:
       Thanks, I agree, I pulled the null-byte wrapper handling into a special `NullableTypeStrategy` which can be used to wrap regular `TypeStrategy` implementations (which are now not expected to process nulls), and added lots of javadocs to suggest not using it unless you've got space to burn or no other options, so hopefully it will limit use.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #11888:
URL: https://github.com/apache/druid/pull/11888#issuecomment-1009509975


   thanks for review all :+1:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cheddar commented on a change in pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
cheddar commented on a change in pull request #11888:
URL: https://github.com/apache/druid/pull/11888#discussion_r745255446



##########
File path: core/src/main/java/org/apache/druid/segment/column/TypeStrategy.java
##########
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import org.apache.druid.common.config.NullHandling;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+
+/**
+ * TypeStrategy provides value comparison and binary serialization for Druid types. This can be obtained for ANY Druid
+ * type via {@link TypeSignature#getStrategy()}.
+ *
+ * Implementations of this mechanism support writing both null and non-null values. When using the 'nullable' family
+ * of the read and write methods, values are stored such that the leading byte contains either
+ * {@link NullHandling#IS_NULL_BYTE} or {@link NullHandling#IS_NOT_NULL_BYTE} as appropriate. The default
+ * implementations of these methods use masking to check the null bit, so flags may be used in the upper bits of the
+ * null byte.
+ *
+ * This mechanism allows using the natural {@link ByteBuffer#position()} and modify the underlying position as they
+ * operate, and also random access reads are specific offets, which do not modify the underlying position. If a method
+ * accepts an offset parameter, it does not modify the position, if not, it does.
+ *
+ * The only methods implementors are required to provide are {@link #read(ByteBuffer)},
+ * {@link #write(ByteBuffer, Object)} and {@link #estimateSizeBytes(Object)}, the rest provide default implementations
+ * which set the null/not null byte, and reset buffer positions as appropriate, but may be overridden if a more
+ * optimized implementation is needed.
+ */
+public interface TypeStrategy<T> extends Comparator<T>
+{
+  /**
+   * The size in bytes that writing this value to memory would require, useful for constraining the values maximum size
+   *
+   * This does not include the null byte, use {@link #estimateSizeBytesNullable(Object)} instead.
+   */
+  int estimateSizeBytes(@Nullable T value);
+
+  /**
+   * The size in bytes that writing this value to memory would require, including the null byte, useful for constraining
+   * the values maximum size. If the value is null, the size will be {@link Byte#BYTES}, otherwise it will be
+   * {@link Byte#BYTES} + {@link #estimateSizeBytes(Object)}
+   */
+  default int estimateSizeBytesNullable(@Nullable T value)

Review comment:
       I'm not sure about hand the null handling over to the implementation here.  The strategy as I understand it is using a whole byte for whether there is a null value.  A byte for whether it is null consumes 8x the space than it really needs, this can actually add up in unfortunate ways when there are lots of columns in the result set.  
   
   For example, a "better" implementation would be to start every row with a bitmap of all of the columns that are null.  This would consume only a single bit per column rather than a byte per column and, if we were so inclined, could also be properly padded to try to enforce word-alignment.
   
   Having an implementation that does it with a whole byte as a stepping stone is maybe okay.  But, with the way this interface is built, the interface is forcing rather sub-optimal null-handling on the query engines that use this interface.  I think it would be better to perhaps declare that a size of `-1` means that the value is equivalent to `null` and have the thing external from this do something meaningful with that knowledge.
   
   We would likely also need to adjust the signature of `write` to return an `int` to indicate the number of bytes written (once again, returning a `-1` for "it was null")




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #11888:
URL: https://github.com/apache/druid/pull/11888#issuecomment-970889028


   Some high level thoughts:
   
   1) estimateSizeBytesNullable appears to be the only reference to nullability. The other methods say they deal with non-null data. That seems inconsistent. Here I think I'm in favor of what @cheddar is suggesting, that this interface should not be null-aware, and null handling should be left to the caller. (Callers shouldn't need to use this interface if they get a null.)
   
   2) To simplify life for callers that don't want to care about null handling, we could make a NullableTypeStrategy that accepts nulls and stores a null byte. It'd be composable with all the other TypeStrategy implementations, and lets certain callers stay simple while also letting other callers be fancy.
   
   3) I'm confused about the intent of estimateSizeBytes: is it meant to be a maximum size? Like, can the estimate be an underestimate? Some of the javadocs and comments seem to suggest that is the case. If it's meant to be a max, we should call it specifically "max" instead of "estimate".
   
   4) I'm not sure if anything could use it today, but, we should design this interface to support variable size serialization lengths, since we'll want it for various future data types. Those could work by having `write` take a max length, and returning -1 if it would need more than that. Then the caller could allocate more space and call `write` again. Or, maybe, `write` could return `-the_amount_of_space_it_needs`. In this case the "estimate" method really could return an estimate rather than a max.
   
   5) Even with variable size support like (4) suggest, we will still have callers that can't deal with that. So there will need to be some API that supports those callers. I'm not sure what that'd look like; maybe a getMaxBytes method that returns the max, or -1 if there really isn't a known max? And if the max is -1, then callers that only support constant size serialization can't use the API?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #11888:
URL: https://github.com/apache/druid/pull/11888


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #11888: add 'TypeStrategy' to types

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #11888:
URL: https://github.com/apache/druid/pull/11888#issuecomment-982044151


   > 1) estimateSizeBytesNullable appears to be the only reference to nullability. The other methods say they deal with non-null data. That seems inconsistent. Here I think I'm in favor of what @cheddar is suggesting, that this interface should not be null-aware, and null handling should be left to the caller. (Callers shouldn't need to use this interface if they get a null.)
   
   Oops, this was a leftover from the first round of changes from @cheddar 's comment where I moved most of the null byte handling stuff out of `TypeStrategy` itself, but forgot to move this one out. Since then, I've moved all of the null byte handling into the special `NullableTypeStrategy` that should be used for callers that want to use the null byte from `NullHandling`, along with lots of javadocs to encourage other options if available since the byte per value is quite expensive.
   
   > 2) To simplify life for callers that don't want to care about null handling, we could make a NullableTypeStrategy that accepts nulls and stores a null byte. It'd be composable with all the other TypeStrategy implementations, and lets certain callers stay simple while also letting other callers be fancy.
   
   Did this, as well as adding a `getNullableStrategy` to `TypeSignature` that will automatically wrap the output of `getStrategy` with the `NullableTypeStrategy`
   
   > 3) I'm confused about the intent of estimateSizeBytes: is it meant to be a maximum size? Like, can the estimate be an underestimate? Some of the javadocs and comments seem to suggest that is the case. If it's meant to be a max, we should call it specifically "max" instead of "estimate".
   
   Before the most recent set of changes, estimate was _actually_ being used to get the exact size of some value in bytes, in order to check that variably sized values did not exceed this maximum. This has been reworked as suggested in 4) to instead pass `maxSizeBytes` directly into the `write` family of methods, and now has been relaxed to only needing to be an estimate. The only thing still using it in the current PR is the heap based `ExpressionLambdaAggregator`, which uses it to check that the aggregator does not explode the heap in an unbounded way.
   
   > 4) I'm not sure if anything could use it today, but, we should design this interface to support variable size serialization lengths, since we'll want it for various future data types. Those could work by having write take a max length, and returning -1 if it would need more than that. Then the caller could allocate more space and call write again. Or, maybe, write could return -the_amount_of_space_it_needs. In this case the "estimate" method really could return an estimate rather than a max.
   
   Variably sized serialization was previously already supported (but required externally verifying that there was enough space with the estimate method). But, I liked the ideas here, so I have modified the `write` methods to now accept a maxSizeBytes parameter and went with `-the_amount_of_extra_space_it_needs` approach if this value (or buffer size) is exceeded, so that callers could choose to explode if the value was not able to be completely written, as well as know how much additional space would be required to fully write the value.
   
   > 5) Even with variable size support like (4) suggest, we will still have callers that can't deal with that. So there will need to be some API that supports those callers. I'm not sure what that'd look like; maybe a getMaxBytes method that returns the max, or -1 if there really isn't a known max? And if the max is -1, then callers that only support constant size serialization can't use the API?
   
   Variably sized support is in place, I'm not sure we need a `getMaxBytes` method as attempting to call write now requires specifying a max bytes, even for constant sized values, callers with constant sized serialization can either specify the correct size or some size larger than the actual size, which is sort of ugly, but also primitive numerics shouldn't really be using `TypeStrategy` directly, instead they should be using the static methods to avoid converting the java primitives, so maybe it isn't a huge deal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org