Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/02/01 05:17:09 UTC

[GitHub] [druid] clintropolis opened a new pull request, #13732: various nested column fixes

clintropolis opened a new pull request, #13732:
URL: https://github.com/apache/druid/pull/13732

   ### Description
   This PR fixes a handful of semi-related issues, mostly to do with nested columns; some of them are a result of (or are easier to encounter after) the type 'mimic' behavior introduced in #13653.
   
   changes:
   * modified Druid schema column type computation to special-case COMPLEX<json> handling: choose COMPLEX<json> if any column in any segment is COMPLEX<json>
   * NestedFieldVirtualColumn can now work correctly on any type of column, returning a selector on the underlying column if the path is the root, or a nil selector if it is not
   * fixed a bug with NilVectorSelector where, when using a vector size larger than the default with druid.generic.useDefaultValueForNull=false, the nulls vector would be set to all false instead of all true
   * fixed an overly aggressive check in ExprEval.ofType when handling complex types, which would try to treat any string as base64 without gracefully falling back if it was not in fact base64 encoded, and added special handling for COMPLEX<json>
   * added ExpressionVectorSelectors.castValueSelectorToObject and ExpressionVectorSelectors.castObjectSelectorToNumeric as convenience methods to cast vector selectors using cast expressions without the trouble of constructing an expression (see the sketch after this list). The polymorphic nature of the non-vectorized engine (and the significantly larger overhead of non-vectorized expression processing) made adding similar methods for non-vectorized selectors less attractive, so they have not been added at this time
   * added more tests
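   
   For reference, a rough sketch of how the new cast helpers are invoked, mirroring the test added in this PR (the `cursor` and `column` names below are just placeholders for an open VectorCursor and a source column, not part of the API):
   
       // string-typed object selector -> long-typed value selector
       VectorValueSelector castToLong = ExpressionVectorSelectors.castObjectSelectorToNumeric(
           cursor.getColumnSelectorFactory().getReadableVectorInspector(),
           column,
           cursor.getColumnSelectorFactory().makeObjectSelector(column),
           ColumnType.STRING,   // type of the underlying selector
           ColumnType.LONG      // type to cast to
       );
   
       // numeric value selector -> string-typed object selector
       VectorObjectSelector castToString = ExpressionVectorSelectors.castValueSelectorToObject(
           cursor.getColumnSelectorFactory().getReadableVectorInspector(),
           column,
           cursor.getColumnSelectorFactory().makeValueSelector(column),
           cursor.getColumnSelectorFactory().getColumnCapabilities(column).toColumnType(),
           ColumnType.STRING
       );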
   
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   




[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095438123


##########
processing/src/test/resources/simple-nested-test-data.json:
##########
@@ -1,7 +1,7 @@
 {"timestamp": "2021-01-01", "dim": "hello", "nest": {"x": 100, "y": 200, "z": 300}, "nester":{ "x": ["a", "b", "c"], "y": {"a": "a", "b": "b", "c": [1, 2, 3]}}, "variant": {"a": ["hello", "world"], "b": {"x": "hello", "y": "world"}}, "list":[{"x": 5, "y": 10}, {"x": 15, "y": 22}]}
 {"timestamp": "2021-01-01", "dim": "hello", "nester":{ "x": ["x", "y", "z"]}, "list":[{"x": 35, "y": 310}, {"x": 315, "y": 322}]}
 {"timestamp": "2021-01-01", "dim": "hello", "nest":{ "x": 300, "y":  800}, "nester": "hello"}
-{"timestamp": "2021-01-01", "dim": "hello", "nest":{ "y": 500}, "list":[{"x": 115, "y": 410}, {"x": 415, "y": 422}]}
+{"timestamp": "2021-01-01", "dim": "100", "nest":{ "y": 500}, "list":[{"x": 115, "y": 410}, {"x": 415, "y": 422}]}

Review Comment:
   test parsing numbers





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095431083


##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
         );
         if (metadata.hasNulls()) {
           columnBuilder.setHasNulls(true);
-          final ByteBuffer nullIndexBuffer = loadInternalFile(mapper, NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+          final ByteBuffer nullIndexBuffer = loadInternalFile(
+              mapper,
+              metadata,
+              NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+          );
           nullValues = metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
         } else {
           nullValues = metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
         }
+
+        return new NestedDataColumnSupplier(
+            version,
+            metadata,
+            fields,
+            fieldInfo,
+            compressedRawColumnSupplier,
+            nullValues,
+            stringDictionary,
+            frontCodedStringDictionarySupplier,
+            longDictionarySupplier,
+            doubleDictionarySupplier,
+            columnConfig,
+            mapper,
+            simpleType
+        );

Review Comment:
   I didn't actually change anything here, I just moved the logic out of the constructor into a static method per a request of yours on a previous PR. The nested column segments are pretty light due to the use of suppliers, particularly when using `FrontCodedIndexed` for the string dictionary https://github.com/apache/druid/pull/12277#discussion_r1004350233, so until they are materialized into an indexed for queries they consist mostly of a reference to the underlying buffer from the mmap segment and the supplier object. I did this with heap usage specifically in mind.
   
   `GenericIndexed` is the most expensive part of segment heap usage afaict, and the limited use of it here helps keep things light: it is both materialized into an `Indexed` and still functions like a supplier via the 'singleThreaded' method, and it is usually the driver of heap size in the dumps I've seen.





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095577739


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
                     .rows(ROWS)
                     .buildMMappedIndex();
 
-    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+    final QueryableIndex indexMix11 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)

Review Comment:
   These tests follow the model of the other Calcite query tests, which still use a parser because they predate the newer split DimensionSchema and InputSource/InputFormat stuff of native batch.
   
   I will see if I can rework them to not use the outdated stuff, and just make `InputRow` with `MapInputRowParser.parse` like all of the newer `InputFormat` implementations do (a rough sketch is below).
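   
   A rough sketch of what that rework could look like (hypothetical; the schema columns and row values are placeholders borrowed from the test data, and the InputRowSchema construction mirrors the BloomFilterSqlAggregatorTest change elsewhere in this PR):
   
       InputRowSchema schema = new InputRowSchema(
           new TimestampSpec("timestamp", "iso", null),
           new DimensionsSpec(
               DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim", "nest", "nester"))
           ),
           null
       );
       // each test row becomes a plain map, parsed the same way the newer InputFormat implementations do
       InputRow row = MapInputRowParser.parse(
           schema,
           ImmutableMap.of("timestamp", "2021-01-01", "dim", "hello", "nest", ImmutableMap.of("x", 100))
       );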





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095433036


##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
     }
-    return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));
+  }
+
+  private DimensionSelector makeDimensionSelectorUndecorated(
+      ColumnHolder holder,
+      ReadableOffset offset,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, extractionFn);
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null, extractionFn);
+    }

Review Comment:
   >Is this because parts.isEmpty() means that we are looking for the root? And if we are not looking for the root then it's impossible to have something?
   
   Yep, exactly, I will try to clarify the comments.
   
   >If so, can we move the stuff that's after this if to be inside of the if (to eliminate the negation) and adjust the comments to explain the relationship between parts.isEmpty() and the "real world/query"?
   
   Hah, I originally had it flipped and decided at the last minute to do it this way so there would be less stuff inside the if, which seemed slightly more pleasing, but I will switch it back since it's all the same.





[GitHub] [druid] imply-cheddar commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1098084877


##########
core/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -463,6 +479,13 @@ public static ExprEval bestEffortOf(@Nullable Object val)
       return ofArray(coerced.lhs, coerced.rhs);
     }
 
+    // in 'best effort' mode, we couldn't possibly use byte[] as a complex or anything else useful without type
+    // knowledge, so lets turn it into a base64 encoded string so at least something downstream can use it by decoding
+    // back into bytes
+    if (val instanceof byte[]) {
+      return new StringExprEval(StringUtils.encodeBase64String((byte[]) val));
+    }

Review Comment:
   Okay





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095435579


##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
     }
-    return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));
+  }
+
+  private DimensionSelector makeDimensionSelectorUndecorated(
+      ColumnHolder holder,
+      ReadableOffset offset,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, extractionFn);
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null, extractionFn);
+    }
+
+    // the path was the 'root', we're in luck, spit out a selector that behaves the same way as a nested column

Review Comment:
   Ah, this comment is maybe confusing after I re-arranged some stuff. The comment is in relation to making it past the last 'if' statement, which would have bailed with a nil selector if we weren't looking for the root path.
   
   I'm checking for a dictionary encoded column here because, if so, I want to wrap it in the auto-casting gizmo, since the string dimension selectors do not have that behavior by default. If it is not dictionary encoded, I can just use the column value selector, because it is going to be a numeric column, and those all implement the 'get number' methods.
   
   I'll admit that this is possibly a bit dependent on the way things currently are, and that the checks might not be adequate in the future. I guess if I flip this logic around like your other comment suggested, I could make these checks explicit: use the value selector for numeric columns, the wrapped dim selector for string dictionary columns, and otherwise return a nil selector (sketched below).
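   
   For reference, a rough sketch of that more explicit arrangement (hypothetical, not the code in this PR; it reuses the names from the diff above):
   
       if (parts.isEmpty()) {
         // empty path means the virtual column points at the root of a non-nested column
         if (theColumn instanceof DictionaryEncodedColumn) {
           // string dictionary column: wrap the dimension selector so values auto-cast like nested columns do
           final DictionaryEncodedColumn<?> column = (DictionaryEncodedColumn<?>) theColumn;
           return new AutoCastingValueSelector(column.makeDimensionSelector(offset, extractionFn));
         }
         if (holder.getCapabilities().isNumeric()) {
           // numeric columns implement all of the 'get number' methods, so wrap the value selector
           return ValueTypes.makeNumericWrappingDimensionSelector(
               holder.getCapabilities().getType(),
               theColumn.makeColumnValueSelector(offset),
               extractionFn
           );
         }
       }
       // non-root path on a non-nested column, or a column type we can't adapt: nil selector
       return DimensionSelector.constant(null, extractionFn);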





[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by github-code-scanning.
github-code-scanning[bot] commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1092796694


##########
processing/src/test/java/org/apache/druid/segment/virtual/ExpressionVectorSelectorsCastTest.java:
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.virtual;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.expression.TestExprMacroTable;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexStorageAdapter;
+import org.apache.druid.segment.VirtualColumns;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.generator.GeneratorBasicSchemas;
+import org.apache.druid.segment.generator.GeneratorSchemaInfo;
+import org.apache.druid.segment.generator.SegmentGenerator;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.IOException;
+
+public class ExpressionVectorSelectorsCastTest
+{
+  private static final int ROWS_PER_SEGMENT = 10_000;
+
+  private static QueryableIndex INDEX;
+  private static Closer CLOSER;
+
+  @BeforeClass
+  public static void setupClass()
+  {
+    CLOSER = Closer.create();
+
+    final GeneratorSchemaInfo schemaInfo = GeneratorBasicSchemas.SCHEMA_MAP.get("expression-testbench");
+
+    final DataSegment dataSegment = DataSegment.builder()
+                                               .dataSource("foo")
+                                               .interval(schemaInfo.getDataInterval())
+                                               .version("1")
+                                               .shardSpec(new LinearShardSpec(0))
+                                               .size(0)
+                                               .build();
+
+    final SegmentGenerator segmentGenerator = CLOSER.register(new SegmentGenerator());
+    INDEX = CLOSER.register(
+        segmentGenerator.generate(dataSegment, schemaInfo, Granularities.HOUR, ROWS_PER_SEGMENT)
+    );
+  }
+
+  @AfterClass
+  public static void teardownClass() throws IOException
+  {
+    CLOSER.close();
+  }
+
+  @Test
+  public void testCastObjectSelectorToValueSelector()
+  {
+    testCast(INDEX, "string1", ColumnType.LONG, CLOSER);
+    testCast(INDEX, "string2", ColumnType.DOUBLE, CLOSER);
+    testCast(INDEX, "string3", ColumnType.FLOAT, CLOSER);
+  }
+
+  @Test
+  public void testCastValueSelectorSelectorToObjectSelector()
+  {
+    testCast(INDEX, "long1", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "long2", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "double1", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "double2", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "float1", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "float2", ColumnType.STRING, CLOSER);
+  }
+
+  public static void testCast(
+      QueryableIndex index,
+      String column,
+      ColumnType castTo,
+      Closer closer
+  )
+  {
+    final VirtualColumns virtualColumns = VirtualColumns.create(
+        ImmutableList.of(
+            new ExpressionVirtualColumn(
+                "v",
+                "cast(" + column + ", '" + ExpressionType.fromColumnType(castTo) + "')",
+                castTo,
+                TestExprMacroTable.INSTANCE
+            )
+        )
+    );
+    final QueryableIndexStorageAdapter storageAdapter = new QueryableIndexStorageAdapter(index);
+    VectorCursor cursor = storageAdapter.makeVectorCursor(
+        null,
+        index.getDataInterval(),
+        virtualColumns,
+        false,
+        512,
+        null
+    );
+    closer.register(cursor);
+
+    ColumnCapabilities capabilities = INDEX.getColumnCapabilities(column);
+
+    if (capabilities.isNumeric() && castTo.is(ValueType.STRING)) {
+      // numeric -> string
+      verifyNumericToString(column, castTo, cursor);
+    } else {
+      // string -> numeric
+      verifyStringToNumeric(column, castTo, cursor);
+    }
+  }
+
+  private static void verifyStringToNumeric(String column, ColumnType castTo, VectorCursor cursor)
+  {
+    VectorValueSelector selector = cursor.getColumnSelectorFactory().makeValueSelector("v");
+    VectorValueSelector castSelector = ExpressionVectorSelectors.castObjectSelectorToNumeric(
+        cursor.getColumnSelectorFactory().getReadableVectorInspector(),
+        column,
+        cursor.getColumnSelectorFactory().makeObjectSelector(column),
+        ColumnType.STRING,
+        castTo
+    );
+    while (!cursor.isDone()) {
+      boolean[] nulls;
+      boolean[] castNulls;
+      switch (castTo.getType()) {

Review Comment:
   ## Missing enum case in switch
   
   Switch statement does not have a case for [ARRAY](1).
   Switch statement does not have a case for [COMPLEX](2).
   Switch statement does not have a case for [STRING](3).
   Switch statement does not have a case for [FLOAT](4).
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4219)



##########
processing/src/test/java/org/apache/druid/segment/virtual/ExpressionVectorSelectorsCastTest.java:
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.virtual;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.query.expression.TestExprMacroTable;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexStorageAdapter;
+import org.apache.druid.segment.VirtualColumns;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.generator.GeneratorBasicSchemas;
+import org.apache.druid.segment.generator.GeneratorSchemaInfo;
+import org.apache.druid.segment.generator.SegmentGenerator;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.IOException;
+
+public class ExpressionVectorSelectorsCastTest
+{
+  private static final int ROWS_PER_SEGMENT = 10_000;
+
+  private static QueryableIndex INDEX;
+  private static Closer CLOSER;
+
+  @BeforeClass
+  public static void setupClass()
+  {
+    CLOSER = Closer.create();
+
+    final GeneratorSchemaInfo schemaInfo = GeneratorBasicSchemas.SCHEMA_MAP.get("expression-testbench");
+
+    final DataSegment dataSegment = DataSegment.builder()
+                                               .dataSource("foo")
+                                               .interval(schemaInfo.getDataInterval())
+                                               .version("1")
+                                               .shardSpec(new LinearShardSpec(0))
+                                               .size(0)
+                                               .build();
+
+    final SegmentGenerator segmentGenerator = CLOSER.register(new SegmentGenerator());
+    INDEX = CLOSER.register(
+        segmentGenerator.generate(dataSegment, schemaInfo, Granularities.HOUR, ROWS_PER_SEGMENT)
+    );
+  }
+
+  @AfterClass
+  public static void teardownClass() throws IOException
+  {
+    CLOSER.close();
+  }
+
+  @Test
+  public void testCastObjectSelectorToValueSelector()
+  {
+    testCast(INDEX, "string1", ColumnType.LONG, CLOSER);
+    testCast(INDEX, "string2", ColumnType.DOUBLE, CLOSER);
+    testCast(INDEX, "string3", ColumnType.FLOAT, CLOSER);
+  }
+
+  @Test
+  public void testCastValueSelectorSelectorToObjectSelector()
+  {
+    testCast(INDEX, "long1", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "long2", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "double1", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "double2", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "float1", ColumnType.STRING, CLOSER);
+    testCast(INDEX, "float2", ColumnType.STRING, CLOSER);
+  }
+
+  public static void testCast(
+      QueryableIndex index,
+      String column,
+      ColumnType castTo,
+      Closer closer
+  )
+  {
+    final VirtualColumns virtualColumns = VirtualColumns.create(
+        ImmutableList.of(
+            new ExpressionVirtualColumn(
+                "v",
+                "cast(" + column + ", '" + ExpressionType.fromColumnType(castTo) + "')",
+                castTo,
+                TestExprMacroTable.INSTANCE
+            )
+        )
+    );
+    final QueryableIndexStorageAdapter storageAdapter = new QueryableIndexStorageAdapter(index);
+    VectorCursor cursor = storageAdapter.makeVectorCursor(
+        null,
+        index.getDataInterval(),
+        virtualColumns,
+        false,
+        512,
+        null
+    );
+    closer.register(cursor);
+
+    ColumnCapabilities capabilities = INDEX.getColumnCapabilities(column);
+
+    if (capabilities.isNumeric() && castTo.is(ValueType.STRING)) {
+      // numeric -> string
+      verifyNumericToString(column, castTo, cursor);
+    } else {
+      // string -> numeric
+      verifyStringToNumeric(column, castTo, cursor);
+    }
+  }
+
+  private static void verifyStringToNumeric(String column, ColumnType castTo, VectorCursor cursor)
+  {
+    VectorValueSelector selector = cursor.getColumnSelectorFactory().makeValueSelector("v");
+    VectorValueSelector castSelector = ExpressionVectorSelectors.castObjectSelectorToNumeric(
+        cursor.getColumnSelectorFactory().getReadableVectorInspector(),
+        column,
+        cursor.getColumnSelectorFactory().makeObjectSelector(column),
+        ColumnType.STRING,
+        castTo
+    );
+    while (!cursor.isDone()) {
+      boolean[] nulls;
+      boolean[] castNulls;
+      switch (castTo.getType()) {
+        case LONG:
+          nulls = selector.getNullVector();
+          castNulls = castSelector.getNullVector();
+          long[] longs = selector.getLongVector();
+          long[] castLongs = castSelector.getLongVector();
+          for (int i = 0; i < selector.getCurrentVectorSize(); i++) {
+            if (nulls != null) {
+              Assert.assertEquals(nulls[i], castNulls[i]);
+            }
+            Assert.assertEquals(longs[i], castLongs[i]);
+          }
+          break;
+        case DOUBLE:
+          nulls = selector.getNullVector();
+          castNulls = castSelector.getNullVector();
+          double[] doubles = selector.getDoubleVector();
+          double[] castDoubles = castSelector.getDoubleVector();
+          for (int i = 0; i < selector.getCurrentVectorSize(); i++) {
+            if (nulls != null) {
+              Assert.assertEquals(nulls[i], castNulls[i]);
+            }
+            Assert.assertEquals(doubles[i], castDoubles[i], 0.0);
+          }
+          break;
+      }
+
+      cursor.advance();
+    }
+  }
+
+  private static void verifyNumericToString(String column, ColumnType castTo, VectorCursor cursor)
+  {
+    VectorObjectSelector objectSelector = cursor.getColumnSelectorFactory().makeObjectSelector("v");
+    VectorObjectSelector castSelector = ExpressionVectorSelectors.castValueSelectorToObject(
+        cursor.getColumnSelectorFactory().getReadableVectorInspector(),
+        column,
+        cursor.getColumnSelectorFactory().makeValueSelector(column),
+        cursor.getColumnSelectorFactory().getColumnCapabilities(column).toColumnType(),
+        castTo
+    );
+    while (!cursor.isDone()) {
+      switch (castTo.getType()) {

Review Comment:
   ## Missing enum case in switch
   
   Switch statement does not have a case for [ARRAY](1).
   Switch statement does not have a case for [COMPLEX](2).
   Switch statement does not have a case for [LONG](3).
   Switch statement does not have a case for [FLOAT](4).
   Switch statement does not have a case for [DOUBLE](5).
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4220)



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +244,40 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
       return DimensionSelector.constant(null);
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
+    }
+
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null);
+    }
+
+    // the path was the 'root', we're in luck, spit out a selector that behaves the same way as a nested column
+    if (theColumn instanceof DictionaryEncodedColumn) {
+      final DictionaryEncodedColumn<?> column = (DictionaryEncodedColumn<?>) theColumn;
+      return new AutoCastingValueSelector(column.makeDimensionSelector(offset, dimensionSpec.getExtractionFn()));
     }
-    return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+    return ValueTypes.makeNumericWrappingDimensionSelector(
+        holder.getCapabilities().getType(),
+        theColumn.makeColumnValueSelector(offset),
+        dimensionSpec.getExtractionFn()

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [DimensionSpec.getExtractionFn](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4214)



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +244,40 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
       return DimensionSelector.constant(null);
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
+    }
+
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [DimensionSpec.getExtractionFn](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4212)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
                     .rows(ROWS)
                     .buildMMappedIndex();
 
-    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+    final QueryableIndex indexMix11 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)
+                            .withRollup(false)
+                            .build()
+                    )
+                    .rows(ROWS)
+                    .buildMMappedIndex();
+
+    final QueryableIndex indexMix12 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER_MIX)
+                            .withRollup(false)
+                            .build()
+                    )
+                    .rows(ROWS)
+                    .buildMMappedIndex();
+
+    final QueryableIndex indexMix21 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER_MIX)

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Builder.withDimensionsSpec](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4217)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
                     .rows(ROWS)
                     .buildMMappedIndex();
 
-    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+    final QueryableIndex indexMix11 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)
+                            .withRollup(false)
+                            .build()
+                    )
+                    .rows(ROWS)
+                    .buildMMappedIndex();
+
+    final QueryableIndex indexMix12 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER_MIX)
+                            .withRollup(false)
+                            .build()
+                    )
+                    .rows(ROWS)
+                    .buildMMappedIndex();
+
+    final QueryableIndex indexMix21 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER_MIX)
+                            .withRollup(false)
+                            .build()
+                    )
+                    .rows(ROWS)
+                    .buildMMappedIndex();
+
+    final QueryableIndex indexMix22 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Builder.withDimensionsSpec](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4218)



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +244,40 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
       return DimensionSelector.constant(null);
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
+    }
+
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null);
+    }
+
+    // the path was the 'root', we're in luck, spit out a selector that behaves the same way as a nested column
+    if (theColumn instanceof DictionaryEncodedColumn) {
+      final DictionaryEncodedColumn<?> column = (DictionaryEncodedColumn<?>) theColumn;
+      return new AutoCastingValueSelector(column.makeDimensionSelector(offset, dimensionSpec.getExtractionFn()));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [DimensionSpec.getExtractionFn](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4213)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
                     .rows(ROWS)
                     .buildMMappedIndex();
 
-    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+    final QueryableIndex indexMix11 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Builder.withDimensionsSpec](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4215)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
                     .rows(ROWS)
                     .buildMMappedIndex();
 
-    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+    final QueryableIndex indexMix11 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)
+                            .withRollup(false)
+                            .build()
+                    )
+                    .rows(ROWS)
+                    .buildMMappedIndex();
+
+    final QueryableIndex indexMix12 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER_MIX)

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Builder.withDimensionsSpec](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4216)





[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1096339284


##########
extensions-core/druid-bloom-filter/src/test/java/org/apache/druid/query/aggregation/bloom/sql/BloomFilterSqlAggregatorTest.java:
##########
@@ -89,18 +87,18 @@
       final Injector injector
   ) throws IOException
   {
-    InputRowParser parser = new MapInputRowParser(
-        new TimeAndDimsParseSpec(
-            new TimestampSpec("t", "iso", null),
-            new DimensionsSpec(
-                ImmutableList.<DimensionSchema>builder()
-                             .addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3")))
-                             .add(new DoubleDimensionSchema("d1"))
-                             .add(new FloatDimensionSchema("f1"))
-                             .add(new LongDimensionSchema("l1"))
-                             .build()
-            )
-        ));
+    InputRowSchema schema = new InputRowSchema(
+        new TimestampSpec("t", "iso", null),
+        new DimensionsSpec(
+            ImmutableList.<DimensionSchema>builder()
+                         .addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3")))
+                         .add(new DoubleDimensionSchema("d1"))
+                         .add(new FloatDimensionSchema("f1"))
+                         .add(new LongDimensionSchema("l1"))
+                         .build()
+        ),
+        null
+    );

Review Comment:
   ## Unread local variable
   
   Variable 'InputRowSchema schema' is never read.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4227)





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095431329


##########
processing/src/main/java/org/apache/druid/segment/vector/NilVectorSelector.java:
##########
@@ -82,9 +81,11 @@ public static NilVectorSelector create(final VectorSizeInspector vectorSizeInspe
           DEFAULT_OBJECT_VECTOR
       );
     } else {
+      final boolean[] nulls = new boolean[vectorSizeInspector.getMaxVectorSize()];
+      Arrays.fill(nulls, NullHandling.sqlCompatible());

Review Comment:
   Reasonable, I can fix this, since it happens every time one of these is requested.





[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1093990565


##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
+    }
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [DimensionSpec.getExtractionFn](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4223)





[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1094037566


##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [DimensionSpec.getExtractionFn](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4224)





[GitHub] [druid] imply-cheddar commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095351855


##########
processing/src/main/java/org/apache/druid/segment/vector/NilVectorSelector.java:
##########
@@ -82,9 +81,11 @@ public static NilVectorSelector create(final VectorSizeInspector vectorSizeInspe
           DEFAULT_OBJECT_VECTOR
       );
     } else {
+      final boolean[] nulls = new boolean[vectorSizeInspector.getMaxVectorSize()];
+      Arrays.fill(nulls, NullHandling.sqlCompatible());

Review Comment:
   The default initialization for a native `boolean` is false, so we should really only do this if `NullHandling.sqlCompatible()` is true.
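   For illustration, a minimal sketch of that suggestion against the snippet above (same fields as the diff; not the final code):
   
   ```java
   final boolean[] nulls = new boolean[vectorSizeInspector.getMaxVectorSize()];
   if (NullHandling.sqlCompatible()) {
     // a freshly allocated boolean[] is already all-false, so only SQL-compatible
     // null handling needs the explicit fill to true
     Arrays.fill(nulls, true);
   }
   ```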



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
     }
-    return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));
+  }
+
+  private DimensionSelector makeDimensionSelectorUndecorated(
+      ColumnHolder holder,
+      ReadableOffset offset,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, extractionFn);
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null, extractionFn);
+    }

Review Comment:
   Is this because `parts.isEmpty()` means that we are looking for the root, and if we are not looking for the root it's impossible for anything to exist there? If so, can we move the code that follows this if inside of it (to eliminate the negation) and adjust the comments to explain the relationship between `parts.isEmpty()` and the "real world" query?



##########
processing/src/test/resources/simple-nested-test-data.json:
##########
@@ -1,7 +1,7 @@
 {"timestamp": "2021-01-01", "dim": "hello", "nest": {"x": 100, "y": 200, "z": 300}, "nester":{ "x": ["a", "b", "c"], "y": {"a": "a", "b": "b", "c": [1, 2, 3]}}, "variant": {"a": ["hello", "world"], "b": {"x": "hello", "y": "world"}}, "list":[{"x": 5, "y": 10}, {"x": 15, "y": 22}]}
 {"timestamp": "2021-01-01", "dim": "hello", "nester":{ "x": ["x", "y", "z"]}, "list":[{"x": 35, "y": 310}, {"x": 315, "y": 322}]}
 {"timestamp": "2021-01-01", "dim": "hello", "nest":{ "x": 300, "y":  800}, "nester": "hello"}
-{"timestamp": "2021-01-01", "dim": "hello", "nest":{ "y": 500}, "list":[{"x": 115, "y": 410}, {"x": 415, "y": 422}]}
+{"timestamp": "2021-01-01", "dim": "100", "nest":{ "y": 500}, "list":[{"x": 115, "y": 410}, {"x": 415, "y": 422}]}

Review Comment:
   I noticed that it's still a string. Was this change also intended to test adding a new type and having it be a number? Or is this being used to validate that number parsing works?



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
+    }
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));

Review Comment:
   True statement; it should be ignored in this case.



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
                     .rows(ROWS)
                     .buildMMappedIndex();
 
-    return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+    final QueryableIndex indexMix11 =
+        IndexBuilder.create()
+                    .tmpDir(temporaryFolder.newFolder())
+                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+                    .schema(
+                        new IncrementalIndexSchema.Builder()
+                            .withMetrics(
+                                new CountAggregatorFactory("cnt")
+                            )
+                            .withDimensionsSpec(PARSER)

Review Comment:
   Maybe make this call with `PARSER.getParseSpec().getDimensionSpec()` instead, to avoid using the deprecated method?



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
     }
-    return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));
+  }
+
+  private DimensionSelector makeDimensionSelectorUndecorated(
+      ColumnHolder holder,
+      ReadableOffset offset,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, extractionFn);
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null, extractionFn);
+    }
+
+    // the path was the 'root', we're in luck, spit out a selector that behaves the same way as a nested column

Review Comment:
   Why does the `instanceof DictionaryEncoded` check tell us that the path was the root? Or is that comment intended to explain the if just before it?



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -258,18 +299,37 @@ public ColumnValueSelector<?> makeColumnValueSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, this.columnName);
-    if (column == null) {
+    ColumnHolder holder = columnSelector.getColumnHolder(this.columnName);
+    if (holder == null) {
       return NilColumnValueSelector.instance();
     }
+    BaseColumn theColumn = holder.getColumn();
 
-    // processFromRaw is true, that means JSON_QUERY, which can return partial results, otherwise this virtual column
-    // is JSON_VALUE which only returns literals, so we can use the nested columns value selector
-    return processFromRaw
-           ? new RawFieldColumnSelector(column.makeColumnValueSelector(offset), parts)
-           : hasNegativeArrayIndex
-             ? new RawFieldLiteralColumnValueSelector(column.makeColumnValueSelector(offset), parts)
-             : column.makeColumnValueSelector(parts, offset);
+    if (processFromRaw || hasNegativeArrayIndex) {
+      // if the path has negative array elements, or has set the flag to process 'raw' values explicitly (JSON_QUERY),
+      // then we use the 'raw' processing of the RawFieldColumnSelector/RawFieldLiteralColumnValueSelector created
+      // with the column selector factory instead of using the optimized nested field column
+      return null;
+    }

Review Comment:
   I understand why `processFromRaw` takes us into this if, but I don't understand why `hasNegativeArrayIndex` does.



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
         );
         if (metadata.hasNulls()) {
           columnBuilder.setHasNulls(true);
-          final ByteBuffer nullIndexBuffer = loadInternalFile(mapper, NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+          final ByteBuffer nullIndexBuffer = loadInternalFile(
+              mapper,
+              metadata,
+              NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+          );
           nullValues = metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
         } else {
           nullValues = metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
         }
+
+        return new NestedDataColumnSupplier(
+            version,
+            metadata,
+            fields,
+            fieldInfo,
+            compressedRawColumnSupplier,
+            nullValues,
+            stringDictionary,
+            frontCodedStringDictionarySupplier,
+            longDictionarySupplier,
+            doubleDictionarySupplier,
+            columnConfig,
+            mapper,
+            simpleType
+        );

Review Comment:
   When building this, it looks like we are loading various files that are then held for use by the suppliers. Given the recent sensitivity around the amount of data held on the JVM heap per segment, I'm curious whether we are being as judicious as possible with heap memory. We have to walk a fine line between performance (not redoing hard work) and heap usage, because we can't fit everything in it.
   
   With the changes here, I would have to dig in further to really get a sense of how JVM heap usage changes, so I'm just wondering what your thought process was around these changes and their impact on heap memory?



##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/SegmentMetadataCache.java:
##########
@@ -798,7 +798,20 @@ DatasourceTable.PhysicalDatasourceMetadata buildDruidTable(final String dataSour
                 rowSignature.getColumnType(column)
                             .orElseThrow(() -> new ISE("Encountered null type for column [%s]", column));
 
-            columnTypes.putIfAbsent(column, columnType);
+            columnTypes.compute(column, (c, existingType) -> {
+              if (existingType == null) {
+                return columnType;
+              }
+              if (columnType == null) {
+                return existingType;
+              }
+              // if any are json, are all json
+              if (NestedDataComplexTypeSerde.TYPE.equals(columnType) || NestedDataComplexTypeSerde.TYPE.equals(existingType)) {
+                return NestedDataComplexTypeSerde.TYPE;
+              }
+              // "existing type" is the 'newest' type, since we iterate the segments list by newest start time
+              return existingType;
+            });

Review Comment:
   This is a question not actually about your code, but about the code that existed before it... Given that this is using a compute to put values into the map, how do these entries change?



##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));

Review Comment:
   True statement; it should be ignored in this case.





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095437897


##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -258,18 +299,37 @@ public ColumnValueSelector<?> makeColumnValueSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, this.columnName);
-    if (column == null) {
+    ColumnHolder holder = columnSelector.getColumnHolder(this.columnName);
+    if (holder == null) {
       return NilColumnValueSelector.instance();
     }
+    BaseColumn theColumn = holder.getColumn();
 
-    // processFromRaw is true, that means JSON_QUERY, which can return partial results, otherwise this virtual column
-    // is JSON_VALUE which only returns literals, so we can use the nested columns value selector
-    return processFromRaw
-           ? new RawFieldColumnSelector(column.makeColumnValueSelector(offset), parts)
-           : hasNegativeArrayIndex
-             ? new RawFieldLiteralColumnValueSelector(column.makeColumnValueSelector(offset), parts)
-             : column.makeColumnValueSelector(parts, offset);
+    if (processFromRaw || hasNegativeArrayIndex) {
+      // if the path has negative array elements, or has set the flag to process 'raw' values explicitly (JSON_QUERY),
+      // then we use the 'raw' processing of the RawFieldColumnSelector/RawFieldLiteralColumnValueSelector created
+      // with the column selector factory instead of using the optimized nested field column
+      return null;
+    }

Review Comment:
   Ah, maybe I could explain this a bit better here. JSONPath expressions with a negative array index mean "from the end of the array", so `$.x[-1]` returns the last element of the array `x`, and so on. We don't currently have an easy way to know how many elements are in the array for each row, so we can't use the nested literal columns for those elements; `hasNegativeArrayIndex` forces us to fall back to the 'raw' data processing, which can read the whole array and find the appropriate element that way instead.
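   
   A tiny standalone sketch of those semantics (plain Java, not Druid code), which also shows why the array length is needed to resolve a negative index:
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   
   public class NegativeIndexExample
   {
     public static void main(String[] args)
     {
       List<String> x = Arrays.asList("a", "b", "c");
       int idx = -1;
       // a negative index counts from the end, so $.x[-1] is the last element;
       // resolving it needs x.size(), which the nested literal columns don't know per row
       String value = idx < 0 ? x.get(x.size() + idx) : x.get(idx);
       System.out.println(value); // prints "c"
     }
   }
   ```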





[GitHub] [druid] imply-cheddar commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095447120


##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
       ReadableOffset offset
   )
   {
-    final NestedDataComplexColumn column = NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
-    if (column == null) {
-      // complex column itself didn't exist
-      return DimensionSelector.constant(null);
+    ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+    if (holder == null) {
+      // column doesn't exist
+      return dimensionSpec.decorate(DimensionSelector.constant(null, dimensionSpec.getExtractionFn()));
     }
     if (hasNegativeArrayIndex) {
-      return new FieldDimensionSelector(
-          new RawFieldLiteralColumnValueSelector(
-              column.makeColumnValueSelector(offset),
-              parts
-          )
-      );
+      // if the path has negative array elements, then we have to use the 'raw' processing of the FieldDimensionSelector
+      // created with the column selector factory instead of using the optimized nested field column, return null
+      // to fall through
+      return null;
     }
-    return column.makeDimensionSelector(parts, offset, dimensionSpec.getExtractionFn());
+
+    return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder, offset, dimensionSpec.getExtractionFn()));
+  }
+
+  private DimensionSelector makeDimensionSelectorUndecorated(
+      ColumnHolder holder,
+      ReadableOffset offset,
+      @Nullable ExtractionFn extractionFn
+  )
+  {
+    BaseColumn theColumn = holder.getColumn();
+    if (theColumn instanceof NestedDataComplexColumn) {
+      final NestedDataComplexColumn column = (NestedDataComplexColumn) theColumn;
+      return column.makeDimensionSelector(parts, offset, extractionFn);
+    }
+
+    // ok, so not a nested column, but we can still do stuff
+    if (!parts.isEmpty()) {
+      // we are being asked for a path that will never exist, so we are null selector
+      return DimensionSelector.constant(null, extractionFn);
+    }

Review Comment:
   I understand the desire for fewer lines inside a block, but my rule of thumb is that the form with the fewest negations is always easier to read.
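   
   For example, a rough sketch of how the block above could read with the positive condition first (illustrative only; `makeRootSelector` is a hypothetical stand-in for whatever the root-path handling ends up being):
   
   ```java
   if (parts.isEmpty()) {
     // an empty path means we want the 'root' of the column, so a regular (non-nested)
     // column can back the selector directly (e.g. the DictionaryEncoded handling)
     return makeRootSelector(theColumn, offset, extractionFn); // hypothetical helper
   }
   // a non-empty path can never exist inside a non-nested column, so it is always null
   return DimensionSelector.constant(null, extractionFn);
   ```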





[GitHub] [druid] imply-cheddar commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1098041858


##########
core/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -463,6 +479,13 @@ public static ExprEval bestEffortOf(@Nullable Object val)
       return ofArray(coerced.lhs, coerced.rhs);
     }
 
+    // in 'best effort' mode, we couldn't possibly use byte[] as a complex or anything else useful without type
+    // knowledge, so lets turn it into a base64 encoded string so at least something downstream can use it by decoding
+    // back into bytes
+    if (val instanceof byte[]) {
+      return new StringExprEval(StringUtils.encodeBase64String((byte[]) val));
+    }

Review Comment:
   Do we even need to base64 it? Why not just keep it as a `byte[]`?





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1098054421


##########
core/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -463,6 +479,13 @@ public static ExprEval bestEffortOf(@Nullable Object val)
       return ofArray(coerced.lhs, coerced.rhs);
     }
 
+    // in 'best effort' mode, we couldn't possibly use byte[] as a complex or anything else useful without type
+    // knowledge, so lets turn it into a base64 encoded string so at least something downstream can use it by decoding
+    // back into bytes
+    if (val instanceof byte[]) {
+      return new StringExprEval(StringUtils.encodeBase64String((byte[]) val));
+    }

Review Comment:
   This is going _into_ the expression system. If we don't handle it like this, it ends up as an "unknown complex", which is basically useless in that form since we don't have a native byte[]-typed expression, and complex types cannot be cast to anything else (not that a cast would do anything useful here).
   
   With nested columns, leaving it as a byte[] is a problem, since we use this method to process inputs and determine their type. We have a [default case](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/NestedDataColumnIndexer.java#L464) for anything that leaks through and isn't `LONG`, `DOUBLE`, or `STRING`, which effectively calls Java toString on the value, so these end up as a not-very-useful `byte[].toString`.
   
   I could change this to be a parse exception, but even then I think it would still be more useful to feed these into expressions as a base64-encoded string rather than throwing them away. Maybe in the future it would be nice to have a byte blob type so we don't have to encode it, but until then I think this is the most useful thing we can do, and it's consistent with how we handle byte[] in [`Rows.objectToStrings`](https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/data/input/Rows.java#L63), which normal string columns go through.
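   
   For a concrete picture of the behavior being described, a small hedged example (not test code from the PR; it assumes `ExprEval.bestEffortOf` and `asString` behave as the snippet above suggests):
   
   ```java
   byte[] blob = new byte[]{0x64, 0x72, 0x75, 0x69, 0x64}; // the bytes of "druid"
   ExprEval<?> eval = ExprEval.bestEffortOf(blob);
   // instead of an "unknown complex", the value comes out as a STRING holding the
   // base64 encoding of the bytes, mirroring what Rows.objectToStrings does for byte[]
   String base64 = eval.asString(); // "ZHJ1aWQ="
   ```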
   
   





[GitHub] [druid] imply-cheddar commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095445960


##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
         );
         if (metadata.hasNulls()) {
           columnBuilder.setHasNulls(true);
-          final ByteBuffer nullIndexBuffer = loadInternalFile(mapper, NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+          final ByteBuffer nullIndexBuffer = loadInternalFile(
+              mapper,
+              metadata,
+              NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+          );
           nullValues = metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
         } else {
           nullValues = metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
         }
+
+        return new NestedDataColumnSupplier(
+            version,
+            metadata,
+            fields,
+            fieldInfo,
+            compressedRawColumnSupplier,
+            nullValues,
+            stringDictionary,
+            frontCodedStringDictionarySupplier,
+            longDictionarySupplier,
+            doubleDictionarySupplier,
+            columnConfig,
+            mapper,
+            simpleType
+        );

Review Comment:
   Yeah, I remember asking about the move to static; I didn't realize this was all exactly the same logic as before. It looked like there was stuff added, but perhaps that was just me misreading the diff. If there is no change in terms of what gets pre-cached, then all is good.





[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095440950


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/SegmentMetadataCache.java:
##########
@@ -798,7 +798,20 @@ DatasourceTable.PhysicalDatasourceMetadata buildDruidTable(final String dataSour
                 rowSignature.getColumnType(column)
                             .orElseThrow(() -> new ISE("Encountered null type for column [%s]", column));
 
-            columnTypes.putIfAbsent(column, columnType);
+            columnTypes.compute(column, (c, existingType) -> {
+              if (existingType == null) {
+                return columnType;
+              }
+              if (columnType == null) {
+                return existingType;
+              }
+              // if any are json, are all json
+              if (NestedDataComplexTypeSerde.TYPE.equals(columnType) || NestedDataComplexTypeSerde.TYPE.equals(existingType)) {
+                return NestedDataComplexTypeSerde.TYPE;
+              }
+              // "existing type" is the 'newest' type, since we iterate the segments list by newest start time
+              return existingType;
+            });

Review Comment:
   This is an area that could probably use improvement, but currently a 'refresh' rebuilds the schema for a datasource from the available set of segments, which calls this method, so it effectively iterates over all of them to recompute the schema. See `segmentsNeedingRefresh`, `dataSourcesNeedingRebuild`, and related code.
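   
   To make the merge rule from the diff above concrete, a rough standalone sketch (illustration only, not code from the PR):
   
   ```java
   Map<String, ColumnType> columnTypes = new LinkedHashMap<>();
   // the newest segment is visited first and reports a plain string column
   columnTypes.put("attrs", ColumnType.STRING);
   // an older segment reports the same column as COMPLEX<json>
   ColumnType columnType = NestedDataComplexTypeSerde.TYPE;
   columnTypes.compute("attrs", (c, existingType) -> {
     if (existingType == null) {
       return columnType;
     }
     // if any segment reports COMPLEX<json>, the merged type is COMPLEX<json>
     if (NestedDataComplexTypeSerde.TYPE.equals(columnType) || NestedDataComplexTypeSerde.TYPE.equals(existingType)) {
       return NestedDataComplexTypeSerde.TYPE;
     }
     // otherwise the newest segment's type wins
     return existingType;
   });
   // columnTypes.get("attrs") is now COMPLEX<json>, so SQL sees the column as JSON
   ```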





[GitHub] [druid] clintropolis merged pull request #13732: various nested column (and other) fixes

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis merged PR #13732:
URL: https://github.com/apache/druid/pull/13732

