Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/02/24 06:52:20 UTC

[GitHub] [druid] jihoonson opened a new pull request #12279: Store null columns in the segments

jihoonson opened a new pull request #12279:
URL: https://github.com/apache/druid/pull/12279


   ### Terminology
   
   - Column: a logical column that can be stored across multiple segments.
   - Null column: a column that contains only nulls. Druid is aware of this column.
   - Unknown column: a column that Druid is not aware of, i.e., a column that Druid does not track at all via segment metadata or any other method.
   
   ### Description
   
   Today, null columns are not stored at ingestion time, so they become unknown columns once the ingestion job is done. The Druid native query engine uses the segment-level schema and treats unknown columns as if they were null columns; reading an unknown column returns only nulls.
   
   Druid SQL is different. Druid uses the Calcite SQL planner, which requires valid column information at planning time. That column information is retrieved from the datasource-level schema, which is dynamically discovered by merging segment schemas. As a result, users cannot query unknown columns using SQL. This causes a couple of issues, one of the main ones being SQL queries that fail against stream ingestion from time to time. While it is still creating segments, the realtime task announces a realtime segment whose schema includes all columns in the ingestion spec. The realtime task thus reports even null columns for the realtime segment to the broker, and the Druid SQL planner can use them. Once the segment is handed off to a historical, the historical announces a segment that does not store any null columns. As a result, the same SQL query no longer works after the segment is handed off.
   
   ### Proposed solution
   
   To make the SQL planner aware of null columns, Druid needs to keep track of them. This PR proposes to store those null columns in the segment just like normal columns.
   
   #### Feature flag
   
   `druid.indexer.task.storeEmptyColumns` is added and is on by default. A new task context key, `storeEmptyColumns`, is added, which can override the system property.
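
   For reference, a minimal sketch of the per-task override, mirroring how the unit tests in this PR pass it as the task's context argument (the wrapper class is illustrative; `Tasks.STORE_EMPTY_COLUMNS_KEY` is the constant the tests use, and its package path is assumed):

   ```java
   // Sketch only: the context map handed to a task can carry the storeEmptyColumns
   // override, exactly as the unit tests below do with
   // ImmutableMap.of(Tasks.STORE_EMPTY_COLUMNS_KEY, false).
   import com.google.common.collect.ImmutableMap;
   import org.apache.druid.indexing.common.task.Tasks; // package path assumed

   import java.util.Map;

   public class StoreEmptyColumnsContextSketch
   {
     public static Map<String, Object> disableForSingleTask()
     {
       // Overrides the cluster-wide default for this task only.
       return ImmutableMap.of(Tasks.STORE_EMPTY_COLUMNS_KEY, false);
     }
   }
   ```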
   
   #### Ingestion tasks
   
   When `storeEmptyColumns` is set, the task stores every column specified in `DimensionsSpec` in the segments that it creates. This applies to all kinds of ingestion except for Hadoop ingestion and Tranquility.
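
   For example (a sketch distilled from the unit tests added in this PR), with the flag on, all three dimensions below end up in the segment even though `dim` and `valDim` contain only nulls in the ingested test data:

   ```java
   // Sketch distilled from IndexTaskTest in this PR: when storeEmptyColumns is enabled,
   // every dimension declared here is persisted, including "dim" and "valDim", which
   // contain only nulls in the ingested rows.
   import com.google.common.collect.ImmutableList;
   import org.apache.druid.data.input.impl.DimensionsSpec;
   import org.apache.druid.data.input.impl.LongDimensionSchema;
   import org.apache.druid.data.input.impl.StringDimensionSchema;

   public class StoreEmptyColumnsDimensionsSketch
   {
     public static DimensionsSpec dimensionsSpec()
     {
       return new DimensionsSpec(
           ImmutableList.of(
               new StringDimensionSchema("ts"),
               new StringDimensionSchema("dim"),    // null-only string dimension
               new LongDimensionSchema("valDim")    // null-only long dimension
           )
       );
     }
   }
   ```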
   
   #### Segment writes/reads
   
   For null columns, Druid stores the column name, column type, number of rows, and `bitmapSerdeFactory`. The first two are stored in `ColumnDescriptor` and the last two in `NullColumnPartSerde`. `NullColumnPartSerde` has a no-op serializer and a deserializer that dynamically creates a bitmap index and a dictionary. Finally, the null column names are stored at the end of `index.drd`, separately from normal columns. This is for compatibility with older historicals: when they read a segment that has null columns stored, they won't be aware of those columns but will simply ignore them instead of failing.
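
   Roughly, the persisted pieces can be pictured as follows. This is a sketch rather than the PR's actual writer code: the `NullColumnPartSerde` constructor comes from this PR's diff, while the `ColumnDescriptor` builder method names are assumed:

   ```java
   // Sketch (builder method names assumed): a null column's on-disk footprint is just its
   // ColumnDescriptor, which records the column type, plus a NullColumnPartSerde carrying
   // the row count and bitmapSerdeFactory. The part serde's serializer is a no-op, so no
   // column data is written.
   import org.apache.druid.segment.column.ColumnDescriptor;
   import org.apache.druid.segment.column.ValueType;
   import org.apache.druid.segment.data.BitmapSerdeFactory;
   import org.apache.druid.segment.serde.NullColumnPartSerde;

   public class NullColumnDescriptorSketch
   {
     public static ColumnDescriptor forNullStringColumn(int numRows, BitmapSerdeFactory bitmapSerdeFactory)
     {
       NullColumnPartSerde partSerde = new NullColumnPartSerde(numRows, bitmapSerdeFactory);
       return ColumnDescriptor.builder()
                              .setValueType(ValueType.STRING) // the column type lives in the descriptor
                              .addSerde(partSerde)            // numRows + bitmapSerdeFactory live in the part serde
                              .build();
     }
   }
   ```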
   
   #### Test plan
   
   - Unit tests are added in this PR to verify compatibility with older historicals.
   - Unit tests are added in this PR to verify that null columns are stored in segments.
   - Integration tests will be added in https://github.com/apache/druid/pull/12268.
   
   #### Future work
   
   - Currently, null-only numeric dimensions are always stored even without this change. I would call this a bug because 1) the behavior doesn't match that of string dimensions, and 2) it stores all nulls in the segment file along with the null bitmap index and reads them during query processing, which is unnecessary and inefficient.
   - Hadoop ingestion may be supported later.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `NullColumnPartSerde`
    * `IndexIO`
    * `IndexMergerV9`
   
   <hr>
   
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   




[GitHub] [druid] jihoonson commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r815254439



##########
File path: processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java
##########
@@ -809,6 +933,63 @@ private void mergeCapabilities(
     }
   }
 
+  /**
+   * Creates a merged columnCapabilities to merge two queryableIndexes.
+   * This method first snapshots a pair of capabilities and then merges them.
+   */
+  @Nullable
+  private static ColumnCapabilitiesImpl mergeCapabilities(
+      @Nullable final ColumnCapabilities capabilities,
+      @Nullable final ColumnCapabilities other,
+      CoercionLogic coercionLogic
+  )
+  {
+    ColumnCapabilitiesImpl merged = ColumnCapabilitiesImpl.snapshot(capabilities, coercionLogic);
+    ColumnCapabilitiesImpl otherSnapshot = ColumnCapabilitiesImpl.snapshot(other, coercionLogic);
+    if (merged == null) {
+      return otherSnapshot;
+    } else if (otherSnapshot == null) {
+      return merged;
+    }
+
+    if (!Objects.equals(merged.getType(), otherSnapshot.getType())
+        || !Objects.equals(merged.getComplexTypeName(), otherSnapshot.getComplexTypeName())
+        || !Objects.equals(merged.getElementType(), otherSnapshot.getElementType())) {
+      throw new ISE(
+          "Cannot merge columns of type[%s] and [%s]",
+          merged.getType(),
+          otherSnapshot.getType()
+      );
+    }
+
+    merged.setDictionaryEncoded(merged.isDictionaryEncoded().or(otherSnapshot.isDictionaryEncoded()).isTrue());
+    merged.setHasMultipleValues(merged.hasMultipleValues().or(otherSnapshot.hasMultipleValues()).isTrue());
+    merged.setDictionaryValuesSorted(
+        merged.areDictionaryValuesSorted().and(otherSnapshot.areDictionaryValuesSorted()).isTrue()
+    );
+    merged.setDictionaryValuesUnique(
+        merged.areDictionaryValuesUnique().and(otherSnapshot.areDictionaryValuesUnique()).isTrue()
+    );
+    merged.setHasNulls(merged.hasNulls().or(other.hasNulls()).isTrue());
+    // When merging persisted queryableIndexes in the same ingestion job,
+    // all queryableIndexes should have the exact same hasBitmapIndexes flag set which is set in the ingestionSpec.
+    // One exception is null-only columns as they always do NOT have bitmap indexes no matter whether the flag is set
+    // in the ingestionSpec. As a result, the mismatch checked in the if clause below can happen
+    // when one of the columnCapability is from a real column and another is from a null-only column.
+    // See NullColumnPartSerde for how columnCapability is created for null-only columns.
+    // When the mismatch is found, we prefer the flag set in the ingestionSpec over
+    // the columnCapability of null-only columns.
+    if (merged.hasBitmapIndexes() != otherSnapshot.hasBitmapIndexes()) {
+      merged.setHasBitmapIndexes(false);
+    }

Review comment:
       This is actually the method that I migrated from `ColumnCapabilities.merge()`, with this line modified. `ColumnCapabilities.merge()` is used only for merging segments in `IndexMergerV9` today, and I don't want other people to mistakenly use it after I modify this line.






[GitHub] [druid] jihoonson commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r833643573



##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/common/task/IndexTaskTest.java
##########
@@ -213,6 +213,117 @@ public void setup() throws IOException
     taskRunner = new TestTaskRunner();
   }
 
+  @Test
+  public void testIngestNullOnlyColumns() throws Exception
+  {
+    File tmpDir = temporaryFolder.newFolder();
+
+    File tmpFile = File.createTempFile("druid", "index", tmpDir);
+
+    try (BufferedWriter writer = Files.newWriter(tmpFile, StandardCharsets.UTF_8)) {
+      writer.write("2014-01-01T00:00:10Z,,\n");
+      writer.write("2014-01-01T01:00:20Z,,\n");
+      writer.write("2014-01-01T02:00:30Z,,\n");
+    }
+
+    IndexTask indexTask = new IndexTask(
+        null,
+        null,
+        new IndexIngestionSpec(
+            new DataSchema(
+                "test-json",
+                DEFAULT_TIMESTAMP_SPEC,
+                new DimensionsSpec(
+                    ImmutableList.of(
+                        new StringDimensionSchema("ts"),
+                        new StringDimensionSchema("dim"),
+                        new LongDimensionSchema("valDim")
+                    )
+                ),
+                new AggregatorFactory[]{new LongSumAggregatorFactory("valMet", "val")},
+                new UniformGranularitySpec(
+                    Granularities.DAY,
+                    Granularities.MINUTE,
+                    Collections.singletonList(Intervals.of("2014/P1D"))
+                ),
+                null
+            ),
+            new IndexIOConfig(
+                null,
+                new LocalInputSource(tmpDir, "druid*"),
+                DEFAULT_INPUT_FORMAT,
+                false,
+                false
+            ),
+            createTuningConfigWithMaxRowsPerSegment(10, true)
+        ),
+        null
+    );
+
+    Assert.assertFalse(indexTask.supportsQueries());
+
+    final List<DataSegment> segments = runTask(indexTask).rhs;
+    Assert.assertEquals(1, segments.size());
+    Assert.assertEquals(ImmutableList.of("ts", "dim", "valDim"), segments.get(0).getDimensions());
+    Assert.assertEquals(ImmutableList.of("valMet"), segments.get(0).getMetrics());
+  }
+
+  @Test
+  public void testIngestNullOnlyColumns_storeEmptyColumnsOff_shouldNotStoreEmptyColumns() throws Exception
+  {
+    File tmpDir = temporaryFolder.newFolder();
+
+    File tmpFile = File.createTempFile("druid", "index", tmpDir);
+
+    try (BufferedWriter writer = Files.newWriter(tmpFile, StandardCharsets.UTF_8)) {
+      writer.write("2014-01-01T00:00:10Z,,\n");
+      writer.write("2014-01-01T01:00:20Z,,\n");
+      writer.write("2014-01-01T02:00:30Z,,\n");
+    }
+
+    IndexTask indexTask = new IndexTask(
+        null,
+        null,
+        new IndexIngestionSpec(
+            new DataSchema(
+                "test-json",
+                DEFAULT_TIMESTAMP_SPEC,
+                new DimensionsSpec(
+                    ImmutableList.of(
+                        new StringDimensionSchema("ts"),
+                        new StringDimensionSchema("dim"),
+                        new LongDimensionSchema("valDim")
+                    )
+                ),
+                new AggregatorFactory[]{new LongSumAggregatorFactory("valMet", "val")},
+                new UniformGranularitySpec(
+                    Granularities.DAY,
+                    Granularities.MINUTE,
+                    Collections.singletonList(Intervals.of("2014/P1D"))
+                ),
+                null
+            ),
+            new IndexIOConfig(
+                null,
+                new LocalInputSource(tmpDir, "druid*"),
+                DEFAULT_INPUT_FORMAT,
+                false,
+                false
+            ),
+            createTuningConfigWithMaxRowsPerSegment(10, true)
+        ),
+        ImmutableMap.of(Tasks.STORE_EMPTY_COLUMNS_KEY, false)
+    );
+
+    Assert.assertFalse(indexTask.supportsQueries());
+
+    final List<DataSegment> segments = runTask(indexTask).rhs;
+    Assert.assertEquals(1, segments.size());
+    // only empty string dimensions are ignored currently
+    Assert.assertEquals(ImmutableList.of("ts", "valDim"), segments.get(0).getDimensions());
+    Assert.assertEquals(ImmutableList.of("valMet"), segments.get(0).getMetrics());

Review comment:
       Null metrics are always stored as either 0s (default mode) or nulls (SQL-compatible mode).
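
       For context, a minimal sketch of where that stored value comes from (assumes `NullHandling` has been initialized, as it is at runtime):

       ```java
       // Sketch only: NullHandling decides how a missing numeric metric value is materialized.
       import org.apache.druid.common.config.NullHandling;

       public class NullMetricDefaultSketch
       {
         public static Long missingLongMetricValue()
         {
           // Returns 0L in default mode and null in SQL-compatible mode.
           return NullHandling.defaultLongValue();
         }
       }
       ```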






[GitHub] [druid] jihoonson merged pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #12279:
URL: https://github.com/apache/druid/pull/12279


   




[GitHub] [druid] jihoonson commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r815254069



##########
File path: processing/src/main/java/org/apache/druid/segment/serde/NullColumnPartSerde.java
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.serde;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Suppliers;
+import org.apache.druid.java.util.common.io.smoosh.FileSmoosher;
+import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.NilColumnValueSelector;
+import org.apache.druid.segment.column.BitmapIndex;
+import org.apache.druid.segment.column.BitmapIndexes;
+import org.apache.druid.segment.column.DictionaryEncodedColumn;
+import org.apache.druid.segment.column.NumericColumn;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.IndexedInts;
+import org.apache.druid.segment.data.ReadableOffset;
+import org.apache.druid.segment.vector.MultiValueDimensionVectorSelector;
+import org.apache.druid.segment.vector.NilVectorSelector;
+import org.apache.druid.segment.vector.ReadableVectorOffset;
+import org.apache.druid.segment.vector.SingleValueDimensionVectorSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.channels.WritableByteChannel;
+import java.util.Objects;
+
+/**
+ * A ColumnPartSerde to read and write null-only columns.
+ * Its serializer is no-op as nothing is stored for null-only columns.
+ * Its deserializer creates necessary column metadata and indexes when the column is read.
+ */
+public class NullColumnPartSerde implements ColumnPartSerde
+{
+  private static final Serializer NOOP_SERIALIZER = new Serializer()
+  {
+    @Override
+    public long getSerializedSize()
+    {
+      return 0;
+    }
+
+    @Override
+    public void writeTo(WritableByteChannel channel, FileSmoosher smoosher)
+    {
+    }
+  };
+
+  private final int numRows;
+  private final BitmapSerdeFactory bitmapSerdeFactory;
+  private final NullNumericColumn nullNumericColumn;
+  private final NullDictionaryEncodedColumn nullDictionaryEncodedColumn;
+  private final BitmapIndex bitmapIndex;
+
+  @JsonCreator
+  public NullColumnPartSerde(
+      @JsonProperty("numRows") int numRows,
+      @JsonProperty("bitmapSerdeFactory") BitmapSerdeFactory bitmapSerdeFactory
+  )
+  {
+    this.numRows = numRows;
+    this.bitmapSerdeFactory = bitmapSerdeFactory;
+    this.nullNumericColumn = new NullNumericColumn();
+    this.nullDictionaryEncodedColumn = new NullDictionaryEncodedColumn();
+    this.bitmapIndex = BitmapIndexes.forNilColumn(() -> numRows, bitmapSerdeFactory.getBitmapFactory());
+  }
+
+  @JsonProperty
+  public int getNumRows()
+  {
+    return numRows;
+  }
+
+  @JsonProperty
+  public BitmapSerdeFactory getBitmapSerdeFactory()
+  {
+    return bitmapSerdeFactory;
+  }
+
+  @Nullable
+  @Override
+  public Serializer getSerializer()
+  {
+    return NOOP_SERIALIZER;
+  }
+
+  @Override
+  public Deserializer getDeserializer()
+  {
+    return (buffer, builder, columnConfig) -> builder
+        .setHasMultipleValues(false)
+        .setHasNulls(true)
+        .setFilterable(true)
+        .setBitmapIndex(Suppliers.ofInstance(bitmapIndex))
+        .setNumericColumnSupplier(Suppliers.ofInstance(nullNumericColumn))
+        .setDictionaryEncodedColumnSupplier(Suppliers.ofInstance(nullDictionaryEncodedColumn));

Review comment:
       Ah good catch. I planned to fix it but forgot until you pointed it out. I removed `NullNumericColumn`. The column type is set by `ColumnDescriptor`.






[GitHub] [druid] FrankChen021 commented on pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #12279:
URL: https://github.com/apache/druid/pull/12279#issuecomment-1067838815


   If I understand correctly, this PR fixes #11386




[GitHub] [druid] jihoonson commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r826540489



##########
File path: processing/src/main/java/org/apache/druid/segment/serde/NullColumnPartSerde.java
##########
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.serde;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Suppliers;
+import org.apache.druid.java.util.common.io.smoosh.FileSmoosher;
+import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.column.BitmapIndex;
+import org.apache.druid.segment.column.BitmapIndexes;
+import org.apache.druid.segment.column.DictionaryEncodedColumn;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.IndexedInts;
+import org.apache.druid.segment.data.ReadableOffset;
+import org.apache.druid.segment.vector.MultiValueDimensionVectorSelector;
+import org.apache.druid.segment.vector.NilVectorSelector;
+import org.apache.druid.segment.vector.ReadableVectorOffset;
+import org.apache.druid.segment.vector.SingleValueDimensionVectorSelector;
+
+import javax.annotation.Nullable;
+import java.nio.channels.WritableByteChannel;
+import java.util.Objects;
+
+/**
+ * A ColumnPartSerde to read and write null-only columns.
+ * Its serializer is no-op as nothing is stored for null-only columns.
+ * Its deserializer creates necessary column metadata and indexes when the column is read.
+ */
+public class NullColumnPartSerde implements ColumnPartSerde
+{
+  private static final Serializer NOOP_SERIALIZER = new Serializer()
+  {
+    @Override
+    public long getSerializedSize()
+    {
+      return 0;
+    }
+
+    @Override
+    public void writeTo(WritableByteChannel channel, FileSmoosher smoosher)
+    {
+    }
+  };
+
+  private final int numRows;
+  private final BitmapSerdeFactory bitmapSerdeFactory;
+  private final NullDictionaryEncodedColumn nullDictionaryEncodedColumn;
+  private final BitmapIndex bitmapIndex;
+
+  @JsonCreator
+  public NullColumnPartSerde(
+      @JsonProperty("numRows") int numRows,
+      @JsonProperty("bitmapSerdeFactory") BitmapSerdeFactory bitmapSerdeFactory
+  )
+  {
+    this.numRows = numRows;
+    this.bitmapSerdeFactory = bitmapSerdeFactory;
+    this.nullDictionaryEncodedColumn = new NullDictionaryEncodedColumn();
+    this.bitmapIndex = BitmapIndexes.forNilColumn(() -> numRows, bitmapSerdeFactory.getBitmapFactory());
+  }
+
+  @JsonProperty
+  public int getNumRows()
+  {
+    return numRows;
+  }
+
+  @JsonProperty
+  public BitmapSerdeFactory getBitmapSerdeFactory()
+  {
+    return bitmapSerdeFactory;
+  }
+
+  @Nullable
+  @Override
+  public Serializer getSerializer()
+  {
+    return NOOP_SERIALIZER;
+  }
+
+  @Override
+  public Deserializer getDeserializer()
+  {
+    return (buffer, builder, columnConfig) -> {
+      builder
+          .setHasMultipleValues(false)
+          .setHasNulls(true)
+          .setFilterable(true)
+          .setBitmapIndex(Suppliers.ofInstance(bitmapIndex));
+      builder.setDictionaryEncodedColumnSupplier(Suppliers.ofInstance(nullDictionaryEncodedColumn));
+    };
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    NullColumnPartSerde partSerde = (NullColumnPartSerde) o;
+    return numRows == partSerde.numRows
+           && bitmapSerdeFactory.equals(partSerde.bitmapSerdeFactory);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(numRows, bitmapSerdeFactory);
+  }
+
+  private final class NullDictionaryEncodedColumn implements DictionaryEncodedColumn<String>
+  {
+    // Get the singleton instance of DimensionSelector.NullDimensionSelectorHolder.NullDimensionSelector
+    // to reuse its dictionary lookup logic.
+    private final DimensionSelector nullDimensionSelector = DimensionSelector.constant(null);
+
+    @Override
+    public int length()
+    {
+      return numRows;
+    }
+
+    @Override
+    public boolean hasMultipleValues()
+    {
+      return false;
+    }
+
+    @Override
+    public int getSingleValueRow(int rowNum)
+    {
+      throw new RuntimeException("This method should not be called for null-only columns");
+    }
+
+    @Override
+    public IndexedInts getMultiValueRow(int rowNum)
+    {
+      throw new RuntimeException("This method should not be called for null-only columns");
+    }
+
+    @Nullable
+    @Override
+    public String lookupName(int id)
+    {
+      return nullDimensionSelector.lookupName(id);
+    }
+
+    @Override
+    public int lookupId(String name)
+    {
+      return nullDimensionSelector.idLookup().lookupId(name);
+    }
+
+    @Override
+    public int getCardinality()
+    {
+      return 1;
+    }
+
+    @Override
+    public DimensionSelector makeDimensionSelector(
+        ReadableOffset offset,
+        @Nullable ExtractionFn extractionFn
+    )
+    {
+      return DimensionSelector.constant(null, extractionFn);
+    }
+
+    @Override
+    public SingleValueDimensionVectorSelector makeSingleValueDimensionVectorSelector(
+        ReadableVectorOffset vectorOffset
+    )
+    {
+      return NilVectorSelector.create(vectorOffset);
+    }
+
+    @Override
+    public MultiValueDimensionVectorSelector makeMultiValueDimensionVectorSelector(
+        ReadableVectorOffset vectorOffset
+    )
+    {
+      throw new RuntimeException("This method should not be called for null-only columns");
+    }
+

Review comment:
       I don't think they are strictly necessary in this PR since numeric columns don't use `NullColumnPartSerde` yet even when a column is completely empty, as noted in the "Future work" section of the PR description. If you agree, I'd like to add them in a follow-up PR when I fix numeric columns.






[GitHub] [druid] clintropolis commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r823514601



##########
File path: processing/src/main/java/org/apache/druid/segment/serde/NullColumnPartSerde.java
##########
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.serde;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Suppliers;
+import org.apache.druid.java.util.common.io.smoosh.FileSmoosher;
+import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.column.BitmapIndex;
+import org.apache.druid.segment.column.BitmapIndexes;
+import org.apache.druid.segment.column.DictionaryEncodedColumn;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.IndexedInts;
+import org.apache.druid.segment.data.ReadableOffset;
+import org.apache.druid.segment.vector.MultiValueDimensionVectorSelector;
+import org.apache.druid.segment.vector.NilVectorSelector;
+import org.apache.druid.segment.vector.ReadableVectorOffset;
+import org.apache.druid.segment.vector.SingleValueDimensionVectorSelector;
+
+import javax.annotation.Nullable;
+import java.nio.channels.WritableByteChannel;
+import java.util.Objects;
+
+/**
+ * A ColumnPartSerde to read and write null-only columns.
+ * Its serializer is no-op as nothing is stored for null-only columns.
+ * Its deserializer creates necessary column metadata and indexes when the column is read.
+ */
+public class NullColumnPartSerde implements ColumnPartSerde
+{
+  private static final Serializer NOOP_SERIALIZER = new Serializer()
+  {
+    @Override
+    public long getSerializedSize()
+    {
+      return 0;
+    }
+
+    @Override
+    public void writeTo(WritableByteChannel channel, FileSmoosher smoosher)
+    {
+    }
+  };
+
+  private final int numRows;
+  private final BitmapSerdeFactory bitmapSerdeFactory;
+  private final NullDictionaryEncodedColumn nullDictionaryEncodedColumn;
+  private final BitmapIndex bitmapIndex;
+
+  @JsonCreator
+  public NullColumnPartSerde(
+      @JsonProperty("numRows") int numRows,
+      @JsonProperty("bitmapSerdeFactory") BitmapSerdeFactory bitmapSerdeFactory
+  )
+  {
+    this.numRows = numRows;
+    this.bitmapSerdeFactory = bitmapSerdeFactory;
+    this.nullDictionaryEncodedColumn = new NullDictionaryEncodedColumn();
+    this.bitmapIndex = BitmapIndexes.forNilColumn(() -> numRows, bitmapSerdeFactory.getBitmapFactory());
+  }
+
+  @JsonProperty
+  public int getNumRows()
+  {
+    return numRows;
+  }
+
+  @JsonProperty
+  public BitmapSerdeFactory getBitmapSerdeFactory()
+  {
+    return bitmapSerdeFactory;
+  }
+
+  @Nullable
+  @Override
+  public Serializer getSerializer()
+  {
+    return NOOP_SERIALIZER;
+  }
+
+  @Override
+  public Deserializer getDeserializer()
+  {
+    return (buffer, builder, columnConfig) -> {
+      builder
+          .setHasMultipleValues(false)
+          .setHasNulls(true)
+          .setFilterable(true)
+          .setBitmapIndex(Suppliers.ofInstance(bitmapIndex));
+      builder.setDictionaryEncodedColumnSupplier(Suppliers.ofInstance(nullDictionaryEncodedColumn));
+    };
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    NullColumnPartSerde partSerde = (NullColumnPartSerde) o;
+    return numRows == partSerde.numRows
+           && bitmapSerdeFactory.equals(partSerde.bitmapSerdeFactory);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(numRows, bitmapSerdeFactory);
+  }
+
+  private final class NullDictionaryEncodedColumn implements DictionaryEncodedColumn<String>
+  {
+    // Get the singleton instance of DimensionSelector.NullDimensionSelectorHolder.NullDimensionSelector
+    // to reuse its dictionary lookup logic.
+    private final DimensionSelector nullDimensionSelector = DimensionSelector.constant(null);
+
+    @Override
+    public int length()
+    {
+      return numRows;
+    }
+
+    @Override
+    public boolean hasMultipleValues()
+    {
+      return false;
+    }
+
+    @Override
+    public int getSingleValueRow(int rowNum)
+    {
+      throw new RuntimeException("This method should not be called for null-only columns");
+    }
+
+    @Override
+    public IndexedInts getMultiValueRow(int rowNum)
+    {
+      throw new RuntimeException("This method should not be called for null-only columns");
+    }
+
+    @Nullable
+    @Override
+    public String lookupName(int id)
+    {
+      return nullDimensionSelector.lookupName(id);
+    }
+
+    @Override
+    public int lookupId(String name)
+    {
+      return nullDimensionSelector.idLookup().lookupId(name);
+    }
+
+    @Override
+    public int getCardinality()
+    {
+      return 1;
+    }
+
+    @Override
+    public DimensionSelector makeDimensionSelector(
+        ReadableOffset offset,
+        @Nullable ExtractionFn extractionFn
+    )
+    {
+      return DimensionSelector.constant(null, extractionFn);
+    }
+
+    @Override
+    public SingleValueDimensionVectorSelector makeSingleValueDimensionVectorSelector(
+        ReadableVectorOffset vectorOffset
+    )
+    {
+      return NilVectorSelector.create(vectorOffset);
+    }
+
+    @Override
+    public MultiValueDimensionVectorSelector makeMultiValueDimensionVectorSelector(
+        ReadableVectorOffset vectorOffset
+    )
+    {
+      throw new RuntimeException("This method should not be called for null-only columns");
+    }
+

Review comment:
       Oops, I think you should implement `makeVectorValueSelector` and `makeVectorObjectSelector` here as well to return `NilVectorSelector` for missing numeric and complex columns.
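
       A rough sketch of what those overrides could look like, assuming `NilVectorSelector` can stand in for both selector interfaces the way it does for the single-value dimension selector above:

       ```java
       // Sketch of the suggested overrides for NullDictionaryEncodedColumn (assumes
       // NilVectorSelector also implements VectorValueSelector and VectorObjectSelector):
       // both selectors simply yield nulls for every row of the null-only column.
       @Override
       public VectorValueSelector makeVectorValueSelector(ReadableVectorOffset vectorOffset)
       {
         return NilVectorSelector.create(vectorOffset);
       }

       @Override
       public VectorObjectSelector makeVectorObjectSelector(ReadableVectorOffset vectorOffset)
       {
         return NilVectorSelector.create(vectorOffset);
       }
       ```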






[GitHub] [druid] jihoonson commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r815254154



##########
File path: processing/pom.xml
##########
@@ -99,6 +99,10 @@
             <groupId>commons-net</groupId>
             <artifactId>commons-net</artifactId>
         </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-compress</artifactId>

Review comment:
       I remember that I added it because Travis complained about it, but it seems it's no longer needed. I removed it now.






[GitHub] [druid] jihoonson commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r816139077



##########
File path: processing/pom.xml
##########
@@ -99,6 +99,10 @@
             <groupId>commons-net</groupId>
             <artifactId>commons-net</artifactId>
         </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-compress</artifactId>

Review comment:
       Turns out I was using a util class from this library to create an ArrayList from an iterator. Fixed it to use Guava instead.
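
       e.g., a minimal sketch of the Guava replacement (assuming it is the `Lists` helper):

       ```java
       // Sketch only: building an ArrayList from an Iterator with Guava rather than the
       // commons-compress utility, so processing no longer needs that dependency.
       import com.google.common.collect.Lists;

       import java.util.Iterator;
       import java.util.List;

       public class IteratorToListSketch
       {
         public static <T> List<T> toList(Iterator<T> iterator)
         {
           return Lists.newArrayList(iterator); // copies the remaining elements into a new ArrayList
         }
       }
       ```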






[GitHub] [druid] jihoonson commented on pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #12279:
URL: https://github.com/apache/druid/pull/12279#issuecomment-1049548871


   To reviewers, my apologies. I created a new PR because of some issue.




[GitHub] [druid] jihoonson commented on pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #12279:
URL: https://github.com/apache/druid/pull/12279#issuecomment-1076928075


   I'm merging this PR. Thanks @clintropolis @abhishekagarwal87 for the review!




[GitHub] [druid] clintropolis commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r814486368



##########
File path: processing/pom.xml
##########
@@ -99,6 +99,10 @@
             <groupId>commons-net</groupId>
             <artifactId>commons-net</artifactId>
         </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-compress</artifactId>

Review comment:
       what uses this?

##########
File path: processing/src/main/java/org/apache/druid/segment/serde/NullColumnPartSerde.java
##########
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.serde;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Suppliers;
+import org.apache.druid.java.util.common.io.smoosh.FileSmoosher;
+import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.NilColumnValueSelector;
+import org.apache.druid.segment.column.BitmapIndex;
+import org.apache.druid.segment.column.BitmapIndexes;
+import org.apache.druid.segment.column.DictionaryEncodedColumn;
+import org.apache.druid.segment.column.NumericColumn;
+import org.apache.druid.segment.data.BitmapSerdeFactory;
+import org.apache.druid.segment.data.IndexedInts;
+import org.apache.druid.segment.data.ReadableOffset;
+import org.apache.druid.segment.vector.MultiValueDimensionVectorSelector;
+import org.apache.druid.segment.vector.NilVectorSelector;
+import org.apache.druid.segment.vector.ReadableVectorOffset;
+import org.apache.druid.segment.vector.SingleValueDimensionVectorSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.channels.WritableByteChannel;
+import java.util.Objects;
+
+/**
+ * A ColumnPartSerde to read and write null-only columns.
+ * Its serializer is no-op as nothing is stored for null-only columns.
+ * Its deserializer creates necessary column metadata and indexes when the column is read.
+ */
+public class NullColumnPartSerde implements ColumnPartSerde
+{
+  private static final Serializer NOOP_SERIALIZER = new Serializer()
+  {
+    @Override
+    public long getSerializedSize()
+    {
+      return 0;
+    }
+
+    @Override
+    public void writeTo(WritableByteChannel channel, FileSmoosher smoosher)
+    {
+    }
+  };
+
+  private final int numRows;
+  private final BitmapSerdeFactory bitmapSerdeFactory;
+  private final NullNumericColumn nullNumericColumn;
+  private final NullDictionaryEncodedColumn nullDictionaryEncodedColumn;
+  private final BitmapIndex bitmapIndex;
+
+  @JsonCreator
+  public NullColumnPartSerde(
+      @JsonProperty("numRows") int numRows,
+      @JsonProperty("bitmapSerdeFactory") BitmapSerdeFactory bitmapSerdeFactory
+  )
+  {
+    this.numRows = numRows;
+    this.bitmapSerdeFactory = bitmapSerdeFactory;
+    this.nullNumericColumn = new NullNumericColumn();
+    this.nullDictionaryEncodedColumn = new NullDictionaryEncodedColumn();
+    this.bitmapIndex = BitmapIndexes.forNilColumn(() -> numRows, bitmapSerdeFactory.getBitmapFactory());
+  }
+
+  @JsonProperty
+  public int getNumRows()
+  {
+    return numRows;
+  }
+
+  @JsonProperty
+  public BitmapSerdeFactory getBitmapSerdeFactory()
+  {
+    return bitmapSerdeFactory;
+  }
+
+  @Nullable
+  @Override
+  public Serializer getSerializer()
+  {
+    return NOOP_SERIALIZER;
+  }
+
+  @Override
+  public Deserializer getDeserializer()
+  {
+    return (buffer, builder, columnConfig) -> builder
+        .setHasMultipleValues(false)
+        .setHasNulls(true)
+        .setFilterable(true)
+        .setBitmapIndex(Suppliers.ofInstance(bitmapIndex))
+        .setNumericColumnSupplier(Suppliers.ofInstance(nullNumericColumn))
+        .setDictionaryEncodedColumnSupplier(Suppliers.ofInstance(nullDictionaryEncodedColumn));

Review comment:
       This isn't correct; both of these statements set the column supplier in the builder. I think you probably only need the dictionary encoded column supplier, which I think should behave correctly, and then you can drop the `NullNumericColumn`?
   
   Also, does this column have no type?

##########
File path: processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java
##########
@@ -809,6 +933,63 @@ private void mergeCapabilities(
     }
   }
 
+  /**
+   * Creates a merged columnCapabilities to merge two queryableIndexes.
+   * This method first snapshots a pair of capabilities and then merges them.
+   */
+  @Nullable
+  private static ColumnCapabilitiesImpl mergeCapabilities(
+      @Nullable final ColumnCapabilities capabilities,
+      @Nullable final ColumnCapabilities other,
+      CoercionLogic coercionLogic
+  )
+  {
+    ColumnCapabilitiesImpl merged = ColumnCapabilitiesImpl.snapshot(capabilities, coercionLogic);
+    ColumnCapabilitiesImpl otherSnapshot = ColumnCapabilitiesImpl.snapshot(other, coercionLogic);
+    if (merged == null) {
+      return otherSnapshot;
+    } else if (otherSnapshot == null) {
+      return merged;
+    }
+
+    if (!Objects.equals(merged.getType(), otherSnapshot.getType())
+        || !Objects.equals(merged.getComplexTypeName(), otherSnapshot.getComplexTypeName())
+        || !Objects.equals(merged.getElementType(), otherSnapshot.getElementType())) {
+      throw new ISE(
+          "Cannot merge columns of type[%s] and [%s]",
+          merged.getType(),
+          otherSnapshot.getType()
+      );
+    }
+
+    merged.setDictionaryEncoded(merged.isDictionaryEncoded().or(otherSnapshot.isDictionaryEncoded()).isTrue());
+    merged.setHasMultipleValues(merged.hasMultipleValues().or(otherSnapshot.hasMultipleValues()).isTrue());
+    merged.setDictionaryValuesSorted(
+        merged.areDictionaryValuesSorted().and(otherSnapshot.areDictionaryValuesSorted()).isTrue()
+    );
+    merged.setDictionaryValuesUnique(
+        merged.areDictionaryValuesUnique().and(otherSnapshot.areDictionaryValuesUnique()).isTrue()
+    );
+    merged.setHasNulls(merged.hasNulls().or(other.hasNulls()).isTrue());
+    // When merging persisted queryableIndexes in the same ingestion job,
+    // all queryableIndexes should have the exact same hasBitmapIndexes flag set which is set in the ingestionSpec.
+    // One exception is null-only columns as they always do NOT have bitmap indexes no matter whether the flag is set
+    // in the ingestionSpec. As a result, the mismatch checked in the if clause below can happen
+    // when one of the columnCapability is from a real column and another is from a null-only column.
+    // See NullColumnPartSerde for how columnCapability is created for null-only columns.
+    // When the mismatch is found, we prefer the flag set in the ingestionSpec over
+    // the columnCapability of null-only columns.
+    if (merged.hasBitmapIndexes() != otherSnapshot.hasBitmapIndexes()) {
+      merged.setHasBitmapIndexes(false);
+    }

Review comment:
       Is this line the reason that you can't use the merge function of `ColumnCapabilities`? If so, that seems pretty sad; I wonder if there is a way to make it work for all uses...






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #12279: Store null columns in the segments

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #12279:
URL: https://github.com/apache/druid/pull/12279#discussion_r830785559



##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/common/task/IndexTaskTest.java
##########
@@ -213,6 +213,117 @@ public void setup() throws IOException
     taskRunner = new TestTaskRunner();
   }
 
+  @Test
+  public void testIngestNullOnlyColumns() throws Exception
+  {
+    File tmpDir = temporaryFolder.newFolder();
+
+    File tmpFile = File.createTempFile("druid", "index", tmpDir);
+
+    try (BufferedWriter writer = Files.newWriter(tmpFile, StandardCharsets.UTF_8)) {
+      writer.write("2014-01-01T00:00:10Z,,\n");
+      writer.write("2014-01-01T01:00:20Z,,\n");
+      writer.write("2014-01-01T02:00:30Z,,\n");
+    }
+
+    IndexTask indexTask = new IndexTask(
+        null,
+        null,
+        new IndexIngestionSpec(
+            new DataSchema(
+                "test-json",
+                DEFAULT_TIMESTAMP_SPEC,
+                new DimensionsSpec(
+                    ImmutableList.of(
+                        new StringDimensionSchema("ts"),
+                        new StringDimensionSchema("dim"),
+                        new LongDimensionSchema("valDim")
+                    )
+                ),
+                new AggregatorFactory[]{new LongSumAggregatorFactory("valMet", "val")},
+                new UniformGranularitySpec(
+                    Granularities.DAY,
+                    Granularities.MINUTE,
+                    Collections.singletonList(Intervals.of("2014/P1D"))
+                ),
+                null
+            ),
+            new IndexIOConfig(
+                null,
+                new LocalInputSource(tmpDir, "druid*"),
+                DEFAULT_INPUT_FORMAT,
+                false,
+                false
+            ),
+            createTuningConfigWithMaxRowsPerSegment(10, true)
+        ),
+        null
+    );
+
+    Assert.assertFalse(indexTask.supportsQueries());
+
+    final List<DataSegment> segments = runTask(indexTask).rhs;
+    Assert.assertEquals(1, segments.size());
+    Assert.assertEquals(ImmutableList.of("ts", "dim", "valDim"), segments.get(0).getDimensions());
+    Assert.assertEquals(ImmutableList.of("valMet"), segments.get(0).getMetrics());
+  }
+
+  @Test
+  public void testIngestNullOnlyColumns_storeEmptyColumnsOff_shouldNotStoreEmptyColumns() throws Exception
+  {
+    File tmpDir = temporaryFolder.newFolder();
+
+    File tmpFile = File.createTempFile("druid", "index", tmpDir);
+
+    try (BufferedWriter writer = Files.newWriter(tmpFile, StandardCharsets.UTF_8)) {
+      writer.write("2014-01-01T00:00:10Z,,\n");
+      writer.write("2014-01-01T01:00:20Z,,\n");
+      writer.write("2014-01-01T02:00:30Z,,\n");
+    }
+
+    IndexTask indexTask = new IndexTask(
+        null,
+        null,
+        new IndexIngestionSpec(
+            new DataSchema(
+                "test-json",
+                DEFAULT_TIMESTAMP_SPEC,
+                new DimensionsSpec(
+                    ImmutableList.of(
+                        new StringDimensionSchema("ts"),
+                        new StringDimensionSchema("dim"),
+                        new LongDimensionSchema("valDim")
+                    )
+                ),
+                new AggregatorFactory[]{new LongSumAggregatorFactory("valMet", "val")},
+                new UniformGranularitySpec(
+                    Granularities.DAY,
+                    Granularities.MINUTE,
+                    Collections.singletonList(Intervals.of("2014/P1D"))
+                ),
+                null
+            ),
+            new IndexIOConfig(
+                null,
+                new LocalInputSource(tmpDir, "druid*"),
+                DEFAULT_INPUT_FORMAT,
+                false,
+                false
+            ),
+            createTuningConfigWithMaxRowsPerSegment(10, true)
+        ),
+        ImmutableMap.of(Tasks.STORE_EMPTY_COLUMNS_KEY, false)
+    );
+
+    Assert.assertFalse(indexTask.supportsQueries());
+
+    final List<DataSegment> segments = runTask(indexTask).rhs;
+    Assert.assertEquals(1, segments.size());
+    // only empty string dimensions are ignored currently
+    Assert.assertEquals(ImmutableList.of("ts", "valDim"), segments.get(0).getDimensions());
+    Assert.assertEquals(ImmutableList.of("valMet"), segments.get(0).getMetrics());

Review comment:
       I was expecting `valMet` not to be stored. Did I miss something?

##########
File path: processing/src/main/java/org/apache/druid/segment/column/BitmapIndexes.java
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.column;
+
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.collections.bitmap.BitmapFactory;
+import org.apache.druid.collections.bitmap.ImmutableBitmap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.segment.serde.StringBitmapIndexColumnPartSupplier;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+import java.util.function.IntSupplier;
+import java.util.function.Predicate;
+
+public final class BitmapIndexes
+{
+  /**
+   * Returns a bitmapIndex for null-only columns.
+   *
+   * @param rowCountSupplier a supplier that returns the row count of the segment that the column belongs to.
+   *                         Getting from the supplier the row count can be expensive and thus should be
+   *                         evaluated lazily.
+   * @param bitmapFactory    a bitmapFactory to create a bitmapIndex.
+   */
+  public static BitmapIndex forNilColumn(IntSupplier rowCountSupplier, BitmapFactory bitmapFactory)
+  {
+    return new BitmapIndex()
+    {
+      private final Supplier<ImmutableBitmap> nullBitmapSupplier = Suppliers.memoize(
+          () -> getBitmapFactory().complement(
+              getBitmapFactory().makeEmptyImmutableBitmap(),
+              rowCountSupplier.getAsInt()
+          )
+      );
+
+      @Override
+      public int getCardinality()
+      {
+        return 1;
+      }
+
+      @Nullable
+      @Override
+      public String getValue(int index)
+      {
+        return null;
+      }
+
+      @Override
+      public boolean hasNulls()
+      {
+        return true;
+      }
+
+      @Override
+      public BitmapFactory getBitmapFactory()
+      {
+        return bitmapFactory;
+      }
+
+      /**
+       * Return -2 for non-null values to match what the {@link BitmapIndex} implementation in
+       * {@link StringBitmapIndexColumnPartSupplier}
+       * would return for {@link BitmapIndex#getIndex(String)} when there is only a single index, for the null value.
+       * i.e., return an 'insertion point' of 1 for non-null values (see {@link BitmapIndex} interface)
+       */
+      @Override
+      public int getIndex(@Nullable String value)
+      {
+        return NullHandling.isNullOrEquivalent(value) ? 0 : -2;
+      }
+
+      @Override
+      public ImmutableBitmap getBitmap(int idx)
+      {
+        if (idx == 0) {
+          return nullBitmapSupplier.get();
+        } else {
+          return bitmapFactory.makeEmptyImmutableBitmap();
+        }
+      }
+
+      @Override
+      public ImmutableBitmap getBitmapForValue(@Nullable String value)
+      {
+        if (NullHandling.isNullOrEquivalent(value)) {
+          return bitmapFactory.complement(bitmapFactory.makeEmptyImmutableBitmap(), rowCountSupplier.getAsInt());

Review comment:
       this could be `nullBitmapSupplier.get()`

##########
File path: processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java
##########
@@ -809,6 +933,63 @@ private void mergeCapabilities(
     }
   }
 
+  /**
+   * Creates a merged columnCapabilities to merge two queryableIndexes.
+   * This method first snapshots a pair of capabilities and then merges them.
+   */
+  @Nullable
+  private static ColumnCapabilitiesImpl mergeCapabilities(
+      @Nullable final ColumnCapabilities capabilities,
+      @Nullable final ColumnCapabilities other,
+      CoercionLogic coercionLogic
+  )
+  {
+    ColumnCapabilitiesImpl merged = ColumnCapabilitiesImpl.snapshot(capabilities, coercionLogic);
+    ColumnCapabilitiesImpl otherSnapshot = ColumnCapabilitiesImpl.snapshot(other, coercionLogic);
+    if (merged == null) {
+      return otherSnapshot;
+    } else if (otherSnapshot == null) {
+      return merged;
+    }
+
+    if (!Objects.equals(merged.getType(), otherSnapshot.getType())
+        || !Objects.equals(merged.getComplexTypeName(), otherSnapshot.getComplexTypeName())
+        || !Objects.equals(merged.getElementType(), otherSnapshot.getElementType())) {
+      throw new ISE(

Review comment:
       Can we add other info, such as the complex type name and element type name, to the exception?



