You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/03/25 02:44:07 UTC

[GitHub] [druid] clintropolis opened a new pull request #9559: error on value counter overflow instead of writing sad segments

clintropolis opened a new pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559
 
 
   
   ### Description
   
   This PR fixes an integer overflow issue if too many values are written to a column within a single segment, preferring to fail at segment creation time instead of write a sad segment with negative values. 
   
   This issue was first noticed with multi-value columns, exploding at query time with an error of the form:
   
   ```
   org.apache.druid.java.util.common.IAE: Index[-131072] < 0
   	at org.apache.druid.segment.data.GenericIndexed.checkIndex(GenericIndexed.java:269)
   	at org.apache.druid.segment.data.GenericIndexed.access$300(GenericIndexed.java:79)
   	at org.apache.druid.segment.data.GenericIndexed$3.get(GenericIndexed.java:696)
   	at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.loadBuffer(CompressedVSizeColumnarIntsSupplier.java:437)
   	at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.get(CompressedVSizeColumnarIntsSupplier.java:350)
   	at org.apache.druid.segment.data.SliceIndexedInts.get(SliceIndexedInts.java:60)
   ```
   
   but would also occur for any serializer given more than `Integer.MAX_VALUES` rows as input.
   
   To fix, primitive column serializers now check the number of values (row count in most cases, total number of values for the case of multi-value strings) to ensure that it does not extend beyond the values that will be expressed in the column header and won't cause any issues at query time. A new exception, `ColumnCapacityExceededException` has been added which will give an error message that suggests 
   
   ```
   Too many values to store for %s column, try reducing maxRowsPerSegment
   ```
   
   where `%s` is the column name (which all the serializers now know).
   
   I added a bunch of tests to confirm that this works, and also marked them all `@Ignore` because they take forever to run. The same `IAE` error can be replicated by running `V3CompressedVSizeColumnarMultiIntsSerializerTest.testTooManyValues` without the modifications to check that overflow has occurred.
   
   I also added a `CompressedDoublesSerdeTest` that copies `CompressedFloatsSerdeTest` since i noticed there wasn't a test for double columns.
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * Primitive column serializers and related tests
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#discussion_r398844534
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/data/ColumnarDoublesSerializer.java
 ##########
 @@ -29,5 +29,6 @@
 public interface ColumnarDoublesSerializer extends Serializer
 {
   void open() throws IOException;
+  int size();
 
 Review comment:
   aw, I was just making it symmetrical with [ColumnarFloatsSerializer](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/data/ColumnarFloatsSerializer.java) and [ColumnarLongsSerializer](https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/data/ColumnarLongsSerializer.java), neither of which have any docs either. 
   
   This area of the code in general is sort of barren of javadocs, so how about a bargain: unless there is something else to change on this PR, how about I just do a follow-up that adds javadocs to a bunch of this stuff?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on issue #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
clintropolis commented on issue #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#issuecomment-604643356
 
 
   >Can this happen? maxRowsPerSegment is an Integer in the partitionSpec
   
   It _shouldn't_ happen, but nothing in the serializer was ensuring this, so it is a defensive mechanism since this part of the code is rather far away from where `maxRowsPerSegment` is supposed to be enforced.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on issue #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#issuecomment-604604655
 
 
   > but would also occur for any serializer given more than `Integer.MAX_VALUES` rows as input.
   
   Can this happen? `maxRowsPerSegment` is an Integer in the partitionSpec

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on issue #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
clintropolis commented on issue #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#issuecomment-604722125
 
 
   >VSizeColumnarInts#get does this calculation
   
   Ah, i forgot to mention in the description that I didn't update `VSizeColumnarIntsSerializer` on purpose since it didn't follow the same pattern as the other implementations, and stores the size in bytes in the header in a checked cast from a long so would fail at indexing time already if it was too big. Also it shouldn't really be used in practice for real segments since it isn't compressed (except maybe for intermediary segments if `indexSpecForIntermediatePersists` is configured to be uncompressed, but these will be smallish segments). Since it reads directly from a `ByteBuffer` it is limited to the maximum size of a `ByteBuffer`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#discussion_r398846944
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/data/ColumnarDoublesSerializer.java
 ##########
 @@ -29,5 +29,6 @@
 public interface ColumnarDoublesSerializer extends Serializer
 {
   void open() throws IOException;
+  int size();
 
 Review comment:
   sounds like a good deal to me. I'm almost done reviewing, pressed the wrong button when I made that comment. I don't have anything else yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#discussion_r398815721
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/data/ColumnarDoublesSerializer.java
 ##########
 @@ -29,5 +29,6 @@
 public interface ColumnarDoublesSerializer extends Serializer
 {
   void open() throws IOException;
+  int size();
 
 Review comment:
   javadocs please. What does the size indicate? Number of rows in the column? Or space it's taking up. Reading the code it looks like it's the former

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#discussion_r397581834
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/data/ColumnCapacityExceededException.java
 ##########
 @@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.data;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.druid.java.util.common.StringUtils;
+
+public class ColumnCapacityExceededException extends RuntimeException
+{
+  @VisibleForTesting
+  public static String formatMessage(String columnName)
+  {
+    return StringUtils.format(
+        "Too many values to store for %s column, try reducing maxRowsPerSegment",
 
 Review comment:
   +1 for this operator friendly error message

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on issue #9559: error on value counter overflow instead of writing sad segments

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9559: error on value counter overflow instead of writing sad segments
URL: https://github.com/apache/druid/pull/9559#issuecomment-604725370
 
 
   @clintropolis Makes sense. I'm just going to read through all the implementations now...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org