You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/09/18 16:27:45 UTC

[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6031: Handle the partitioning mismatch between table config and stream

mcvsubbu commented on a change in pull request #6031:
URL: https://github.com/apache/incubator-pinot/pull/6031#discussion_r491053143



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
##########
@@ -510,6 +511,11 @@ private LLCRealtimeSegmentZKMetadata updateCommittingSegmentZKMetadata(String re
     committingSegmentZKMetadata.setIndexVersion(segmentMetadata.getVersion());
     committingSegmentZKMetadata.setTotalDocs(segmentMetadata.getTotalDocs());
 
+    // Update the partition metadata based on the segment metadata
+    // NOTE: When the stream partition changes, or the records are not properly partitioned from the stream, the
+    //       partition of the segment can be different from the stream partition.

Review comment:
       ```suggestion
       //       partition of the segment is undefined
   ```

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java
##########
@@ -39,6 +39,7 @@
   REALTIME_CONSUMPTION_EXCEPTIONS("exceptions", true),
   REALTIME_OFFSET_COMMITS("commits", true),
   REALTIME_OFFSET_COMMIT_EXCEPTIONS("exceptions", false),
+  REALTIME_PARTITION_MISMATCH("mismatch", false),

Review comment:
       Can we add the metric on the controller instead? If it happens for one stream partition, it is highly likely that it will happen to all partitions, so might as well reduce noise

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
##########
@@ -560,7 +566,8 @@ private void createNewSegmentZKMetadata(TableConfig tableConfig, PartitionLevelS
   }
 
   @Nullable
-  private SegmentPartitionMetadata getPartitionMetadataFromTableConfig(TableConfig tableConfig, int partitionId) {
+  private SegmentPartitionMetadata getPartitionMetadataFromTableConfig(TableConfig tableConfig, int numPartitions,

Review comment:
       ```suggestion
     private SegmentPartitionMetadata getPartitionMetadataFromTableConfig(TableConfig tableConfig, int numStreamPartitions,
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java
##########
@@ -1204,13 +1207,20 @@ public LLRealtimeSegmentDataManager(RealtimeSegmentZKMetadata segmentZKMetadata,
         String partitionColumn = entry.getKey();
         ColumnPartitionConfig columnPartitionConfig = entry.getValue();
         String partitionFunctionName = columnPartitionConfig.getFunctionName();
+
+        // NOTE: Here we compare the number of partitions from the config and the stream, and log a warning and emit a
+        //       metric when they don't match, but use the one from the stream. The mismatch could happen when the
+        //       stream partitions are changed, but the table config has not been updated to reflect the change. In such
+        //       case, picking the number of partitions from the stream can keep the segment properly partitioned as

Review comment:
       We don't recognize new partitions instantly. Everytime the realtime validation manager runs, it checks if the number of aprtitions have changed, and if so, starts a new consuming partition.
   Let us say at time T1 we checked and the partition number did not change
   At time T1 + 10, the partition numbers changed, but we did not know. The stream divided the records into a different partitioning system, thus having mismatched rows in (most likely) all partitions . At time T1 + 50, we check again, and create the new consuming segment for the new partition we detected.
   In this case, all the segments that have the mismatched rows should be marked as not belonging to any partition.
   
   I am not sure this condition is being handled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org