You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/09/15 22:05:11 UTC

[GitHub] [incubator-pinot] npawar opened a new pull request #6021: List of partitioners in SegmentProcessorFramework

npawar opened a new pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021


   Changing SegmentPrpcessorFramework config to take List of Partitioners. This is to account for table config's partitioning, which might need to be applied regardless of other partitioning configured. For example, we want to partition by date so as to align segments by time, but we also want to apply Murmur on some id column each to further partition and record in segment metadata.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489182763



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       Use case: say data in input segments is spread across 3 days. In the resulting segments, we want to create a segment for each day. Additionally, we want partitioning on some id column for query purposes.
   
   Partitioning by time column is first step. This doesn't affect segment metadata or broker routing. This is simply used by the framework, and it's scope ends with the framework. It's merely helping create date aligned input files for Segment generation stage.
   Partitioning by id column is second step. This is for queries. This will be whatever is in the table config. Only this partition will get set in the segment metadata. And even that will happen during segment creation.
   See this comment and discussion:https://github.com/apache/incubator-pinot/pull/5934#discussion_r486006754




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489672177



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       Practically, for the use case I described, it will be 2. But it need not be (there could be more custom logic). Also the json config spec has List of partitions, so I just continued it as List.
   All these things are not set in stone as of now. We will be continuosly re-evaluating, optimizing and editing this framework, as we begin using it (for minion, and merge). It is difficult to predict right now and I prefer to not introduce restrictions on number of partitioners.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489551646



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       Maybe I am missing something, I will try to find it in the design document.
   1. It seems to me that there may be at most 2 partitioners, so is making that a Pair better? Or, there should be a comment and an assert some place that the size is two.
   2. The word partition confused me into thinking that we are somehow respecting partitioning of data using a partition key with underscores (with brokers constructing some partition keys). That is clearly not the case here. Maybe rename this as a splitKey instead of partition?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489090953



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       I don't think the framework relies on this to extract the partition info. But agree on introducing a constant for it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489102318



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       Actually, it is not significant at all. It can be changed, and is not used by any other components. It won't even matter beyond the scope of that joiner line. And hence I don't think it needs to be scoped out of this class, or even out of this method.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489089813



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       The `"_"` here is very significant, right? It cannot be changed, and has to be used the same way across multiple components. Could you please declare it as. a final string in some Constants class as a partition separator or something? 
   
   And then re-use in tests
   
   thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar merged pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
npawar merged pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489138013



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##########
@@ -100,8 +110,11 @@ public void map()
       }
 
       // Partitioning
-      // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-      String partition = _partitioner.getPartition(reusableRow);
+      int p = 0;
+      for (Partitioner partitioner : _partitioners) {
+        partitions[p++] = partitioner.getPartition(reusableRow);
+      }
+      String partition = StringUtil.join("_", partitions);

Review comment:
       We are partitioning the data, and the brokers have to construct the same partition id in the same order of columns and  with the same partition function right?
   
   Also, what is the use case for partitioning on more than one column?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org