Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/04/30 18:50:58 UTC

[GitHub] [flink] rkhachatryan commented on a change in pull request #11952: [FLINK-16638][runtime][checkpointing] Flink checkStateMappingCompleteness doesn't include UserDefinedOperatorIDs

rkhachatryan commented on a change in pull request #11952:
URL: https://github.com/apache/flink/pull/11952#discussion_r418193478



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PendingCheckpoint.java
##########
@@ -371,7 +371,7 @@ public TaskAcknowledgeResult acknowledgeTask(
 				acknowledgedTasks.add(executionAttemptId);
 			}
 
-			List<OperatorID> operatorIDs = vertex.getJobVertex().getOperatorIDs();
+			List<OperatorID> operatorIDs = vertex.getJobVertex().getOperatorIdPairList().getOperatorIds();

Review comment:
       `vertex.getOperatorIdPairList().getOperatorIds()` sounds a bit repetitive to me.
   
   How about `vertex.getOperatorIDs().getGeneratedIDs()`?
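A minimal sketch of the API shape suggested above, assuming a small read-only holder (`OperatorIDs` is a hypothetical name, not part of this PR) that exposes the generated and user-defined ID lists side by side. IDs are modeled as `String` instead of Flink's `OperatorID` so the example compiles on its own:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for a vertex's operator IDs; String stands in for OperatorID.
final class OperatorIDs {
    private final List<String> generatedIDs;
    private final List<String> userDefinedIDs; // an entry is null when no user-defined ID was set

    OperatorIDs(List<String> generatedIDs, List<String> userDefinedIDs) {
        if (generatedIDs.size() != userDefinedIDs.size()) {
            throw new IllegalArgumentException("Both ID lists must have the same size");
        }
        // Defensive, read-only copies so callers cannot mutate the vertex's state.
        this.generatedIDs = Collections.unmodifiableList(new ArrayList<>(generatedIDs));
        this.userDefinedIDs = Collections.unmodifiableList(new ArrayList<>(userDefinedIDs));
    }

    List<String> getGeneratedIDs() {
        return generatedIDs;
    }

    List<String> getUserDefinedIDs() {
        return userDefinedIDs;
    }
}
```

With an accessor returning such a holder, the call site would read `vertex.getOperatorIDs().getGeneratedIDs()`, as proposed.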

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionJobVertex.java
##########
@@ -289,21 +270,12 @@ public ExecutionJobVertex(
 	}
 
 	/**
-	 * Returns a list containing the IDs of all operators contained in this execution job vertex.
+	 * Returns a list containing the ID pairs of all operators contained in this execution job vertex.
 	 *
-	 * @return list containing the IDs of all contained operators
+	 * @return list containing the ID pairs of all contained operators
 	 */
-	public List<OperatorID> getOperatorIDs() {
-		return operatorIDs;
-	}
-
-	/**
-	 * Returns a list containing the alternative IDs of all operators contained in this execution job vertex.
-	 *
-	 * @return list containing alternative the IDs of all contained operators
-	 */
-	public List<OperatorID> getUserDefinedOperatorIDs() {
-		return userDefinedOperatorIds;
+	public OperatorIdPairList getOperatorIdPairList() {

Review comment:
       Isn't the name `getOperatorIdPairs` enough (instead of `getOperatorIdPairList`)?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/JobVertex.java
##########
@@ -143,27 +145,27 @@ public JobVertex(String name) {
 	public JobVertex(String name, JobVertexID id) {
 		this.name = name == null ? DEFAULT_NAME : name;
 		this.id = id == null ? new JobVertexID() : id;
-		// the id lists must have the same size
-		this.operatorIDs.add(OperatorID.fromJobVertexID(this.id));
-		this.operatorIdsAlternatives.add(null);
+		OperatorIdPair operatorIdPair = new OperatorIdPair(OperatorID.fromJobVertexID(this.id), null);
+		this.operatorIdPairList = new OperatorIdPairList(Collections.singletonList(operatorIdPair));
 	}
 
 	/**
 	 * Constructs a new job vertex and assigns it with the given name.
 	 *
 	 * @param name The name of the new job vertex.
 	 * @param primaryId The id of the job vertex.
-	 * @param alternativeIds The alternative ids of the job vertex.
 	 * @param operatorIds The ids of all operators contained in this job vertex.
 	 * @param alternativeOperatorIds The alternative ids of all operators contained in this job vertex-
 	 */
-	public JobVertex(String name, JobVertexID primaryId, List<JobVertexID> alternativeIds, List<OperatorID> operatorIds, List<OperatorID> alternativeOperatorIds) {
+	public JobVertex(String name, JobVertexID primaryId, List<OperatorID> operatorIds, List<OperatorID> alternativeOperatorIds) {
 		Preconditions.checkArgument(operatorIds.size() == alternativeOperatorIds.size());
 		this.name = name == null ? DEFAULT_NAME : name;
 		this.id = primaryId == null ? new JobVertexID() : primaryId;
-		this.idAlternatives.addAll(alternativeIds);
-		this.operatorIDs.addAll(operatorIds);
-		this.operatorIdsAlternatives.addAll(alternativeOperatorIds);
+		List<OperatorIdPair> operatorIdPairs = new ArrayList<>();
+		for (int i = 0; i < operatorIds.size(); i++) {
+			operatorIdPairs.add(new OperatorIdPair(operatorIds.get(i), alternativeOperatorIds.get(i)));
+		}
+		this.operatorIdPairList = new OperatorIdPairList(operatorIdPairs);

Review comment:
       Ideally, this should be done by the caller (i.e. `StreamingJobGraphGenerator.createJobVertex`), so that the constructor receives a list of pairs.
   Also, the `List` interface doesn't guarantee constant-time `get(int)` (on a `LinkedList`, for example, each call is linear), and it's good to check that both lists have the same size.
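A sketch of how the caller could build the pairs before invoking the constructor, assuming a hypothetical helper `zipToPairs` and a minimal `IdPair` stand-in for `OperatorIdPair` (with `String` in place of `OperatorID`). It checks the sizes up front and walks both lists with iterators, so it stays linear even for lists without constant-time `get(int)`:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for OperatorIdPair; String replaces OperatorID.
final class IdPair {
    final String generatedId;
    final String userDefinedId; // may be null

    IdPair(String generatedId, String userDefinedId) {
        this.generatedId = generatedId;
        this.userDefinedId = userDefinedId;
    }
}

final class IdPairs {
    private IdPairs() {}

    // Zips generated and user-defined IDs into pairs; fails fast on a size mismatch.
    static List<IdPair> zipToPairs(List<String> generatedIds, List<String> userDefinedIds) {
        if (generatedIds.size() != userDefinedIds.size()) {
            throw new IllegalArgumentException(
                "Expected equally sized ID lists, got "
                    + generatedIds.size() + " and " + userDefinedIds.size());
        }
        List<IdPair> pairs = new ArrayList<>(generatedIds.size());
        Iterator<String> generated = generatedIds.iterator();
        Iterator<String> userDefined = userDefinedIds.iterator();
        // Iterators avoid repeated get(int), which is O(n) per call on a LinkedList.
        while (generated.hasNext()) {
            pairs.add(new IdPair(generated.next(), userDefined.next()));
        }
        return pairs;
    }
}
```

The constructor would then only store the list it receives, and the size invariant lives in one place on the caller's side.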

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/OperatorIdPairList.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobgraph;
+
+import org.apache.flink.runtime.OperatorIdPair;
+
+import java.io.Serializable;
+import java.util.AbstractList;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Convenience class to encapsulate the operator ID pairs of a job vertex. It is convenient because it hides
+ * away the creation of a new list for only operator IDs or user defined operator IDs.
+ * It also hides the iteration over operator ID pairs.
+ */
+public class OperatorIdPairList extends AbstractList<OperatorIdPair> implements Serializable {

Review comment:
       I'm not sure if we need this class:
   - `getUserDefinedOperatorIds` is never used
   - constructor can be inlined
   - instead of `getOperatorIds()` clients could iterate over pairs with minimal changes, e.g. in `StateAssignmentOperation`:
   ```
   List<OperatorIdPair> pairs = executionJobVertex.getOperatorIdPairList();
   
   int expectedNumberOfSubTasks = newParallelism * pairs.size();
   
   for (OperatorIdPair pair : pairs) {
     OperatorInstanceID instanceID = OperatorInstanceID.of(subTaskIndex, pair.getOperatorId());
     // ...
   }
   ```
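To illustrate that clients can work with the pairs directly, here is a self-contained sketch of the loop above. `IdPair` and the `subtaskIndex:generatedId` string key are hypothetical stand-ins for `OperatorIdPair` and `OperatorInstanceID`; the point is that no separate list of generated IDs has to be materialized:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OperatorIdPair; String replaces OperatorID.
final class IdPair {
    final String generatedId;
    final String userDefinedId; // may be null

    IdPair(String generatedId, String userDefinedId) {
        this.generatedId = generatedId;
        this.userDefinedId = userDefinedId;
    }
}

final class InstanceKeys {
    private InstanceKeys() {}

    // Builds one "subTaskIndex:generatedId" key per (subtask, operator) combination,
    // reading the generated ID straight from each pair.
    static List<String> instanceKeys(List<IdPair> pairs, int newParallelism) {
        int expectedNumberOfSubTasks = newParallelism * pairs.size();
        List<String> keys = new ArrayList<>(expectedNumberOfSubTasks);
        for (int subTaskIndex = 0; subTaskIndex < newParallelism; subTaskIndex++) {
            for (IdPair pair : pairs) {
                keys.add(subTaskIndex + ":" + pair.generatedId);
            }
        }
        return keys;
    }
}
```

In `StateAssignmentOperation` the key would be an `OperatorInstanceID` rather than a string; the iteration pattern is the same.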

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTestingUtils.java
##########
@@ -357,8 +359,11 @@ static ExecutionJobVertex mockExecutionJobVertex(
 		when(executionJobVertex.getParallelism()).thenReturn(parallelism);
 		when(executionJobVertex.getMaxParallelism()).thenReturn(maxParallelism);
 		when(executionJobVertex.isMaxParallelismConfigured()).thenReturn(true);
-		when(executionJobVertex.getOperatorIDs()).thenReturn(jobVertexIDs);
-		when(executionJobVertex.getUserDefinedOperatorIDs()).thenReturn(Arrays.asList(new OperatorID[jobVertexIDs.size()]));
+		List<OperatorIdPair> operatorIdPairs = new ArrayList<>();
+		for (OperatorID operatorID : jobVertexIDs) {
+			operatorIdPairs.add(new OperatorIdPair(operatorID, null));

Review comment:
       What do you think about a factory method to create an `OperatorIdPair` from just the generated ID?
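A minimal sketch of such a factory, using a hypothetical `IdPair` stand-in for `OperatorIdPair` (with `String` in place of `OperatorID`); the names `generatedIdOnly` and `of` are illustrative assumptions, not existing API in this PR:

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for OperatorIdPair; String replaces OperatorID.
final class IdPair {
    private final String generatedId;
    private final String userDefinedId; // null when the user did not set an ID

    private IdPair(String generatedId, String userDefinedId) {
        this.generatedId = Objects.requireNonNull(generatedId, "generatedId");
        this.userDefinedId = userDefinedId;
    }

    // Factory for the common case (e.g. in tests) where only the generated ID exists.
    static IdPair generatedIdOnly(String generatedId) {
        return new IdPair(generatedId, null);
    }

    // Factory for operators that also carry a user-defined ID.
    static IdPair of(String generatedId, String userDefinedId) {
        return new IdPair(generatedId, Objects.requireNonNull(userDefinedId, "userDefinedId"));
    }

    String getGeneratedId() {
        return generatedId;
    }

    Optional<String> getUserDefinedId() {
        return Optional.ofNullable(userDefinedId);
    }
}
```

With a factory like this, the loop in `mockExecutionJobVertex` would add `generatedIdOnly(operatorID)` instead of passing a bare `null`. Returning `Optional` from the getter is one design choice; keeping a nullable getter would match the current diff more closely.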



