You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/18 14:38:48 UTC

[GitHub] [flink] becketqin commented on a change in pull request #13602: [FLINK-19346][coordination] Generate and put ClusterPartitionDescriptor of ClusterPartition in JobResult when job finishes

becketqin commented on a change in pull request #13602:
URL: https://github.com/apache/flink/pull/13602#discussion_r507160527



##########
File path: flink-core/src/main/java/org/apache/flink/api/common/ClusterPartitionDescriptor.java
##########
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.common;
+
+import java.io.Serializable;
+
+public interface ClusterPartitionDescriptor extends Serializable {

Review comment:
       Why do we put this class in the public API package? The users should not be aware of the cluster partitions, right?

##########
File path: flink-core/src/main/java/org/apache/flink/api/common/PersistedIntermediateResultDescriptor.java
##########
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.common;
+
+import org.apache.flink.util.AbstractID;
+
+public interface PersistedIntermediateResultDescriptor {

Review comment:
       The same question as `ClusterPartitionDescriptor`, is there any reason to put this interface in the public API package?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobResult.java
##########
@@ -67,12 +69,16 @@
 	@Nullable
 	private final SerializedThrowable serializedThrowable;
 
+	private Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor> persistedIntermediateResult;
+
 	private JobResult(
-			final JobID jobId,
-			final ApplicationStatus applicationStatus,
-			final Map<String, SerializedValue<OptionalFailure<Object>>> accumulatorResults,
-			final long netRuntime,
-			@Nullable final SerializedThrowable serializedThrowable) {
+		final JobID jobId,
+		final ApplicationStatus applicationStatus,
+		final Map<String, SerializedValue<OptionalFailure<Object>>> accumulatorResults,
+		final long netRuntime,
+		@Nullable final SerializedThrowable serializedThrowable,
+		Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor> persistedIntermediateResult) {

Review comment:
       `@Nullable`?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
##########
@@ -1746,4 +1748,63 @@ PartitionReleaseStrategy getPartitionReleaseStrategy() {
 	ExecutionDeploymentListener getExecutionDeploymentListener() {
 		return executionDeploymentListener;
 	}
+
+	/**
+	 * Returns the mapping from intermediate result id to the ClusterPartitionDescriptor of its partitions
+	 *
+	 * @return The mapping from intermediate result id to the ClusterPartitionDescriptor of its partitions
+	 * 		   null, if the job has not yet finished
+	 */
+	public Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor> getPersistedIntermediateResult() {
+		if (!JobStatus.FINISHED.equals(state)) {
+			return null;
+		}
+
+		final List<IntermediateDataSetID> persistedIntermediateDataSetIds = getPersistedIntermediateDataSetID();
+
+		Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor> persistedIntermediateResults =
+			new HashMap<>(persistedIntermediateDataSetIds.size());
+
+		for (IntermediateDataSetID intermediateDataSetID : persistedIntermediateDataSetIds) {
+			PersistedIntermediateResultDescriptor intermediateResultDescriptor =
+				getPersistedIntermediateResultDescriptor(intermediateDataSetID);
+			persistedIntermediateResults.put(intermediateDataSetID, intermediateResultDescriptor);
+		}
+
+		return persistedIntermediateResults;
+	}
+
+	private PersistedIntermediateResultDescriptor getPersistedIntermediateResultDescriptor(
+		IntermediateDataSetID intermediateDataSetID) {
+		final IntermediateResult intermediateResult = intermediateResults.get(intermediateDataSetID);
+		final IntermediateResultPartition[] partitions = intermediateResult.getPartitions();

Review comment:
       Not all the entries in the `intermediateResult` map are persisted, right?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ArchivedExecutionGraph.java
##########
@@ -100,22 +102,25 @@
 	@Nullable
 	private final String stateBackendName;
 
+	private Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor> persistedIntermediateResult;
+
 	public ArchivedExecutionGraph(
-			JobID jobID,
-			String jobName,
-			Map<JobVertexID, ArchivedExecutionJobVertex> tasks,
-			List<ArchivedExecutionJobVertex> verticesInCreationOrder,
-			long[] stateTimestamps,
-			JobStatus state,
-			@Nullable ErrorInfo failureCause,
-			String jsonPlan,
-			StringifiedAccumulatorResult[] archivedUserAccumulators,
-			Map<String, SerializedValue<OptionalFailure<Object>>> serializedUserAccumulators,
-			ArchivedExecutionConfig executionConfig,
-			boolean isStoppable,
-			@Nullable CheckpointCoordinatorConfiguration jobCheckpointingConfiguration,
-			@Nullable CheckpointStatsSnapshot checkpointStatsSnapshot,
-			@Nullable String stateBackendName) {
+		JobID jobID,
+		String jobName,
+		Map<JobVertexID, ArchivedExecutionJobVertex> tasks,
+		List<ArchivedExecutionJobVertex> verticesInCreationOrder,
+		long[] stateTimestamps,
+		JobStatus state,
+		@Nullable ErrorInfo failureCause,
+		String jsonPlan,
+		StringifiedAccumulatorResult[] archivedUserAccumulators,
+		Map<String, SerializedValue<OptionalFailure<Object>>> serializedUserAccumulators,
+		ArchivedExecutionConfig executionConfig,
+		boolean isStoppable,
+		@Nullable CheckpointCoordinatorConfiguration jobCheckpointingConfiguration,
+		@Nullable CheckpointStatsSnapshot checkpointStatsSnapshot,
+		@Nullable String stateBackendName, Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor>

Review comment:
       The argument should probably be put in a separate line with a `@Nullable` annotation as well.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
##########
@@ -1746,4 +1748,63 @@ PartitionReleaseStrategy getPartitionReleaseStrategy() {
 	ExecutionDeploymentListener getExecutionDeploymentListener() {
 		return executionDeploymentListener;
 	}
+
+	/**
+	 * Returns the mapping from intermediate result id to the ClusterPartitionDescriptor of its partitions
+	 *
+	 * @return The mapping from intermediate result id to the ClusterPartitionDescriptor of its partitions
+	 * 		   null, if the job has not yet finished
+	 */
+	public Map<IntermediateDataSetID, PersistedIntermediateResultDescriptor> getPersistedIntermediateResult() {
+		if (!JobStatus.FINISHED.equals(state)) {
+			return null;
+		}
+
+		final List<IntermediateDataSetID> persistedIntermediateDataSetIds = getPersistedIntermediateDataSetID();

Review comment:
       The logic here seems a little over complicated. Would iterating over the entrySet of the IntermediateResult and  handle the Id and partitions array in another method simpler?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ClusterPartitionDescriptorImpl.java
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.api.common.ClusterPartitionDescriptor;
+import org.apache.flink.runtime.shuffle.ShuffleDescriptor;
+
+public class ClusterPartitionDescriptorImpl implements ClusterPartitionDescriptor {
+    private ShuffleDescriptor shuffleDescriptor;
+	private final int numberOfSubpartitions;
+
+    public ClusterPartitionDescriptorImpl(ShuffleDescriptor shuffleDescriptor,
+										  int numberOfSubpartitions) {
+        this.shuffleDescriptor = shuffleDescriptor;
+		this.numberOfSubpartitions = numberOfSubpartitions;
+	}
+
+    public ShuffleDescriptor getShuffleDescriptor() {
+        return shuffleDescriptor;
+    }
+
+    public void setShuffleDescriptor(ShuffleDescriptor shuffleDescriptor) {

Review comment:
       Does this descriptor have to be mutable? In general we prefer the classes to be immutable if possible.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/PersistedIntermediateResultDescriptorImpl.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.api.common.PersistedIntermediateResultDescriptor;
+import org.apache.flink.runtime.io.network.partition.ResultPartitionType;
+import org.apache.flink.runtime.jobgraph.IntermediateDataSetID;
+
+import java.util.ArrayList;
+import java.util.Collection;
+
+public class PersistedIntermediateResultDescriptorImpl implements PersistedIntermediateResultDescriptor {
+	private Collection<ClusterPartitionDescriptorImpl> clusterPartitionDescriptors = new ArrayList<>();
+	private final IntermediateDataSetID intermediateDataSetID;
+	private final ResultPartitionType resultPartitionType;
+
+	public PersistedIntermediateResultDescriptorImpl(IntermediateDataSetID intermediateDataSetID,
+													 ResultPartitionType resultPartitionType) {
+		this.intermediateDataSetID = intermediateDataSetID;
+		this.resultPartitionType = resultPartitionType;
+	}
+
+	public Collection<ClusterPartitionDescriptorImpl> getClusterPartitionDescriptors() {
+		return clusterPartitionDescriptors;
+	}
+
+	public ResultPartitionType getResultPartitionType() {
+		return resultPartitionType;
+	}
+
+	@Override
+	public IntermediateDataSetID getIntermediateDataSetId() {
+		return intermediateDataSetID;
+	}
+
+	public void addClusterPartitionDescriptor(ClusterPartitionDescriptorImpl clusterPartitionDescriptor) {

Review comment:
       If possible, usually we would like to avoid such method which makes the class mutable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org