Posted to issues@flink.apache.org by "mingleizhang (JIRA)" <ji...@apache.org> on 2017/06/13 03:08:00 UTC

[jira] [Comment Edited] (FLINK-6682) Improve error message in case parallelism exceeds maxParallelism

    [ https://issues.apache.org/jira/browse/FLINK-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047364#comment-16047364 ] 

mingleizhang edited comment on FLINK-6682 at 6/13/17 3:07 AM:
--------------------------------------------------------------

Hi [~Zentol]. I put the check in the {{assignAttemptState}} method of {{StateAssignmentOperation}}. What do you think? I'm not very sure about the task info I wrote, so it would be a great help if you could check it.
{code}
	private void assignAttemptState(ExecutionJobVertex executionJobVertex, List<OperatorState> operatorStates) {

		List<OperatorID> operatorIDs = executionJobVertex.getOperatorIDs();

		//1. first compute the new parallelism
		checkParallelismPreconditions(operatorStates, executionJobVertex);

		int newParallelism = executionJobVertex.getParallelism();

		if (executionJobVertex.getMaxParallelism() < newParallelism) {
			LOG.error("Task info: {}, maxParallelism number: {}, and parallelism number: {}",
				this.tasks, executionJobVertex.getMaxParallelism(), newParallelism);
		}

		List<KeyGroupRange> keyGroupPartitions = createKeyGroupPartitions(
			executionJobVertex.getMaxParallelism(),
			newParallelism);
{code}
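
A note on the snippet above: judging from the stack trace in the issue description, the bare {{IllegalArgumentException}} is thrown by {{Preconditions.checkArgument}} inside {{createKeyGroupPartitions}}, so logging alone would presumably still be followed by that same exception. One option is to fail fast with a descriptive message instead. The following is only a minimal, self-contained sketch of that idea, not the actual Flink change: the class {{ParallelismCheck}}, the method {{checkParallelism}}, the exception type and the message wording are all hypothetical.

```java
public class ParallelismCheck {

    /**
     * Hypothetical helper (not part of Flink): fails fast with a message that
     * names the task and reports both values, instead of only logging them.
     */
    public static void checkParallelism(String taskName, int parallelism, int maxParallelism) {
        if (parallelism > maxParallelism) {
            // The exception type is an assumption; IllegalArgumentException would work as well.
            throw new IllegalStateException(String.format(
                "The state for task '%s' cannot be restored: parallelism %d exceeds maxParallelism %d.",
                taskName, parallelism, maxParallelism));
        }
    }

    public static void main(String[] args) {
        // Parallelism within the limit: no exception is thrown.
        checkParallelism("Map", 2, 128);

        // Parallelism above the limit: the message carries the task name and both numbers.
        try {
            checkParallelism("Map", 256, 128);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In {{assignAttemptState}}, a check of this kind would sit before the call to {{createKeyGroupPartitions}}, taking the values from {{executionJobVertex.getParallelism()}} and {{executionJobVertex.getMaxParallelism()}}. How best to obtain a readable task name there is left open.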


was (Author: mingleizhang):
Hi [~Zentol]. I put the check in the {{assignAttemptState}} method. What do you think? I'm not very sure about the task info I wrote, so it would be a great help if you could check it.
{code}
	private void assignAttemptState(ExecutionJobVertex executionJobVertex, List<OperatorState> operatorStates) {

		List<OperatorID> operatorIDs = executionJobVertex.getOperatorIDs();

		//1. first compute the new parallelism
		checkParallelismPreconditions(operatorStates, executionJobVertex);

		int newParallelism = executionJobVertex.getParallelism();

		if (executionJobVertex.getMaxParallelism() < newParallelism) {
			LOG.error("Task info: {}, maxParallelism number: {}, and parallelism number: {}",
				this.tasks, executionJobVertex.getMaxParallelism(), newParallelism);
		}

		List<KeyGroupRange> keyGroupPartitions = createKeyGroupPartitions(
			executionJobVertex.getMaxParallelism(),
			newParallelism);
{code}

> Improve error message in case parallelism exceeds maxParallelism
> ----------------------------------------------------------------
>
>                 Key: FLINK-6682
>                 URL: https://issues.apache.org/jira/browse/FLINK-6682
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Chesnay Schepler
>
> When restoring a job with a parallelism that exceeds the maxParallelism, we do not provide a useful error message; all you get is an IllegalArgumentException:
> {code}
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed
>         at org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:343)
>         at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:396)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:467)
>         ... 22 more
> Caused by: java.lang.IllegalArgumentException
>         at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123)
>         at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.createKeyGroupPartitions(StateAssignmentOperation.java:449)
>         at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.assignAttemptState(StateAssignmentOperation.java:117)
>         at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.assignStates(StateAssignmentOperation.java:102)
>         at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedState(CheckpointCoordinator.java:1038)
>         at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1101)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1386)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>         at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}


