Posted to user@spark.apache.org by bogdanbaraila <bo...@gmail.com> on 2016/09/12 12:31:40 UTC
Spark tasks block randomly on standalone cluster
We have a fairly complex application that runs on Spark Standalone.
In some cases, tasks on one of the workers block randomly and stay in the
RUNNING state indefinitely.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27693/SparkStandaloneIssue.png>
Extra info:
- there aren't any errors in the logs
- I ran with the logger at debug level and didn't see any relevant messages
(I can see when a task starts, but after that there is no activity for it)
- the jobs work fine if I have only 1 worker
- the same job may run a second time without any issues, in a reasonable
amount of time
- I don't have any really big partitions that could cause delays for some
of the tasks
- in Spark 2.0 I moved from RDDs to Datasets and I have the same issue
- in Spark 1.4 I was able to work around the issue by turning on speculation,
but in Spark 2.0 the blocked tasks are on different workers (while in 1.4
the blocked tasks were on only 1 worker), so speculation doesn't fix my
issue
- I have the issue in several environments, so I don't think it's hardware
related
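For reference, the speculation workaround mentioned above is driven by the
`spark.speculation*` properties. The fragment below is a minimal sketch of a
`spark-defaults.conf`, assuming the settings are applied cluster-wide rather
than per job. Apart from `spark.speculation` itself, the values are Spark's
documented defaults and are shown for illustration only; they are not the
settings used in this thread.

```properties
# Re-launch a copy of any task that runs much slower than its peers.
spark.speculation            true
# How often Spark checks for tasks to speculate.
spark.speculation.interval   100ms
# A task is a candidate when it is this many times slower than the median.
spark.speculation.multiplier 1.5
# Fraction of tasks in a stage that must finish before speculation starts.
spark.speculation.quantile   0.75
```

Speculation only starts after the given fraction of a stage's tasks has
completed, so it may not help when many tasks of the same stage hang at once.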
Has anyone experienced something similar? Any suggestions on how I could
identify the issue?
Thanks a lot!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-tasks-blockes-randomly-on-standalone-cluster-tp27693.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Spark tasks block randomly on standalone cluster
Posted by bogdanbaraila <bo...@gmail.com>.
Does anyone have any ideas on what may be happening?
Regards,
Bogdan
Re: Spark tasks block randomly on standalone cluster
Posted by Denis Bolshakov <bo...@gmail.com>.
Hello,
I see this behavior from time to time.
A similar problem is described here:
http://apache-spark-user-list.1001560.n3.nabble.com/Executor-Memory-Task-hangs-td12377.html
We also use speculation as a workaround (our Spark version is 1.6.0).
But I would like to share one observation.
We have two Jenkins instances; one uses Java 7 and the other Java 8.
I sometimes see the issue during integration testing on the Jenkins
instance with Java 7 (and never on the one with Java 8).
So I really hope that the issue will disappear once we complete our Java
migration.
Which Java version do you use?
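
One way to answer that on every machine is to print the runtime properties
of the JVM that actually launches the Spark processes. The class below is a
minimal sketch; the name `JavaVersionCheck` is hypothetical, and it assumes
the class is compiled and run with the same `JAVA_HOME` that the Spark
worker on that host uses.

```java
// Hypothetical helper: reports which JVM is running this class.
class JavaVersionCheck {

    // Builds a one-line description from standard JVM system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + " java.vendor=" + System.getProperty("java.vendor")
                + " java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it on each worker host and on the driver host makes it easy to spot
a machine that still uses a different Java version than the others.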
Best regards,
Denis
--
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.denis@gmail.com