Posted to user@spark.apache.org by bogdanbaraila <bo...@gmail.com> on 2016/09/12 12:31:40 UTC

Spark tasks block randomly on standalone cluster

We have a fairly complex application that runs on Spark Standalone.
In some cases, the tasks on one of the workers block randomly and stay in
the RUNNING state indefinitely.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27693/SparkStandaloneIssue.png>  


Extra info:
- there aren't any errors in the logs
- I ran with the logger at debug level and didn't see any relevant messages
(I see when a task starts, but after that there is no activity for it)
- the jobs work fine if I have only 1 worker
- the same job may run a second time without any issues, in a reasonable
amount of time
- I don't have any really big partitions that could cause delays for some
of the tasks
- in Spark 2.0 I moved from RDDs to Datasets and I have the same issue
- in Spark 1.4 I was able to work around the issue by turning on speculation,
but in Spark 2.0 the blocked tasks are on different workers (while in 1.4
I had blocked tasks on only 1 worker), so speculation isn't fixing my
issue
- I have the issue in several environments, so I don't think it's hardware
related

Has anyone experienced something similar? Any suggestions on how I could
identify the issue?

Thanks a lot!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-tasks-blockes-randomly-on-standalone-cluster-tp27693.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark tasks block randomly on standalone cluster

Posted by bogdanbaraila <bo...@gmail.com>.
Does anyone have any idea what may be happening?

Regards,
Bogdan



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-tasks-blockes-randomly-on-standalone-cluster-tp27693p27769.html


Re: Spark tasks block randomly on standalone cluster

Posted by Denis Bolshakov <bo...@gmail.com>.
Hello,

I see such behavior from time to time.

A similar problem is described here:
http://apache-spark-user-list.1001560.n3.nabble.com/Executor-Memory-Task-hangs-td12377.html

We also use speculation as a workaround (our Spark version is 1.6.0).
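
For readers unfamiliar with the workaround, speculation is switched on
through a handful of Spark properties. The fragment below is only a sketch:
the property names are standard Spark settings, but the values are
illustrative assumptions (apart from enabling speculation they are Spark's
documented defaults), and the thread does not say which values were
actually used.

```properties
# spark-defaults.conf (illustrative values, not the settings used in this thread)

# Re-launch a copy of a task that runs much slower than its peers
spark.speculation             true

# How often Spark checks running tasks for speculation candidates
spark.speculation.interval    100ms

# A task becomes a candidate once it has run this many times longer
# than the median duration of the tasks that already finished
spark.speculation.multiplier  1.5

# Fraction of tasks in a stage that must finish before speculation is considered
spark.speculation.quantile    0.75
```

Note that speculation only starts a second attempt of a slow task on
another executor; it does not explain why the first attempt stalled.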

But I would like to share one observation.
We have two Jenkins instances; one uses Java 7 and the other Java 8.

Sometimes I see the issue during integration testing on the Jenkins instance
with Java 7 (and never on the one with Java 8).

So I really hope that the issue will disappear once we complete our Java
migration.

Which Java version do you use?

Best regards,
Denis



-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.denis@gmail.com