You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Cody Koeninger (JIRA)" <ji...@apache.org> on 2016/10/12 23:35:20 UTC

[jira] [Closed] (SPARK-15408) Spark streaming app crashes with NotLeaderForPartitionException

     [ https://issues.apache.org/jira/browse/SPARK-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cody Koeninger closed SPARK-15408.
----------------------------------
    Resolution: Cannot Reproduce

> Spark streaming app crashes with NotLeaderForPartitionException 
> ----------------------------------------------------------------
>
>                 Key: SPARK-15408
>                 URL: https://issues.apache.org/jira/browse/SPARK-15408
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 64 bit
>            Reporter: Johny Mathew
>            Priority: Critical
>
> We have a spark streaming application reading from kafka (with Kafka Direct API) and it crashed with the exception shown in the next paragraph. We have a 5 node kafka cluster with 19 partitions  (replication factor 3). Even though the the spark application crashed the other kafka consumer apps were running fine. Only one of the 5 kafka node was not working correctly (it did not go down)
> /opt/hadoop/bin/yarn application -status application_1463151451543_0007
> 16/05/13 20:09:56 INFO client.RMProxy: Connecting to ResourceManager at /172.16.130.189:8050
> Application Report :
> 	Application-Id : application_1463151451543_0007
> 	Application-Name : com.ibm.alchemy.eventgen.EventGenMetrics
> 	Application-Type : SPARK
> 	User : stack
> 	Queue : default
> 	Start-Time : 1463155034571
> 	Finish-Time : 1463155310520
> 	Progress : 100%
> 	State : FINISHED
> 	Final-State : FAILED
> 	Tracking-URL : N/A
> 	RPC Port : 0
> 	AM Host : 172.16.130.188
> 	Aggregate Resource Allocation : 9562329 MB-seconds, 2393 vcore-seconds
> 	Diagnostics : User class threw exception: org.apache.spark.SparkException: ArrayBuffer(kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, org.apache.spark.SparkException: Couldn't find leader offsets for Set([alchemy-metrics,17], [alchemy-metrics,10], [alchemy-metrics,3], [alchemy-metrics,4], [alchemy-metrics,9], [alchemy-metrics,15], [alchemy-metrics,18], [alchemy-metrics,5]))
> We cleared checkpoint and started the application but it crashed again. Then at the end we found out the misbehaving kafka node and restarted it which fixed the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org