You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Srikanth <sr...@gmail.com> on 2017/02/07 21:34:29 UTC

Exception in spark streaming + kafka direct app

Hello,

I had a spark streaming app that reads from kafka running for a few hours
after which it failed with error

*17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time
1486497850000 ms
java.lang.IllegalStateException: No current assignment for partition mt_event-5
	at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251)
	at org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:315)
	at org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1170)
	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:179)
	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:196)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)*

....
....

17/02/07 20:04:10 ERROR ApplicationMaster: User class threw exception:
java.lang.IllegalStateException: No current assignment for partition
mt_event-5
java.lang.IllegalStateException: No current assignment for partition mt_event-5

....
....

17/02/07 20:05:00 WARN JobGenerator: Timed out while stopping the job
generator (timeout = 50000)


Driver did not recover from this error and failed. The previous batch
ran 5sec back. There are no indications in the logs that some
rebalance happened.
As per kafka admin, kafka cluster health was good when this happened
and no maintenance was being done.

Any idea what could have gone wrong and why this is a fatal error?

Regards,
Srikanth

Re: Exception in spark streaming + kafka direct app

Posted by Srikanth <sr...@gmail.com>.
This is running in YARN cluster mode. It was restarted automatically and
continued fine.
I was trying to see what went wrong. AFAIK there were no task failure.
Nothing in executor logs. The log I gave is in driver.

After some digging, I did see that there was a rebalance in kafka logs
around this time. So will driver fail and exit in such cases?
I've seen drivers exit after a job has hit max retry attempts. This is
different though rt?

Srikanth


On Tue, Feb 7, 2017 at 5:25 PM, Tathagata Das <ta...@gmail.com>
wrote:

> Does restarting after a few minutes solves the problem? Could be a
> transient issue that lasts long enough for spark task-level retries to all
> fail.
>
> On Tue, Feb 7, 2017 at 4:34 PM, Srikanth <sr...@gmail.com> wrote:
>
>> Hello,
>>
>> I had a spark streaming app that reads from kafka running for a few hours
>> after which it failed with error
>>
>> *17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time 1486497850000 ms
>> java.lang.IllegalStateException: No current assignment for partition mt_event-5
>> 	at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251)
>> 	at org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:315)
>> 	at org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1170)
>> 	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:179)
>> 	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:196)
>> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)*
>>
>> ....
>> ....
>>
>> 17/02/07 20:04:10 ERROR ApplicationMaster: User class threw exception: java.lang.IllegalStateException: No current assignment for partition mt_event-5
>> java.lang.IllegalStateException: No current assignment for partition mt_event-5
>>
>> ....
>> ....
>>
>> 17/02/07 20:05:00 WARN JobGenerator: Timed out while stopping the job generator (timeout = 50000)
>>
>>
>> Driver did not recover from this error and failed. The previous batch ran 5sec back. There are no indications in the logs that some rebalance happened.
>> As per kafka admin, kafka cluster health was good when this happened and no maintenance was being done.
>>
>> Any idea what could have gone wrong and why this is a fatal error?
>>
>> Regards,
>> Srikanth
>>
>>
>

Re: Exception in spark streaming + kafka direct app

Posted by Tathagata Das <ta...@gmail.com>.
Does restarting after a few minutes solves the problem? Could be a
transient issue that lasts long enough for spark task-level retries to all
fail.

On Tue, Feb 7, 2017 at 4:34 PM, Srikanth <sr...@gmail.com> wrote:

> Hello,
>
> I had a spark streaming app that reads from kafka running for a few hours
> after which it failed with error
>
> *17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time 1486497850000 ms
> java.lang.IllegalStateException: No current assignment for partition mt_event-5
> 	at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251)
> 	at org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:315)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1170)
> 	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:179)
> 	at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:196)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)*
>
> ....
> ....
>
> 17/02/07 20:04:10 ERROR ApplicationMaster: User class threw exception: java.lang.IllegalStateException: No current assignment for partition mt_event-5
> java.lang.IllegalStateException: No current assignment for partition mt_event-5
>
> ....
> ....
>
> 17/02/07 20:05:00 WARN JobGenerator: Timed out while stopping the job generator (timeout = 50000)
>
>
> Driver did not recover from this error and failed. The previous batch ran 5sec back. There are no indications in the logs that some rebalance happened.
> As per kafka admin, kafka cluster health was good when this happened and no maintenance was being done.
>
> Any idea what could have gone wrong and why this is a fatal error?
>
> Regards,
> Srikanth
>
>