Posted to issues@spark.apache.org by "Faisal (JIRA)" <ji...@apache.org> on 2016/04/26 23:31:12 UTC

[jira] [Comment Edited] (SPARK-14737) Kafka Brokers are down - spark stream should retry

    [ https://issues.apache.org/jira/browse/SPARK-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258956#comment-15258956 ] 

Faisal edited comment on SPARK-14737 at 4/26/16 9:30 PM:
---------------------------------------------------------

Should we expect the same behavior if only one broker node is down? For example, if we have a Kafka cluster of 5 nodes and a topic created with 8 partitions, and only 1 broker goes down, should Spark continue running or stop?
We observe that the Spark executors shut down with the following messages in the log, and so does the driver.

{quote}
2016-04-26 16:12:58 INFO  RemoteActorRefProvider$RemotingTerminator:74 - Shutting down remote daemon.
2016-04-26 16:12:58 INFO  RemoteActorRefProvider$RemotingTerminator:74 - Remote daemon shut down; proceeding with flushing remote transports.
{quote}


was (Author: faisal.siddiqui):
Should we expect the same behavior if only one broker node is down? For example, if we have a Kafka cluster of 5 nodes and only 1 went down, should Spark continue running or stop?
We observe that the Spark executors shut down with the following messages in the log, and so does the driver.
{quote}
2016-04-26 16:12:58 INFO  RemoteActorRefProvider$RemotingTerminator:74 - Shutting down remote daemon.
2016-04-26 16:12:58 INFO  RemoteActorRefProvider$RemotingTerminator:74 - Remote daemon shut down; proceeding with flushing remote transports.
{quote}

> Kafka Brokers are down - spark stream should retry
> --------------------------------------------------
>
>                 Key: SPARK-14737
>                 URL: https://issues.apache.org/jira/browse/SPARK-14737
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>         Environment: Suse Linux, Cloudera Enterprise 5.4.8 (#7 built by jenkins on 20151023-1205 git: d7dbdf29ac1d57ae9fb19958502d50dcf4e4fffd), kafka_2.10-0.8.2.2
>            Reporter: Faisal
>
> I have a Spark Streaming application that uses the direct stream approach, listening to a Kafka topic.
> {code}
> import java.util.HashMap;
> import java.util.HashSet;
> 
> import kafka.serializer.StringDecoder;
> import org.apache.spark.streaming.api.java.JavaPairInputDStream;
> import org.apache.spark.streaming.kafka.KafkaUtils;
> 
> HashMap<String, String> kafkaParams = new HashMap<String, String>();
> kafkaParams.put("metadata.broker.list", "broker1,broker2,broker3");
> kafkaParams.put("auto.offset.reset", "largest");
> 
> HashSet<String> topicsSet = new HashSet<String>();
> topicsSet.add("Topic1");
> 
> JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
>         jssc,
>         String.class,
>         String.class,
>         StringDecoder.class,
>         StringDecoder.class,
>         kafkaParams,
>         topicsSet
> );
> {code}
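> For illustration, a hedged sketch of retry-related settings that might let the direct stream tolerate a single broker outage. The Spark property spark.streaming.kafka.maxRetries exists (default 1); whether the Kafka 0.8 consumer setting refresh.leader.backoff.ms is honored by the direct stream is an assumption, and the values are illustrative, not tested in this environment:
> {code}
> // Sketch only: raise the driver-side retry count for Kafka offset lookups
> // and back off before re-fetching a partition leader after a broker failure.
> // Assumes: import org.apache.spark.SparkConf;
> SparkConf conf = new SparkConf()
>         .setAppName("MyDataStreamProcessor")
>         .set("spark.streaming.kafka.maxRetries", "5"); // default is 1
> 
> // Kafka 0.8 consumer setting; assumption that the direct stream honors it:
> kafkaParams.put("refresh.leader.backoff.ms", "1000");
> {code}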
> I notice that when I stop/shut down the Kafka brokers, my Spark application also shuts down.
> Here is the spark-submit script:
> {code}
> spark-submit \
> --master yarn-cluster \
> --files /home/siddiquf/spark/log4j-spark.xml \
> --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
> --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
> --class com.example.MyDataStreamProcessor \
> myapp.jar 
> {code}
> The Spark job submits successfully, and I can track the application driver and the worker/executor nodes.
> Everything works fine; my only concern is that if the Kafka brokers go offline or are restarted, my application, which is managed by YARN, should not shut down. But it does.
> If this is the expected behavior, how do we handle such a situation with the least maintenance? Keep in mind that the Kafka cluster is not part of the Hadoop cluster and is managed by a different team, which is why our application needs to be resilient.
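> As a hedged illustration of one possible workaround (not from the original report): a minimal driver-side supervision loop that recreates the streaming context after a failure. Here createContext() is a hypothetical factory wrapping the setup shown above, and the submit-time property spark.yarn.maxAppAttempts is an alternative way to let YARN relaunch a failed driver:
> {code}
> // Sketch only: keep retrying instead of exiting when the direct stream
> // fails because no broker is reachable. Alternatively, rely on YARN
> // re-attempts via spark.yarn.maxAppAttempts at submit time.
> // Assumes the enclosing method declares "throws InterruptedException".
> while (true) {
>     JavaStreamingContext jssc = createContext(); // hypothetical factory method
>     jssc.start();
>     try {
>         jssc.awaitTermination(); // rethrows failures from the streaming jobs
>         break;                   // clean shutdown requested; stop supervising
>     } catch (Exception e) {
>         jssc.stop();             // tear down the failed context
>         Thread.sleep(30000);     // back off before recreating the stream
>     }
> }
> {code}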
> Thanks


