Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/04/20 12:04:25 UTC

[jira] [Commented] (SPARK-14737) Kafka Brokers are down - spark stream should retry

    [ https://issues.apache.org/jira/browse/SPARK-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249586#comment-15249586 ] 

Sean Owen commented on SPARK-14737:
-----------------------------------

If all brokers are down, that's a fatal error. It's correct (IMHO) to fail. Your application, however, could recreate a stream.
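A minimal sketch of what "recreate a stream" could look like on the driver side: a generic retry wrapper that re-runs the stream-creation logic with backoff when it fails. The class and method names (`RetryingStreamStarter`, `startWithRetry`) are illustrative, not part of the Spark or Kafka APIs; the application would wrap its `createDirectStream` / `start` / `awaitTermination` sequence in the `Callable`.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries an action (e.g. building and starting a
// streaming context) when it fails fatally, such as when all Kafka
// brokers are unreachable. Not a Spark API; purely application-level.
public class RetryingStreamStarter {

    // Runs the action up to maxAttempts times with linear backoff
    // between attempts; rethrows the last failure if every attempt fails.
    public static <T> T startWithRetry(Callable<T> action,
                                       int maxAttempts,
                                       long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // Back off a little longer on each successive failure.
                Thread.sleep(backoffMillis * attempt);
            }
        }
        throw last;
    }
}
```

In the reporter's scenario the `Callable` would recreate the `JavaStreamingContext` and the direct stream from scratch on each attempt, since a stopped context cannot be restarted.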

> Kafka Brokers are down - spark stream should retry
> --------------------------------------------------
>
>                 Key: SPARK-14737
>                 URL: https://issues.apache.org/jira/browse/SPARK-14737
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>         Environment: Suse Linux, Cloudera Enterprise 5.4.8 (#7 built by jenkins on 20151023-1205 git: d7dbdf29ac1d57ae9fb19958502d50dcf4e4fffd), kafka_2.10-0.8.2.2
>            Reporter: Faisal
>
> I have a Spark streaming application that uses direct streaming, listening to a Kafka topic.
> {code}
> HashMap<String, String> kafkaParams = new HashMap<String, String>();
>     kafkaParams.put("metadata.broker.list", "broker1,broker2,broker3");
>     kafkaParams.put("auto.offset.reset", "largest");
>     HashSet<String> topicsSet = new HashSet<String>();
>     topicsSet.add("Topic1");
>     JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
>             jssc, 
>             String.class, 
>             String.class,
>             StringDecoder.class, 
>             StringDecoder.class, 
>             kafkaParams, 
>             topicsSet
>     );
> {code}
> I notice that when I stop/shut down the Kafka brokers, my Spark application also shuts down.
> Here is the spark-submit script:
> {code}
> spark-submit \
> --master yarn-cluster \
> --files /home/siddiquf/spark/log4j-spark.xml \
> --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
> --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
> --class com.example.MyDataStreamProcessor \
> myapp.jar 
> {code}
> The Spark job submits successfully, and I can track the application driver and worker/executor nodes.
> Everything works fine; my only concern is that if the Kafka brokers are offline or restarted, my application (managed by YARN) should not shut down, but it does.
> If this is expected behavior, how can such a situation be handled with the least maintenance? Keep in mind that the Kafka cluster is not part of the Hadoop cluster and is managed by a different team, which is why our application needs to be resilient.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org