
Posted to dev@zeppelin.apache.org by "Piotr Nestorow (JIRA)" <ji...@apache.org> on 2017/02/23 11:18:44 UTC

[jira] [Created] (ZEPPELIN-2156) Paragraph with PySpark streaming - running job cannot be canceled

Piotr Nestorow created ZEPPELIN-2156:
----------------------------------------

             Summary: Paragraph with PySpark streaming - running job cannot be canceled
                 Key: ZEPPELIN-2156
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2156
             Project: Zeppelin
          Issue Type: Bug
          Components: pySpark
    Affects Versions: 0.7.0
         Environment: Linux Ubuntu
            Reporter: Piotr Nestorow


In a 'spark.pyspark' paragraph, a StreamingContext is created for a Kafka stream.

The paragraph is started, and while the job is running the Spark context produces the correct output from the code.

The problem is that the job cannot be stopped from the Zeppelin web interface.

Installed Kafka version: kafka_2.11-0.8.2.2
Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar
Zeppelin: zeppelin-0.7.0-bin-all

Tried:
1. Paragraph Cancel ( || button ) has no effect.

2. Zeppelin Job view Stop All has no effect.

3. Another paragraph with 
%spark.pyspark
ssc.stop(stopSparkContext=False, stopGraceFully=True)

is started but stays in 'Pending'.

4. Restarting the 'spark' interpreter stops the job.
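
One possible reading of observation 3, offered only as an assumption and not as a confirmed diagnosis: if the interpreter runs paragraphs one at a time, a paragraph that blocks in ssc.awaitTermination() would keep any later paragraph queued, including the one that calls ssc.stop(). The sketch below imitates that situation in plain Python with no Spark or Zeppelin involved. The names interpreter, terminated, streaming_paragraph and stop_paragraph are hypothetical stand-ins, not Zeppelin APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical stand-in for an interpreter that runs one paragraph at a time.
interpreter = ThreadPoolExecutor(max_workers=1)

# Hypothetical stand-in for the "streaming context has stopped" state.
terminated = threading.Event()


def streaming_paragraph():
    # Plays the role of ssc.start(); ssc.awaitTermination(): blocks until stopped.
    terminated.wait()
    return "streaming paragraph finished"


def stop_paragraph():
    # Plays the role of the paragraph that calls ssc.stop(...).
    terminated.set()
    return "stop paragraph finished"


first = interpreter.submit(streaming_paragraph)
second = interpreter.submit(stop_paragraph)

# The stop paragraph sits in the queue behind the blocked one and never starts.
done, not_done = wait([second], timeout=0.5)
stop_still_pending = second in not_done and not second.running()

# Releasing the block from outside the queue (loosely analogous to restarting
# the interpreter) lets both paragraphs complete.
terminated.set()
first_result = first.result(timeout=5)
second_result = second.result(timeout=5)
interpreter.shutdown()
```

In this toy model the only way out is an action taken outside the single-slot queue, which is consistent with observation 4 but does not prove that Zeppelin behaves the same way internally.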


The example logic:

%spark.pyspark
import sys
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)

ssc = StreamingContext(sc, interval)

kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})

# Each record is a (key, value) pair; parse the JSON value.
parsed = kvs.map(lambda kv: json.loads(kv[1]))
summed = parsed.\
                filter(lambda event: 'kind' in event and event['kind']=='gate').\
                map(lambda event: ('count_all', int(event['value']['passengers']))).\
                reduceByKey(lambda x,y: x + y).\
                map(lambda x: {'sum': x[0], "passengers": x[1]})
summed.pprint()

ssc.start()
ssc.awaitTermination()


