Posted to dev@zeppelin.apache.org by "Panchappanavar, Naveenakumar Gurushantap (Nokia - IN/Bangalore)" <na...@nokia.com> on 2018/07/06 06:43:25 UTC

zeppelin 0.8.0-rc2 pyspark error

Hi All,

I am running a PySpark streaming programme from the pyspark interpreter, using the zeppelin-0.8.0-rc2 code.

When the PySpark streaming programme is submitted, the driver logs show the following error message:

ERROR [2018-07-06 06:35:14,026] ({JobScheduler} Logging.scala[logError]:91) - Error generating jobs for time 1530858914000 ms
org.apache.zeppelin.py4j.Py4JException: Command Part is unknown: yro464

The following is the PySpark programme:

%spark.pyspark
import time
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sc, 1)
rddQueue = []
for i in range(5):
    rddQueue += [ssc.sparkContext.parallelize([j for j in range(1, 1001)], 10)]
    print rddQueue
# Create the QueueInputDStream and use it to do some processing
inputStream = ssc.queueStream(rddQueue)
mappedStream = inputStream.map(lambda x: (x % 10, 1))
reducedStream = mappedStream.reduceByKey(lambda a, b: a + b)
reducedStream.pprint()
ssc.start()
time.sleep(6)
ssc.stop(stopSparkContext=True, stopGraceFully=True)

Any idea what we can do about this?

Regards
Naveen