You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by "Panchappanavar, Naveenakumar Gurushantap (Nokia - IN/Bangalore)" <na...@nokia.com> on 2018/07/06 06:43:25 UTC
zeppelin 0.8.0-rc2 pyspark error
Hi All,
I am running the streaming pyspark programme from pyspark interpreter by using zeppelin-0.8.0-rc2 code .
When pyspark streaming programme is being submitted, it is giving following error message, When we see the driver logs.
ERROR [2018-07-06 06:35:14,026] ({JobScheduler} Logging.scala[logError]:91) - Error generating jobs for time 1530858914000 ms
org.apache.zeppelin.py4j.Py4JException: Command Part is unknown: yro464
and following is the pyspark programme
%spark.pyspark
import time
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sc, 1)
rddQueue = []
for i in range(5):
rddQueue += [ssc.sparkContext.parallelize([j for j in range(1, 1001)], 10)]
print rddQueue
#Create the QueueInputDStream and use it do some processing
inputStream = ssc.queueStream(rddQueue)
mappedStream = inputStream.map(lambda x: (x % 10, 1))
reducedStream = mappedStream.reduceByKey(lambda a, b: a + b)
reducedStream.pprint()
ssc.start()
time.sleep(6)
ssc.stop(stopSparkContext=True, stopGraceFully=True)
any idea what we can do for this.
Regards
Naveen