Posted to user@spark.apache.org by vr spark <vr...@gmail.com> on 2016/07/26 18:51:48 UTC
read only specific jsons
I am reading data from Kafka using Spark Streaming.
Each message is JSON, and I create a DataFrame from it.
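from pyspark.streaming.kafka import KafkaUtils

def mReport(clickRDD):
    # Infer a schema from the JSON strings in this batch and query it.
    clickDF = sqlContext.jsonRDD(clickRDD)
    clickDF.registerTempTable("clickstream")
    PagesDF = sqlContext.sql(
        "SELECT request.clientIP as ip "
        "FROM clickstream "
        "WHERE request.clientIP is not null "
        " limit 2000 ")

kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
# Each Kafka record is a (key, value) pair; keep only the value.
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)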
The problem is that not all the JSON messages in the stream have the same format.
The job works when it reads a message that has the IP.
Some of the JSON strings do not have the client IP in their schema,
so I get an error and the job fails when it encounters such a message.
How do I read only those JSON messages that have the IP in their schema?
Please suggest.
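
One direction that might work, offered only as a sketch: drop the messages that lack request.clientIP before they reach jsonRDD, by filtering the raw strings. The helper name has_client_ip and the sample messages below are hypothetical, and the example runs without Spark so the predicate can be checked on its own.

```python
import json

def has_client_ip(line):
    """Return True if `line` is a JSON object with a non-null request.clientIP."""
    try:
        record = json.loads(line)
    except (TypeError, ValueError):
        return False  # not parseable as JSON
    if not isinstance(record, dict):
        return False
    request = record.get("request")
    return isinstance(request, dict) and request.get("clientIP") is not None

# Hypothetical sample messages standing in for the Kafka values.
samples = [
    '{"request": {"clientIP": "10.0.0.1"}}',
    '{"request": {"path": "/index"}}',
    '{"event": "heartbeat"}',
    'not json',
]
kept = [s for s in samples if has_client_ip(s)]

# In the streaming job the same predicate could be applied to the DStream:
# lines = kvs.map(lambda x: x[1]).filter(has_client_ip)
```

If a batch ends up empty after filtering, schema inference has nothing to work from, so mReport may also need to return early when clickRDD.isEmpty() is true.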