Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2019/06/18 21:41:41 UTC

Reading JSON RDD in Spark Streaming

Hi,

I have prices coming through Kafka in the following format:

key,{JSON data}

The key is needed when posting the data to a NoSQL database such as Aerospike.

The following is a sample record from the Kafka topic:

ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2,{"rowkey":"ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2","ticker":"SBRY",
"timeissued":"2019-06-18T22:10:26", "price":555.75}

The "key":"value" pairs inside {} are valid JSON, as JSONLint confirms:

https://jsonlint.com/

{
 "rowkey": "ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2",
 "ticker": "SBRY",
 "timeissued": "2019-06-18T22:10:26",
 "price": 555.75
}

Now I need to extract values from this JSON.

One way would be to go through the DStream and split each record by hand:

    dstream.foreachRDD
    { pricesRDD =>
      if (!pricesRDD.isEmpty)  // data exists in RDD
      {
         for(row <- pricesRDD.collect.toArray)
         {
           println(row)
           println(row._2.split(',').view(0).toString)
           println(row._2.split(',').view(1).split(':').view(1).toString)
           println(row._2.split(',').view(2).split(':').view(1).toString)
           println(row._2.split(',').view(3).split(':').view(1).toString)
         }
      }
    }

The results are hit and miss, with some fields parsed incorrectly, as the sample below shows:


(ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2,{"rowkey":"ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2","ticker":"SBRY",
"timeissued":"2019-06-18T22:10:26", "price":555.75})
{"rowkey":"ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2"
"SBRY"  // correct
"2019-06-18T22  // missing half
555.75}  // incorrect
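
As far as I can tell, the timestamp is cut short because its value contains ':' itself, and the price keeps the closing brace because nothing strips it. The self-contained sketch below (plain Scala, no Spark) reproduces both problems on the sample record and shows a regex-based stopgap. The names SplitVsPattern, bySplit and field are made up for illustration, and the pattern assumes a flat record with no nested objects and no escaped quotes, so it is not a general JSON parser.

```scala
object SplitVsPattern {
  // JSON value of the sample Kafka record shown above, on one line.
  val json: String =
    """{"rowkey":"ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2","ticker":"SBRY", "timeissued":"2019-06-18T22:10:26", "price":555.75}"""

  // The approach used above: split on ',' and then on ':'.
  def bySplit(json: String, index: Int): String =
    json.split(',')(index).split(':')(1)

  // Hypothetical helper: value of one top-level field of a flat JSON object.
  // Assumes no nesting and no escaped quotes inside string values.
  def field(json: String, name: String): Option[String] = {
    val quotedName = java.util.regex.Pattern.quote("\"" + name + "\"")
    // Either a quoted string (group 1) or a bare token such as a number (group 2).
    val pattern = (quotedName + """\s*:\s*(?:"([^"]*)"|([^,}\s]+))""").r
    pattern.findFirstMatchIn(json).map { m =>
      if (m.group(1) != null) m.group(1) else m.group(2)
    }
  }

  def main(args: Array[String]): Unit = {
    println(bySplit(json, 2))              // "2019-06-18T22   (cut at the first ':' in the time)
    println(bySplit(json, 3))              // 555.75}          (closing brace kept)
    println(field(json, "ticker"))         // Some(SBRY)
    println(field(json, "timeissued"))     // Some(2019-06-18T22:10:26)
    println(field(json, "price").map(_.toDouble)) // Some(555.75)
  }
}
```

This only papers over the problem for records shaped exactly like the sample, which is why I am after something more systematic.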

Is there a way to read the JSON data systematically?

Thanks

Dr Mich Talebzadeh


