Posted to user@spark.apache.org by SK <sk...@gmail.com> on 2015/08/31 22:56:12 UTC

Parsing nested json objects with variable structure

Hi,

I need to parse a json input file where the nested objects take on a
different structure based on the typeId field, as follows: 

{ "d":
        {  "uid" : "12345",
           "contents": [{"info": {"eventId": "event1"}, "typeId": 19}]
        }
}

{ "d":
        {  "uid" : "56780",
           "contents": [{"info": {"id": "1"}, "typeId": 1003},
                        {"info": {"id": "27"}, "typeId": 13}]
        }
}

In the above, the "info" object inside "contents" has different fields
depending on typeId: "eventId" for typeId 19, and "id" for typeId 1003 and
13. My code is currently as follows:

import sys

# sqlc is assumed to be an existing SQLContext
logs = sqlc.read.json(sys.argv[1])
logs.registerTempTable("logs")

features = sqlc.sql("SELECT d.uid, d.contents.typeId FROM logs")

I also need to extract the fields in d.contents.info. How can I extract
these fields, given that they have different names depending on the typeId?
I am using PySpark with Spark version 1.4.1. Any guidance in Python or Scala
would be helpful.

thanks
sudha
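
One possible approach, offered as a sketch rather than a tested Spark job: since the keys under "info" vary by typeId, each record can be parsed with plain Python and flattened into one row per "contents" entry, copying whatever keys "info" happens to hold. The function name flatten_record and the "info_" key prefix are made up for illustration, and the sample lines are the two records from the message above, written one object per line.

```python
import json


def flatten_record(line):
    """Yield one flat dict per entry in d.contents.

    Keys found under "info" are copied with an "info_" prefix
    (a hypothetical naming choice), so entries with different
    typeId values can carry different fields.
    """
    record = json.loads(line)
    d = record["d"]
    for item in d.get("contents", []):
        row = {"uid": d["uid"], "typeId": item["typeId"]}
        for key, value in item.get("info", {}).items():
            row["info_" + key] = value
        yield row


# The two sample records from the message, one JSON object per line.
sample = [
    '{"d": {"uid": "12345", "contents": '
    '[{"info": {"eventId": "event1"}, "typeId": 19}]}}',
    '{"d": {"uid": "56780", "contents": '
    '[{"info": {"id": "1"}, "typeId": 1003}, '
    '{"info": {"id": "27"}, "typeId": 13}]}}',
]

rows = [row for line in sample for row in flatten_record(line)]
for row in rows:
    print(row)
```

In PySpark the same function could presumably be applied with sc.textFile(path).flatMap(flatten_record), assuming sc is the SparkContext and the input file holds one JSON object per line.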








