Posted to user@spark.apache.org by Mike Tracy <mi...@gmail.com> on 2015/07/05 03:35:16 UTC

Splitting dataframe using Spark 1.4 for nested json input

Hello, 

I am having trouble splitting the contents of a dataframe column using Spark
1.4. The dataframe was created by reading a nested, complex JSON file. I used
df.explode but keep getting an error message.

scala> val df = sqlContext.read.json("/Users/xx/target/statsfile.json")
scala> df.show() 
+--------------------+--------------------+
|                  mi|                neid|
+--------------------+--------------------+
|[900,["pmEs","pmS...|[SubNetwork=ONRM_...|
|[900,["pmIcmpInEr...|[SubNetwork=ONRM_...|
|[900,pmUnsuccessf...|[SubNetwork=ONRM_...|
+--------------------+--------------------+
scala> df.printSchema()
root 
 |-- mi: struct (nullable = true)
 |    |-- gp: long (nullable = true)
 |    |-- mt: string (nullable = true)
 |    |-- mts: string (nullable = true)
 |    |-- mv: string (nullable = true)
 |-- neid: struct (nullable = true)
 |    |-- nedn: string (nullable = true)
 |    |-- nesw: string (nullable = true)
 |    |-- neun: string (nullable = true)

scala> val df1=df.select("mi.mv")
df1: org.apache.spark.sql.DataFrame = [mv: string]

scala> df1.show()
+--------------------+
|                  mv|
+--------------------+
|[{"r":[0,0,0],"mo...|
|{"r":[0,4,0,4],"m...|
|{"r":5,"moid":"Ma...|
+--------------------+


scala> df1.explode("mv","mvnew")(mv => mv.split(","))

<console>:28: error: value split is not a member of Nothing
df1.explode("mv","mvnew")(mv => mv.split(","))
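The error suggests that the compiler had nothing to infer the lambda's parameter type from, so it fell back to Nothing. Below is a minimal plain-Scala sketch of that situation. It does not use Spark: `explodeLike` and `ExplodeInference` are hypothetical stand-ins that only mimic a method whose first parameter list never mentions the type parameter `A`, and the sample string is the moid value from the JSON in this post.

```scala
object ExplodeInference {
  // Hypothetical stand-in, NOT Spark's implementation: the first parameter
  // list does not mention A, so A can only be fixed by the function argument.
  def explodeLike[A, B](input: Seq[Any])(f: A => Iterable[B]): Seq[B] =
    input.flatMap(value => f(value.asInstanceOf[A]))

  def main(args: Array[String]): Unit = {
    val rows: Seq[Any] =
      Seq("ManagedElement=1,TransportNetwork=1,Aal2Sp=1,Aal2Ap=r1552q")

    // Left commented out: without a type annotation on mv the compiler
    // cannot determine A, so this line does not compile.
    // explodeLike(rows)(mv => mv.split(","))

    // Annotating the parameter as String fixes A, and split becomes available.
    val parts = explodeLike(rows)((mv: String) => mv.split(",").toSeq)
    parts.foreach(println)
  }
}
```

Assuming Spark 1.4's explode(inputColumn, outputColumn) overload has the same curried shape, the corresponding change to the call above would be df1.explode("mv","mvnew")((mv: String) => mv.split(",")). This has not been checked against the data in this post.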

The JSON file format looks like this:

[   
    {   
        "neid":{  },
        "mi":{   
            "mts":"20100609071500Z",
            "gp":"900",
            "tMOID":"Aal2Ap",
            "mt":[  ],
            "mv":[ 
                {
                    "moid":"ManagedElement=1,TransportNetwork=1,Aal2Sp=1,Aal2Ap=r1552q",
                    "r":
                       [ ... ]
                }, 
                {
                    "moid":"ManagedElement=1,TransportNetwork=1,Aal2Sp=1,Aal2Ap=r1556q",
                    "r":
                       [ ... ]
                } 
            ] 
        } 
    }
] 


Am I doing something wrong? I need to extract the data under mi.mv into separate
columns so I can apply some transformations.

Regards

Mike