Posted to user@spark.apache.org by Mike Tracy <mi...@gmail.com> on 2015/07/05 03:35:16 UTC
Splitting dataframe using Spark 1.4 for nested json input
Hello,
I am having issues splitting the contents of a DataFrame column using Spark
1.4. The DataFrame was created by reading a nested, complex JSON file. I used
df.explode but keep getting an error message.
scala> val df = sqlContext.read.json("/Users/xx/target/statsfile.json")
scala> df.show()
+--------------------+--------------------+
| mi| neid|
+--------------------+--------------------+
|[900,["pmEs","pmS...|[SubNetwork=ONRM_...|
|[900,["pmIcmpInEr...|[SubNetwork=ONRM_...|
|[900,pmUnsuccessf...|[SubNetwork=ONRM_...|
+--------------------+--------------------+
scala> df.printSchema()
root
|-- mi: struct (nullable = true)
| |-- gp: long (nullable = true)
| |-- mt: string (nullable = true)
| |-- mts: string (nullable = true)
| |-- mv: string (nullable = true)
|-- neid: struct (nullable = true)
| |-- nedn: string (nullable = true)
| |-- nesw: string (nullable = true)
| |-- neun: string (nullable = true)
scala> val df1 = df.select("mi.mv")
df1: org.apache.spark.sql.DataFrame = [mv: string]
scala> df1.show()
+--------------------+
| mv|
+--------------------+
|[{"r":[0,0,0],"mo...|
|{"r":[0,4,0,4],"m...|
|{"r":5,"moid":"Ma...|
+--------------------+
scala> df1.explode("mv","mvnew")(mv => mv.split(","))
<console>:28: error: value split is not a member of Nothing
df1.explode("mv","mvnew")(mv => mv.split(","))
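In case it helps narrow things down, here is a sketch of the variant I was about to try next: the same explode call, but with an explicit parameter type on the closure, since my understanding is that without it the compiler infers the input as Nothing. I have not verified that this is the right fix.

```scala
// Sketch (unverified): re-select the nested column and call explode
// with an explicit parameter type on the closure, so the input column
// type is not inferred as Nothing. "mv" holds a comma-separated string.
val mvDf = df.select("mi.mv")
val exploded = mvDf.explode("mv", "mvnew") { (mv: String) =>
  mv.split(",").toSeq
}
```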
The json file format looks like
[
  {
    "neid":{ },
    "mi":{
      "mts":"20100609071500Z",
      "gp":"900",
      "tMOID":"Aal2Ap",
      "mt":[ ],
      "mv":[
        {
          "moid":"ManagedElement=1,TransportNetwork=1,Aal2Sp=1,Aal2Ap=r1552q",
          "r":[ .]
        },
        {
          "moid":"ManagedElement=1,TransportNetwork=1,Aal2Sp=1,Aal2Ap=r1556q",
          "r":[ .]
        }
      ]
    }
  }
]
Am I doing something wrong? I need to extract the data under mi.mv into
separate columns so I can apply some transformations.
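For context, another approach I was considering (not sure it is the idiomatic one) is to treat each mv value as a JSON string and re-parse it into its own DataFrame, so that moid and r become real columns. The names here follow my session above; this is untested.

```scala
// Hypothetical alternative (unverified): pull the mv strings out as an
// RDD[String] and let the JSON reader infer a schema for them, so that
// moid and r become proper columns instead of one opaque string.
val mvJson = df.select("mi.mv").rdd.map(_.getString(0))
val mvDf = sqlContext.read.json(mvJson)
mvDf.printSchema()
```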
Regards
Mike