Posted to user@spark.apache.org by anu <an...@gmail.com> on 2015/03/18 06:50:33 UTC
Transform a Schema RDD to another Schema RDD with a different schema
I have a SchemaRDD with the following schema:
scala> mainRDD.printSchema
root
|-- COL1: integer (nullable = false)
|-- COL2: integer (nullable = false)
|-- COL3: string (nullable = true)
|-- COL4: double (nullable = false)
|-- COL5: string (nullable = true)
Now I transform mainRDD like this:
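scala> import java.text.SimpleDateFormat; import java.util.Calendar
scala> // MM is the month and HH the hour of day; lowercase mm means minutes and hh the 12-hour clock
scala> val sdf1 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
scala> val calendar = Calendar.getInstance()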
scala> val mappedRDD : SchemaRDD = mainRDD.map{ r =>
| val end_time = sdf1.parse(r(2).toString);
| calendar.setTime(end_time);
| val r2 = new java.sql.Timestamp(end_time.getTime);
| val hour: Long = calendar.get(Calendar.HOUR_OF_DAY);
| (r(0).toString.toInt, r(1).toString.toInt, r2, hour,
|  r(3).toString.toDouble, r(4).toString)
| }
scala> mappedRDD.printSchema
root
|-- _1: integer (nullable = false)
|-- _2: integer (nullable = false)
|-- _3: timestamp (nullable = true)
|-- _4: long (nullable = false)
|-- _5: double (nullable = false)
|-- _6: string (nullable = true)
But the issue is that, despite mainRDD being a SchemaRDD, the result behaves
like a plain RDD of tuples: the column names are lost in mappedRDD and
replaced by _1 through _6.
So how can I apply the above transformation to one SchemaRDD (mainRDD) to get
another SchemaRDD (mappedRDD) with a different schema?
Please help me out.
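One direction that may be worth trying, sketched here under assumptions rather
than tested on the data above: map each row to a case class instead of a tuple,
since the reflection-based conversion to SchemaRDD takes column names from the
case class field names. The names MappedRow, RowMapping, toMappedRow, col1,
col2, endTime, hour, col4 and col5 are hypothetical, and the timestamp layout
is assumed to be "yyyy-MM-dd HH:mm:ss.SSS".

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.Calendar

// Hypothetical case class: its field names are meant to become the column names.
case class MappedRow(col1: Int, col2: Int, endTime: Timestamp,
                     hour: Long, col4: Double, col5: String)

object RowMapping {
  // Assumed timestamp layout: MM = month, HH = hour of day (0-23).
  val Pattern = "yyyy-MM-dd HH:mm:ss.SSS"

  // Converts one row, viewed as a sequence of values, into a MappedRow.
  // Positions follow the original schema: COL1, COL2, COL3 (time string),
  // COL4 (double), COL5 (string). Null values are not handled here.
  def toMappedRow(r: Seq[Any]): MappedRow = {
    // SimpleDateFormat and Calendar are not thread-safe, so create them per call.
    val sdf = new SimpleDateFormat(Pattern)
    val calendar = Calendar.getInstance()
    val endTime = sdf.parse(r(2).toString)
    calendar.setTime(endTime)
    MappedRow(
      r(0).toString.toInt,
      r(1).toString.toInt,
      new Timestamp(endTime.getTime),
      calendar.get(Calendar.HOUR_OF_DAY).toLong,
      r(3).toString.toDouble,
      r(4).toString)
  }
}
```

In the shell this might then be used as
val mappedRDD: SchemaRDD = mainRDD.map(r => RowMapping.toMappedRow(r)),
assuming import sqlContext.createSchemaRDD is in scope and that a Row can be
passed where a Seq[Any] is expected, as in the SchemaRDD-era API. If that
holds, printSchema should report the case class field names rather than _1
through _6.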