Posted to user@spark.apache.org by anu <an...@gmail.com> on 2015/03/18 06:50:33 UTC

Transform a Schema RDD to another Schema RDD with a different schema

I have a SchemaRDD with the following schema:

scala> mainRDD.printSchema
root
 |-- COL1: integer (nullable = false)
 |-- COL2: integer (nullable = false)
 |-- COL3: string (nullable = true)
 |-- COL4: double (nullable = false)
 |-- COL5: string (nullable = true)



Now, I transform mainRDD like this:


scala> import java.text.SimpleDateFormat; import java.util.Calendar

scala> val sdf1 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); val calendar = Calendar.getInstance()
 
scala> val mappedRDD : SchemaRDD = mainRDD.map{ r => 
	| val end_time = sdf1.parse(r(2).toString); 
	| calendar.setTime(end_time); 
	| val r2 = new java.sql.Timestamp(end_time.getTime); 
	| val hour: Long = calendar.get(Calendar.HOUR_OF_DAY); 
	| (r(0).toString.toInt, r(1).toString.toInt, r2, hour, r(3).toString.toDouble, r(4).toString)
	| }


scala> mappedRDD.printSchema
root
 |-- _1: integer (nullable = false)
 |-- _2: integer (nullable = false)
 |-- _3: timestamp (nullable = true)
 |-- _4: long (nullable = false)
 |-- _5: double (nullable = false)
 |-- _6: string (nullable = true)


But the issue is that, although mappedRDD is declared as a SchemaRDD, it
behaves like a plain RDD of tuples (notice that the column names are lost in
mappedRDD and replaced by _1 ... _6).

So, how can I apply the above transformation to one SchemaRDD (mainRDD) to get
another SchemaRDD (mappedRDD) with a different schema and named columns?
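
One direction that might work, sketched here under assumptions rather than as a confirmed fix: map each row to a case class instead of a tuple, so that the fields carry names. The sketch below is Spark-free and only shows the per-row conversion. MappedRecord, RowMapping, toRecord and the field names are hypothetical, a Seq[Any] stands in for a row, and the UTC time zone and the "yyyy-MM-dd HH:mm:ss.SSS" pattern are assumptions about the data.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.{Calendar, TimeZone}

// Hypothetical record type: the field names are placeholders for this sketch.
case class MappedRecord(
  col1: Int,
  col2: Int,
  endTime: Timestamp,
  hour: Long,
  col4: Double,
  col5: String)

object RowMapping {
  // Assumption: timestamps are interpreted as UTC so the result is deterministic.
  private val utc = TimeZone.getTimeZone("UTC")

  // Converts one row (positional values) into a record with named fields.
  def toRecord(r: Seq[Any]): MappedRecord = {
    // Created per call because SimpleDateFormat and Calendar are not thread-safe.
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
    fmt.setTimeZone(utc)
    val endTime = fmt.parse(r(2).toString)

    val cal = Calendar.getInstance(utc)
    cal.setTime(endTime)

    MappedRecord(
      r(0).toString.toInt,
      r(1).toString.toInt,
      new Timestamp(endTime.getTime),
      cal.get(Calendar.HOUR_OF_DAY).toLong,
      r(3).toString.toDouble,
      r(4).toString)
  }
}
```

In a Spark 1.2 shell with import sqlContext.createSchemaRDD in scope, something like mainRDD.map(r => RowMapping.toRecord(r)) would be expected to produce a SchemaRDD whose column names come from the case class fields (col1, col2, endTime, ...) rather than _1 ... _6. This has not been run against the data in this post.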

Please help me out.




