Posted to user@spark.apache.org by Arnav kumar <ak...@gmail.com> on 2018/01/29 21:26:27 UTC

Type Casting Error in Spark Data Frame

Hello Experts,

I need your advice on resolving the issue below, which occurs when I try to
retrieve data from a DataFrame.

Can you please let me know where I am going wrong?

Code:


// Create the DataFrame by parsing the JSON.
// MessageHelper describes the JSON struct.
// dataOut is the JSON string received from the streaming engine.

val dataDF = sparkSession.createDataFrame(dataOut, MessageHelper.sqlMapping)
dataDF.printSchema()
/* -- output of dataDF.printSchema

root
 |-- messageID: string (nullable = true)
 |-- messageType: string (nullable = true)
 |-- meta: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- messageParsedTimestamp: string (nullable = true)
 |    |    |-- ipaddress: string (nullable = true)
 |-- messageData: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- packetID: string (nullable = true)
 |    |    |-- messageID: string (nullable = true)
 |    |    |-- unixTime: string (nullable = true)



*/


dataDF.createOrReplaceTempView("message")
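// The query is kept on one line because a double-quoted Scala string literal cannot span lines.
val routeEventDF = sparkSession.sql("select messageId, messageData.unixTime, messageData.packetID, messageData.messageID from message")
routeEventDF.show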


Error on routeEventDF.show:
Caused by: java.lang.RuntimeException: org.apache.spark.sql.catalyst.expressions.GenericRow is not a valid external type for schema of array<struct<messageParsedTimestamp:string,ipaddress:string,port:string,message:string>>>>
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalIfFalseExpr14$(Unknown Source)


Appreciate your help

Best Regards
Arnav Kumar.

Re: Type Casting Error in Spark Data Frame

Posted by "vijay.bvp" <bv...@gmail.com>.
Assuming the MessageHelper.sqlMapping schema is correctly mapped to the input JSON
(it would help if you shared the schema and a sample JSON message), here is how to
use the explode function with DataFrames. Similar functionality is available in SQL.

import sparkSession.implicits._
import org.apache.spark.sql.functions._

// explode produces one row per element of the messageData array.
val routeEventDF = dataDF
  .select($"messageId", explode($"messageData").alias("MessageData"))
  .select($"messageId", $"MessageData.unixTime", $"MessageData.packetID",
          $"MessageData.messageID")
routeEventDF.show


thanks 
Vijay



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Type Casting Error in Spark Data Frame

Posted by Jean Georges Perrin <jg...@jgp.net>.
You can try creating new columns from the nested values.
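
One way to read that suggestion, as a rough sketch only: use withColumn to pull nested fields up into top-level columns. The case classes, sample values, new column names and local SparkSession below are hypothetical stand-ins for the original data. Because messageData is an array of structs, each new column is itself an array of values, not a single value per row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical case classes mirroring the messageData part of the printed schema.
case class Packet(packetID: String, messageID: String, unixTime: String)
case class Message(messageID: String, messageData: Seq[Packet])

object NestedColumnsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one message carrying two packets.
    val dataDF = Seq(
      Message("m1", Seq(Packet("p1", "m1", "1517261187"), Packet("p2", "m1", "1517261188")))
    ).toDF()

    // Selecting a field through an array of structs yields an array of that field.
    val withNested = dataDF
      .withColumn("unixTimes", col("messageData.unixTime"))
      .withColumn("packetIDs", col("messageData.packetID"))

    withNested.printSchema()
    withNested.show(false)

    spark.stop()
  }
}
```

If one row per packet is needed instead of arrays, the columns would still have to be exploded afterwards.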




Re: Type Casting Error in Spark Data Frame

Posted by Patrick McCarthy <pm...@dstillery.com>.
You can't select from an array like that. Try instead using 'lateral view
explode' in the query for that element, or call
(py)spark.sql.functions.explode before the SQL stage.
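
A minimal sketch of the 'lateral view explode' form of the query, assuming the messageData layout from the printed schema. The case classes, sample values, the dataMessageID alias and the local SparkSession are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes mirroring the messageData part of the printed schema.
case class Packet(packetID: String, messageID: String, unixTime: String)
case class Message(messageID: String, messageData: Seq[Packet])

object LateralViewExplodeSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lateral-view-explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one message carrying two packets.
    val dataDF = Seq(
      Message("m1", Seq(Packet("p1", "m1", "1517261187"), Packet("p2", "m1", "1517261188")))
    ).toDF()
    dataDF.createOrReplaceTempView("message")

    // LATERAL VIEW explode emits one row per array element; md is the exploded struct.
    val routeEventDF = spark.sql(
      """SELECT messageID, md.unixTime, md.packetID, md.messageID AS dataMessageID
        |FROM message
        |LATERAL VIEW explode(messageData) exploded AS md""".stripMargin)

    routeEventDF.show()

    spark.stop()
  }
}
```

With this sample input the result has two rows, one per packet. The nested messageID is aliased so that it does not share a name with the top-level column.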
