Posted to user@spark.apache.org by Arnav kumar <ak...@gmail.com> on 2018/01/29 21:26:27 UTC
Type Casting Error in Spark Data Frame
Hello Experts,
I need your advice on resolving the issue below, which occurs when I try to
retrieve data from a DataFrame.
Can you please let me know where I am going wrong?
Code:
// Create the DataFrame by parsing the JSON.
// MessageHelper describes the JSON struct.
// dataOut is the JSON string received from the streaming engine.
val dataDF = sparkSession.createDataFrame(dataOut, MessageHelper.sqlMapping)
dataDF.printSchema()
/* -- output of dataDF.printSchema
root
 |-- messageID: string (nullable = true)
 |-- messageType: string (nullable = true)
 |-- meta: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- messageParsedTimestamp: string (nullable = true)
 |    |    |-- ipaddress: string (nullable = true)
 |-- messageData: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- packetID: string (nullable = true)
 |    |    |-- messageID: string (nullable = true)
 |    |    |-- unixTime: string (nullable = true)
*/
dataDF.createOrReplaceTempView("message")
val routeEventDF = sparkSession.sql("select messageId, messageData.unixTime, messageData.packetID, messageData.messageID from message")
routeEventDF.show
Error on routeEventDF.show:
Caused by: java.lang.RuntimeException: org.apache.spark.sql.catalyst.expressions.GenericRow is not a valid external type for schema of array<struct<messageParsedTimestamp:string,ipaddress:string,port:string,message:string>>>>
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalIfFalseExpr14$(Unknown Source)
Appreciate your help
Best Regards
Arnav Kumar.
Re: Type Casting Error in Spark Data Frame
Posted by "vijay.bvp" <bv...@gmail.com>.
Assuming the MessageHelper.sqlMapping schema correctly matches the input JSON
(it would help if you shared the schema and a sample of the JSON), here is how
to use the explode function with DataFrames. Similar functionality is
available in SQL.
import sparkSession.implicits._
import org.apache.spark.sql.functions._
val routeEventDF = dataDF
  .select($"messageId", explode($"messageData").alias("MessageData"))
  .select($"messageId", $"MessageData.unixTime", $"MessageData.packetID", $"MessageData.messageID")
routeEventDF.show
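The assumption at the top of this reply, that the schema matches the input, is worth checking. The reported message ("GenericRow is not a valid external type for schema of array<...>") usually indicates that a Row was supplied where the schema declares an array, so each array<struct> field would need to be a Seq of Rows. The following is only a minimal sketch of that shape: the sample values, the local SparkSession settings, and the trimmed two-field schema are illustrative assumptions, not the poster's actual MessageHelper.sqlMapping or data.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Local session for illustration only (assumed settings).
val sparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("row-shape-sketch")
  .getOrCreate()

// Hypothetical, trimmed-down version of the schema printed in the question.
val packetType = StructType(Seq(
  StructField("packetID", StringType),
  StructField("messageID", StringType),
  StructField("unixTime", StringType)))

val schema = StructType(Seq(
  StructField("messageID", StringType),
  StructField("messageData", ArrayType(packetType))))

// The array<struct> field is passed as Seq[Row]. Passing a bare Row in that
// position instead is the kind of mismatch the error message points at.
val rows = Seq(
  Row("m1", Seq(Row("p1", "m1", "1517261187"), Row("p2", "m1", "1517261190"))))

val dataDF = sparkSession.createDataFrame(
  sparkSession.sparkContext.parallelize(rows), schema)
dataDF.show(false)
```

If the rows are built this way and the error persists, comparing dataDF.printSchema() against the fields named in the exception (which include port and message) may help locate the mismatch.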
thanks
Vijay
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Type Casting Error in Spark Data Frame
Posted by Jean Georges Perrin <jg...@jgp.net>.
You can try creating new columns from the nested values.
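One way to read this suggestion is to explode the array and then promote each nested field to its own top-level column with withColumn. The sketch below assumes Spark 2.2 or later (for read.json on a Dataset[String]); the sample JSON, the local session settings, and the intermediate column name "packet" are made up for illustration and stand in for the real dataDF.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Local session for illustration only (assumed settings).
val sparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("nested-columns-sketch")
  .getOrCreate()
import sparkSession.implicits._

// Hypothetical sample message with the same nesting as messageData in the question.
val sampleJson =
  """{"messageID":"m1","messageData":[
    |{"packetID":"p1","messageID":"m1","unixTime":"1517261187"},
    |{"packetID":"p2","messageID":"m1","unixTime":"1517261190"}]}""".stripMargin.replace("\n", "")

val dataDF = sparkSession.read.json(Seq(sampleJson).toDS())

// Explode the array into one row per element, then copy the nested
// values into new top-level columns.
val routeEventDF = dataDF
  .withColumn("packet", explode(col("messageData")))
  .withColumn("unixTime", col("packet.unixTime"))
  .withColumn("packetID", col("packet.packetID"))
  .select("messageID", "unixTime", "packetID")

routeEventDF.show(false)
```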
Re: Type Casting Error in Spark Data Frame
Posted by Patrick McCarthy <pm...@dstillery.com>.
You can't select from an array like that. Instead, try using 'lateral view
explode' in the query for that element, or apply
(py)spark.sql.functions.explode before the SQL stage.
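A rough sketch of the 'lateral view explode' form, written against the "message" view from the original post. The sample JSON, the local session settings, and the aliases "exploded" and "d" are illustrative assumptions; the sketch also assumes Spark 2.2 or later for read.json on a Dataset[String].

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumed settings).
val sparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("lateral-view-sketch")
  .getOrCreate()
import sparkSession.implicits._

// Hypothetical sample message with the same nesting as messageData in the question.
val sampleJson =
  """{"messageID":"m1","messageData":[
    |{"packetID":"p1","messageID":"m1","unixTime":"1517261187"},
    |{"packetID":"p2","messageID":"m1","unixTime":"1517261190"}]}""".stripMargin.replace("\n", "")

val dataDF = sparkSession.read.json(Seq(sampleJson).toDS())
dataDF.createOrReplaceTempView("message")

// LATERAL VIEW explode yields one output row per array element;
// "d" is the exploded struct, so its fields can be selected by name.
val routeEventDF = sparkSession.sql(
  """SELECT messageID, d.unixTime, d.packetID
    |FROM message
    |LATERAL VIEW explode(messageData) exploded AS d""".stripMargin)

routeEventDF.show(false)
```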