Posted to user@spark.apache.org by Exie <tf...@prodevelop.com.au> on 2015/07/24 04:25:41 UTC

SparkR Supported Types - Please add "bigint"

Hi Folks,

Using Spark to read in JSON files and detect the schema, it gives me a
dataframe with a "bigint" field. R then fails to import the dataframe as it
can't convert the type.

> head(mydf)
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class ""jobj"" to a data.frame
>
> show(mydf)
DataFrame[localEventDtTm:timestamp, asset:string, assetCategory:string,
assetType:string, event:string,
extras:array<struct<name:string,value:string>>, ipAddress:string,
memberId:string, system:string, timestamp:bigint, title:string,
trackingId:string, version:bigint]
>

I believe this is related to:
https://issues.apache.org/jira/browse/SPARK-8840

A sample record in raw JSON looks like this:
{"version": 1, "event": "view", "timestamp": 1427846422377, "system": "DCDS",
 "asset": "6404476", "assetType": "myType", "assetCategory": "myCategory",
 "extras": [{"name": "videoSource", "value": "mySource"},
            {"name": "playerType", "value": "Article"},
            {"name": "duration", "value": "202088"}],
 "trackingId": "155629a0-d802-11e4-13ee-6884e43d6000",
 "ipAddress": "165.69.2.4", "title": "myTitle"}

Can someone turn this into a feature request or something for 1.5.0?
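For what it's worth, the bigint inference itself is expected here: the sample record's timestamp value (1427846422377) does not fit in a 32-bit integer, so Spark's JSON schema inference has to pick a 64-bit type. A minimal Python sketch of that range check, just for intuition:

```python
import json

# The relevant fields from the sample record in this post.
record = json.loads('{"version": 1, "timestamp": 1427846422377}')

ts = record["timestamp"]
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold

# The timestamp exceeds the 32-bit range, so a 64-bit integer
# (Spark's LongType, printed as "long" or "bigint") is required.
print(ts > INT32_MAX)          # True
print(-2**63 <= ts < 2**63)    # True: it fits comfortably in 64 bits
```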



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-Supported-Types-Please-add-bigint-tp23975.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


RE: SparkR Supported Types - Please add "bigint"

Posted by "Sun, Rui" <ru...@intel.com>.
Exie,

Reported your issue: https://issues.apache.org/jira/browse/SPARK-9302

SparkR already supports the long (bigint) type in serde. This issue is about supporting complex Scala types in serde.
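The complex column in this schema is extras: array<struct<name:string,value:string>>. Until serde handles such types, one workaround is to flatten the nested fields into plain columns before collecting into R. A hypothetical Python sketch of that flattening, using the sample record from the post (the real fix belongs in SparkR's serde, not user code):

```python
import json

record = json.loads("""
{"version": 1, "event": "view",
 "extras": [{"name": "videoSource", "value": "mySource"},
            {"name": "playerType", "value": "Article"}]}
""")

# Turn the array<struct<name,value>> into flat key/value columns,
# which map cleanly onto simple data.frame column types.
flat = {f'extras_{e["name"]}': e["value"] for e in record["extras"]}
print(flat)
# {'extras_videoSource': 'mySource', 'extras_playerType': 'Article'}
```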




Re: SparkR Supported Types - Please add "bigint"

Posted by Davies Liu <da...@databricks.com>.
They are actually the same thing, LongType. `long` is the
developer-friendly name; `bigint` is friendlier to database people, and
maybe data scientists.
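Both names label the same 8-byte signed integer; only the spelling differs. A quick Python illustration of the storage size, purely for intuition (not Spark code):

```python
import struct

# Pack the sample timestamp as a signed 64-bit ("long"/"bigint") value.
packed = struct.pack(">q", 1427846422377)
print(len(packed))  # 8 bytes: one 64-bit slot, whatever the name

# A 32-bit slot would overflow:
try:
    struct.pack(">i", 1427846422377)
except struct.error:
    print("does not fit in 32 bits")
```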

On Thu, Jul 23, 2015 at 11:33 PM, Sun, Rui <ru...@intel.com> wrote:
> printSchema calls StructField.buildFormattedString() to output schema information. buildFormattedString() uses DataType.typeName as the string representation of the data type.
>
> LongType.typeName = "long"
> LongType.simpleString = "bigint"
>
> I am not sure about the difference between these two type name representations.


RE: SparkR Supported Types - Please add "bigint"

Posted by "Sun, Rui" <ru...@intel.com>.
printSchema calls StructField.buildFormattedString() to output schema information. buildFormattedString() uses DataType.typeName as the string representation of the data type.

LongType.typeName = "long"
LongType.simpleString = "bigint"

I am not sure about the difference between these two type name representations.



Re: SparkR Supported Types - Please add "bigint"

Posted by Exie <tf...@prodevelop.com.au>.
Interestingly, after more digging, df.printSchema() in raw Spark shows the
columns as long, not bigint.

root
 |-- localEventDtTm: timestamp (nullable = true)
 |-- asset: string (nullable = true)
 |-- assetCategory: string (nullable = true)
 |-- assetType: string (nullable = true)
 |-- event: string (nullable = true)
 |-- extras: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |-- ipAddress: string (nullable = true)
 |-- memberId: string (nullable = true)
 |-- system: string (nullable = true)
 |-- timestamp: long (nullable = true)
 |-- title: string (nullable = true)
 |-- trackingId: string (nullable = true)
 |-- version: long (nullable = true)

I'm going to have to keep digging I guess. :(
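As a side note, the timestamp values in the sample data look like epoch milliseconds, which is consistent with the long inference. A quick Python check using the sample value from this thread:

```python
from datetime import datetime, timezone

ts_millis = 1427846422377  # sample "timestamp" value from the JSON record

# Interpreting it as milliseconds since the Unix epoch gives a
# plausible 2015 date, matching when this thread's data was produced.
dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
print(dt.isoformat())  # 2015-04-01T00:00:22.377000+00:00
```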




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-Supported-Types-Please-add-bigint-tp23975p23978.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
