Posted to issues@spark.apache.org by "Hossein Falaki (JIRA)" <ji...@apache.org> on 2016/10/10 19:13:20 UTC

[jira] [Commented] (SPARK-17781) datetime is serialized as double inside dapply()

    [ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563193#comment-15563193 ] 

Hossein Falaki commented on SPARK-17781:
----------------------------------------

I investigated the issue. The root cause is that Date (and Timestamp) types are converted to their underlying representations when they are put in a list. To see it, run the following simple test in an R REPL:

{code}
> l <- lapply(1:2, function(x) { Sys.Date() })
> print(paste("list values", l))
[1] "list values 17084" "list values 17084"
{code}
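
A minimal base-R sketch of the same effect follows. No Spark session is needed, and a fixed date (2016-10-10) stands in for Sys.Date() so the numbers are reproducible. The sketch suggests that each list element still carries the Date class, and that the class is dropped once the list is flattened or coerced, for example by unlist(), sapply(), or the as.character() coercion that paste() performs. The variable names are illustrative only.

{code}
# Base R only; a fixed date replaces Sys.Date() so the output is reproducible.
d <- as.Date("2016-10-10")

l <- lapply(1:2, function(x) d)

# Each list element still carries the "Date" class ...
print(class(l[[1]]))        # "Date"

# ... but flattening the list exposes the underlying day count
# (days since 1970-01-01), stored as a double without the class.
flat <- unlist(l)
print(class(flat))          # "numeric"
print(flat)                 # 17084 17084
print(sapply(l, identity))  # sapply() simplifies via unlist(), so also numeric

# c() dispatches to c.Date on the first argument and keeps the class.
kept <- do.call(c, l)
print(class(kept))          # "Date"

# A bare day count can be turned back into a Date by supplying the origin.
restored <- as.Date(flat, origin = "1970-01-01")
print(restored)             # "2016-10-10" "2016-10-10"
{code}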

A similar problem happens with the POSIXlt and POSIXct types. Therefore, in {{worker.R}}, when we call {{computeFunc(inputData)}}, we are dealing with a list that contains double values for date fields.

For now, the safe workaround seems to be to avoid Date and time types and use strings instead. [~shivaram] and [~felixcheung], do you have any ideas?
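
The string workaround could look roughly like the following base-R sketch, which round-trips a data.frame locally instead of through a Spark worker. The helpers dates_to_strings() and strings_to_dates() are hypothetical (they are not part of SparkR), and the ISO-8601 "%Y-%m-%d" encoding is an assumption.

{code}
# Hypothetical helper (not part of SparkR): encode every Date column
# as an ISO-8601 character column before the data is shipped.
dates_to_strings <- function(df) {
  for (col in names(df)) {
    if (inherits(df[[col]], "Date")) {
      df[[col]] <- format(df[[col]], "%Y-%m-%d")
    }
  }
  df
}

# Hypothetical helper (not part of SparkR): parse the named character
# columns back into Date, e.g. at the top of the worker-side function.
strings_to_dates <- function(df, cols) {
  for (col in cols) {
    df[[col]] <- as.Date(df[[col]], format = "%Y-%m-%d")
  }
  df
}

# Local round trip standing in for the trip through the worker.
local_df <- data.frame(id = 1:3, date = as.Date("2016-10-10") + 0:2)
shipped  <- dates_to_strings(local_df)   # date column is now character
received <- strings_to_dates(shipped, "date")
str(received)
{code}

With SparkR, the assumed usage would be to apply dates_to_strings() to the local data.frame before createDataFrame() and to call strings_to_dates() at the top of the function passed to dapplyCollect(). This only sidesteps the problem; it does not change how the worker serializes Date values.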

> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for dapply-family functions, DateTime objects inside the worker are serialized as doubles.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
