Posted to issues@spark.apache.org by "Hossein Falaki (JIRA)" <ji...@apache.org> on 2016/10/10 19:13:20 UTC
[jira] [Commented] (SPARK-17781) datetime is serialized as double inside dapply()
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563193#comment-15563193 ]
Hossein Falaki commented on SPARK-17781:
----------------------------------------
I investigated the issue. The root cause is that Date (and timestamp) values fall back to their underlying numeric representation when they are put in a list and that list is then coerced. To see it, run the following simple test in an R REPL:
{code}
> l <- lapply(1:2, function(x) { Sys.Date() })
> print(paste("list values", l))
[1] "list values 17084" "list values 17084"
{code}
The same problem happens with POSIXlt and POSIXct values. Therefore, in {{worker.R}}, when we call {{computeFunc(inputData)}} we are dealing with a list that contains double values for the date fields.
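A minimal plain-R sketch (no SparkR involved) of which list operations keep the Date/POSIXct class and which strip it. It only illustrates general R behavior; it does not claim to reproduce the exact code path in {{worker.R}}.

{code}
# Plain R, no SparkR needed. Dates keep their class while they sit in a list:
l <- lapply(1:2, function(x) { Sys.Date() })
class(l[[1]])                                # "Date"

# ...but simplifying the list drops the class attribute and leaves the
# underlying double (days since 1970-01-01 for Date).
class(unlist(l))                             # "numeric"
class(sapply(1:2, function(x) Sys.Date()))   # "numeric"

# POSIXct behaves the same way (seconds since the epoch).
ts <- lapply(1:2, function(x) { Sys.time() })
class(unlist(ts))                            # "numeric"

# Combining with c() dispatches to c.Date / c.POSIXct and keeps the class.
class(do.call(c, l))                         # "Date"
class(do.call(c, ts))                        # "POSIXct" "POSIXt"

# A stripped double can be turned back into a Date explicitly.
class(as.Date(unlist(l), origin = "1970-01-01"))  # "Date"
{code}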
For now, the safe workaround seems to be to avoid Date and time types and use strings instead. [~shivaram] and [~felixcheung], do you have any ideas?
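A sketch of the string workaround, assuming Date columns are encoded as "YYYY-MM-DD" strings before {{createDataFrame}} and parsed back inside the user function. The helper names {{datesToStrings}} and {{stringsToDates}} are hypothetical (not part of SparkR), and the commented SparkR lines at the end are an untested assumption about how they would be used.

{code}
# Hypothetical helpers (not part of SparkR): encode Date columns as
# "YYYY-MM-DD" strings before shipping, decode them inside the UDF.
datesToStrings <- function(df) {
  isDate <- vapply(df, inherits, logical(1), what = "Date")
  df[isDate] <- lapply(df[isDate], as.character)
  df
}

stringsToDates <- function(df, cols) {
  df[cols] <- lapply(df[cols], as.Date, format = "%Y-%m-%d")
  df
}

# Round trip on a local data.frame.
ldf <- data.frame(id = 1:3, date = as.Date("2016-10-10"))
shipped <- datesToStrings(ldf)          # date column is now character
restored <- stringsToDates(shipped, "date")

# Assumed SparkR usage (untested sketch): parse inside the UDF so the
# function sees real Date values rather than doubles.
# df <- createDataFrame(datesToStrings(data.frame(id = 1:10, date = Sys.Date())))
# dapplyCollect(df, function(x) { x$date <- as.Date(x$date); x })
{code}

Strings are used here because they survive the trip to the worker unchanged; the cost is an explicit parse step in every UDF that needs real dates.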
> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
> Key: SPARK-17781
> URL: https://issues.apache.org/jira/browse/SPARK-17781
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.0.0
> Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for the dapply family of functions, DateTime objects are serialized as doubles inside the worker.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}