Posted to issues@spark.apache.org by "Sun Rui (JIRA)" <ji...@apache.org> on 2015/12/14 10:16:46 UTC
[jira] [Commented] (SPARK-10312) Enhance SerDe to handle atomic vector
[ https://issues.apache.org/jira/browse/SPARK-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055681#comment-15055681 ]
Sun Rui commented on SPARK-10312:
---------------------------------
The gap between R and Scala/Java is that R has no scalar types: a single value is just an atomic vector of length 1.
If we want to support this, the pseudocode in SerDe would look like:
{code}
if (object is an atomic vector) {
  if (length(object) == 1) {
    write it as a scalar value
  } else {
    # length(object) == 0 or length(object) > 1
    if (there is any NA in the vector) {
      promote it to be a list, and write the list
    } else {
      write it as an array
    }
  }
}
{code}
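As a rough illustration, the branching in the pseudocode above can be written in plain R. This is only a sketch: classify_for_serde is a hypothetical helper, not part of SparkR, and it returns a label for the chosen wire form instead of writing anything. It also assumes that is.atomic() is an acceptable test for "atomic vector", which means matrices and factors count as well.

```r
# Hypothetical helper (not a SparkR function): reports how the pseudocode
# above would serialize `object`, without writing any bytes.
classify_for_serde <- function(object) {
  # NULL is excluded explicitly because is.atomic(NULL) differs across R versions.
  if (is.null(object) || !is.atomic(object)) {
    return("not an atomic vector")
  }
  if (length(object) == 1) {
    return("scalar")
  }
  # length(object) == 0 or length(object) > 1
  if (anyNA(object)) {
    return("list")
  }
  "array"
}

classify_for_serde(c("path1", "path2"))  # "array"
classify_for_serde(c("path1"))           # "scalar"
classify_for_serde(c(1L, NA, 3L))        # "list"
classify_for_serde(integer(0))           # "array"
classify_for_serde(list("path1"))        # "not an atomic vector"
```

Note how the first two calls differ only in length yet end up with different wire forms, which is the source of the confusion described next.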
The problem with supporting this feature is that it may confuse users. Take read.parquet as an example:
{code}
read.parquet(sqlContext, c("path1", "path2"))   # will work
read.parquet(sqlContext, c("path1"))            # won't work, because the method signature does not match on the JVM side
read.parquet(sqlContext, as.list(c("path1")))   # will work
{code}
So maybe the current behavior is better, that is:
for an atomic vector, SerDe always writes it as a scalar value. To write the whole vector, as.list() is required.
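For reference, the following base R snippet (no SparkR involved) shows why as.list() works as an explicit opt-in: a length-one vector is indistinguishable from a "scalar", while as.list() always produces a list, whatever the length. The variable names are only illustrative.

```r
# A "scalar" in R is just an atomic vector of length 1.
identical(5, c(5))        # TRUE
length(5)                 # 1

paths <- c("path1")
is.list(paths)            # FALSE: still an atomic character vector
wrapped <- as.list(paths)
is.list(wrapped)          # TRUE, even though there is only one element
length(wrapped)           # 1

# as.list() on a longer vector gives one length-one element per entry.
ints <- as.list(1:10)
length(ints)              # 10
identical(ints[[1]], 1L)  # TRUE
```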
> Enhance SerDe to handle atomic vector
> -------------------------------------
>
> Key: SPARK-10312
> URL: https://issues.apache.org/jira/browse/SPARK-10312
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Affects Versions: 1.4.1
> Reporter: Sun Rui
>
> Currently, writeObject() does not handle atomic vectors well: it treats an atomic vector like a scalar object. For example, if you pass c(1:10) into writeObject(), it writes only a single integer, 1. To write the whole vector, you have to explicitly convert it to a list, for example with as.list(1:10).
> Could we enhance the SerDe so that when the object is an atomic vector whose length is > 1, it is converted to a list and then written?