Posted to issues@spark.apache.org by "Sun Rui (JIRA)" <ji...@apache.org> on 2015/12/14 10:16:46 UTC

[jira] [Commented] (SPARK-10312) Enhance SerDe to handle atomic vector

    [ https://issues.apache.org/jira/browse/SPARK-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055681#comment-15055681 ] 

Sun Rui commented on SPARK-10312:
---------------------------------

The gap between R and Scala/Java is that R has no scalar types: a single value is simply a vector of length 1.
If we want to support this, the pseudocode in SerDe would look like:
{code}
  if (object is an atomic vector) {
    if (length(object) == 1) {
      write it as a scalar value
    } else {
      # length(object) == 0 or length(object) > 1
      if (there is any NA in the vector) {
        promote it to be a list, and write the list
      } else {
        write it as an array
      }
    }
  }
{code}
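
The branching above can be sketched in plain R. This is only an illustration under assumptions: classifyForSerDe is a hypothetical helper, not part of SparkR, and it returns a label for the encoding the pseudocode would choose instead of writing anything to a connection.

```r
# Hypothetical helper (not SparkR API): reports which encoding the
# proposed SerDe logic would pick for `object`, without writing bytes.
classifyForSerDe <- function(object) {
  if (!is.atomic(object)) {
    return("other")   # lists etc. are not covered by the pseudocode
  }
  if (length(object) == 1) {
    return("scalar")
  }
  # length(object) == 0 or length(object) > 1
  if (anyNA(object)) {
    "list"            # promoted to a list, as in the pseudocode
  } else {
    "array"
  }
}

classifyForSerDe(c("path1", "path2"))  # "array"
classifyForSerDe(c("path1"))           # "scalar"
classifyForSerDe(c(1L, NA, 3L))        # "list"
classifyForSerDe(integer(0))           # "array"
classifyForSerDe(as.list(c("path1")))  # "other"
```

Note that a length-1 vector and a longer vector of the same type end up with different encodings, even though R itself does not distinguish them by type.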

The problem with supporting this feature is that it may confuse users. Take read.parquet as an example:
{code}
read.parquet(sqlContext, c("path1", "path2"))  # will work
read.parquet(sqlContext, c("path1"))           # won't work, because the method signature does not match on the JVM side
read.parquet(sqlContext, as.list(c("path1")))  # will work
{code}

So maybe the current behavior is better, that is:
for an atomic vector, SerDe always writes it as a scalar value. To write the whole vector, as.list() is required.
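
As a minimal base-R sketch of why as.list() makes a difference (it uses no SparkR calls, and the variable names are only illustrative): an atomic vector and the list built from it are different kinds of object, so a serializer can tell them apart.

```r
v <- 1:10
is.atomic(v)            # TRUE: an atomic integer vector
is.atomic(5L)           # TRUE: a "scalar" is just a length-1 vector
length(5L)              # 1

l <- as.list(v)
is.list(l)              # TRUE
is.atomic(l)            # FALSE: a list is not an atomic vector
length(l)               # 10, one length-1 element per original value
identical(l[[1]], 1L)   # TRUE
```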

> Enhance SerDe to handle atomic vector
> -------------------------------------
>
>                 Key: SPARK-10312
>                 URL: https://issues.apache.org/jira/browse/SPARK-10312
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 1.4.1
>            Reporter: Sun Rui
>
> Currently, writeObject() does not handle atomic vectors well. It treats an atomic vector like a scalar object. For example, if you pass c(1:10) into writeObject, it will write only a single integer, 1. If you want to write the whole vector, you have to explicitly convert it to a list, for example, as.list(1:10).
> Could we enhance the SerDe so that when the object is an atomic vector whose length is > 1, it is converted to a list and then written?


