You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Grant Henke (JIRA)" <ji...@apache.org> on 2017/04/05 14:40:41 UTC
[jira] [Commented] (KUDU-1824) KuduRDD.collect fails because of NoSerializableException

    [ https://issues.apache.org/jira/browse/KUDU-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956957#comment-15956957 ] 

Grant Henke commented on KUDU-1824:
-----------------------------------

I think this has been committed [here|https://github.com/apache/kudu/commit/6db540070a9326190c30996851207dfa0fb8066d].

> KuduRDD.collect fails because of NoSerializableException
> --------------------------------------------------------
>
>                 Key: KUDU-1824
>                 URL: https://issues.apache.org/jira/browse/KUDU-1824
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.2.0
>            Reporter: Dan Burkert
>
> [~sarutak] reported in https://gerrit.cloudera.org/#/c/5496/ that the KuduRDD operations which require shuffling data fail due to serialization issues.
> Some investigation notes:
> * The serialization issues only affect KuduRDD, they do not manifest when working with the DataFrame API, for example:
> {code}
> scala> val df = sqlContext.read.option("kudu.master", "localhost").option("kudu.table", "t1").kudu
> df: org.apache.spark.sql.DataFrame = [a: int, b: string]
> scala> df.collect
> res0: Array[org.apache.spark.sql.Row] = Array([1,1], [2,2], [3,3], [4,4], [5,5])
> scala> val rdd = new KuduContext("localhost").kuduRDD(sc, "t1", List("a", "b"))
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = KuduRDD[3] at RDD at KuduRDD.scala:31
> scala> rdd.collect
> 17/01/06 11:28:45 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.io.NotSerializableException: org.apache.kudu.client.RowResult
> Serialization stack:
>         - object not serializable (class: org.apache.kudu.client.RowResult, value: RowResult index: 4, size: 21, schema: org.apache.kudu.Schema@2b4e3275)
>         - field (class: org.apache.kudu.spark.kudu.KuduRow, name: rowResult, type: class org.apache.kudu.client.RowResult)
>         - object (class org.apache.kudu.spark.kudu.KuduRow, RowResult index: 4, size: 21, schema: org.apache.kudu.Schema@2b4e3275)
>         - element of array (index: 0)
>         - array (class [Lorg.apache.spark.sql.Row;, size 5)
>         at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
> {code}
> * Any number of partitions will trigger the issue (I tested with a single table table).
> * The issue manifests even in spark local mode (e.g. default spark-shell)
> * The issue manifests even with a single int32 column table (so it's not a binary or string issue)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)