Posted to issues@spark.apache.org by "Zhichao Zhang (JIRA)" <ji...@apache.org> on 2018/09/05 16:02:00 UTC

[jira] [Resolved] (SPARK-25279) Throws exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in spark-shell when running the example from the docs

     [ https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichao  Zhang resolved SPARK-25279.
------------------------------------
    Resolution: Won't Fix

> Throws exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in spark-shell when running the example from the docs
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25279
>                 URL: https://issues.apache.org/jira/browse/SPARK-25279
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 2.2.1
>            Reporter: Zhichao  Zhang
>            Priority: Minor
>
> Hi dev: 
>   I am using spark-shell to run the example from the section 
> [http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions], 
> and it fails with the following error: 
> {code:java}
> Caused by: java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn 
> Serialization stack: 
>         - object not serializable (class: org.apache.spark.sql.TypedColumn, value: 
> myaverage() AS `average_salary`) 
>         - field (class: $iw, name: averageSalary, type: class 
> org.apache.spark.sql.TypedColumn) 
>         - object (class $iw, $iw@4b2f8ae9) 
>         - field (class: MyAverage$, name: $outer, type: class $iw) 
>         - object (class MyAverage$, MyAverage$@2be41d90) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> MyAverage(Employee)) 
>         - field (class: 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> name: aggregateFunction, type: class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction) 
>         - object (class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)), 
> Some(class Employee), Some(StructType(StructField(name,StringType,true), 
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0, 
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0, 
> Average, true])).count AS count#26L, newInstance(class Average), input[0, 
> double, false] AS value#24, DoubleType, false, 0, 0)) 
>         - writeObject data (class: 
> scala.collection.immutable.List$SerializationProxy) 
>         - object (class scala.collection.immutable.List$SerializationProxy, 
> scala.collection.immutable.List$SerializationProxy@5e92c46f) 
>         - writeReplace data (class: 
> scala.collection.immutable.List$SerializationProxy) 
>         - object (class scala.collection.immutable.$colon$colon, 
> List(partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class 
> Employee)), Some(class Employee), 
> Some(StructType(StructField(name,StringType,true), 
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0, 
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0, 
> Average, true])).count AS count#26L, newInstance(class Average), input[0, 
> double, false] AS value#24, DoubleType, false, 0, 0))) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, name: 
> aggregateExpressions, type: interface scala.collection.Seq) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, 
> ObjectHashAggregate(keys=[], 
> functions=[partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class 
> Employee)), Some(class Employee), 
> Some(StructType(StructField(name,StringType,true), 
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0, 
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0, 
> Average, true])).count AS count#26L, newInstance(class Average), input[0, 
> double, false] AS value#24, DoubleType, false, 0, 0)], output=[buf#37]) 
> +- *FileScan json [name#8,salary#9L] Batched: false, Format: JSON, Location: 
> InMemoryFileIndex[file:/opt/spark2/examples/src/main/resources/employees.json], 
> PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<name:string,salary:bigint> 
> ) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1, 
> name: $outer, type: class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1, 
> <function0>) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2, 
> name: $outer, type: class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2, 
> <function1>) 
>         - field (class: org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1, 
> name: f$23, type: interface scala.Function1) 
>         - object (class org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1, 
> <function0>) 
>         - field (class: 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25, 
> name: $outer, type: class 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1) 
>         - object (class 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25, 
> <function3>) 
>         - field (class: org.apache.spark.rdd.MapPartitionsRDD, name: f, type: 
> interface scala.Function3) 
>         - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[9] 
> at show at <console>:62) 
>         - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class 
> org.apache.spark.rdd.RDD) 
>         - object (class org.apache.spark.OneToOneDependency, 
> org.apache.spark.OneToOneDependency@5bb7895) 
>         - writeObject data (class: 
> scala.collection.immutable.List$SerializationProxy) 
>         - object (class scala.collection.immutable.List$SerializationProxy, 
> scala.collection.immutable.List$SerializationProxy@6e81dca3) 
>         - writeReplace data (class: 
> scala.collection.immutable.List$SerializationProxy) 
>         - object (class scala.collection.immutable.$colon$colon, 
> List(org.apache.spark.OneToOneDependency@5bb7895)) 
>         - field (class: org.apache.spark.rdd.RDD, name: 
> org$apache$spark$rdd$RDD$$dependencies_, type: interface 
> scala.collection.Seq) 
>         - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[10] 
> at show at <console>:62) 
>         - field (class: scala.Tuple2, name: _1, type: class java.lang.Object) 
>         - object (class scala.Tuple2, (MapPartitionsRDD[10] at show at 
> <console>:62,org.apache.spark.ShuffleDependency@421cd28)) 
>   at 
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) {code}
>  
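> For reference, the code in that section of the guide is roughly the following 
> (paraphrased from the linked 2.2 docs; the JSON path may differ per installation, 
> and spark-shell already provides the `spark` session): 
> {code:scala}
> import org.apache.spark.sql.{Encoder, Encoders}
> import org.apache.spark.sql.expressions.Aggregator
> 
> case class Employee(name: String, salary: Long)
> case class Average(var sum: Long, var count: Long)
> 
> object MyAverage extends Aggregator[Employee, Average, Double] {
>   // Zero value for the intermediate aggregation buffer
>   def zero: Average = Average(0L, 0L)
>   // Fold one input row into the buffer
>   def reduce(buffer: Average, employee: Employee): Average = {
>     buffer.sum += employee.salary
>     buffer.count += 1
>     buffer
>   }
>   // Merge two intermediate buffers
>   def merge(b1: Average, b2: Average): Average = {
>     b1.sum += b2.sum
>     b1.count += b2.count
>     b1
>   }
>   // Produce the final result from the buffer
>   def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
>   def bufferEncoder: Encoder[Average] = Encoders.product
>   def outputEncoder: Encoder[Double] = Encoders.scalaDouble
> }
> 
> val ds = spark.read.json("examples/src/main/resources/employees.json").as[Employee]
> val averageSalary = MyAverage.toColumn.name("average_salary")
> ds.select(averageSalary).show()
> {code}
> 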
>   But if I run the example directly from IntelliJ IDEA, it works. What is the 
> difference between the two, and how can I run the example successfully in 
> spark-shell? 
>   Thanks. 
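> 
> (Note: the serialization stack above shows that MyAverage$ holds an $outer 
> reference to the spark-shell wrapper object $iw, and that wrapper in turn holds 
> the non-serializable TypedColumn val averageSalary; when the example is compiled 
> in an IDE, MyAverage is a plain top-level object with no such outer reference, 
> which is likely why it works there. One workaround that is sometimes suggested, 
> untested here, is to compile the aggregator as a top-level object inside the 
> shell using :paste -raw; the package name below is arbitrary: 
> {code:scala}
> // scala> :paste -raw
> // Raw paste mode compiles this as an ordinary source file, so MyAverage gets
> // no $outer pointer back to the REPL wrapper that holds the TypedColumn.
> package myudafs  // hypothetical package name, pick any
> 
> import org.apache.spark.sql.{Encoder, Encoders}
> import org.apache.spark.sql.expressions.Aggregator
> 
> case class Employee(name: String, salary: Long)
> case class Average(var sum: Long, var count: Long)
> 
> object MyAverage extends Aggregator[Employee, Average, Double] {
>   def zero: Average = Average(0L, 0L)
>   def reduce(b: Average, e: Employee): Average = { b.sum += e.salary; b.count += 1; b }
>   def merge(b1: Average, b2: Average): Average = { b1.sum += b2.sum; b1.count += b2.count; b1 }
>   def finish(r: Average): Double = r.sum.toDouble / r.count
>   def bufferEncoder: Encoder[Average] = Encoders.product
>   def outputEncoder: Encoder[Double] = Encoders.scalaDouble
> }
> {code}
> After leaving paste mode with Ctrl-D, `import myudafs._` in the shell and run the 
> read/select/show lines from the example as before.) 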



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org