Posted to issues@spark.apache.org by "Zhichao Zhang (JIRA)" <ji...@apache.org> on 2018/08/30 07:37:00 UTC
[jira] [Created] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
Zhichao Zhang created SPARK-25279:
--------------------------------------
Summary: Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
Key: SPARK-25279
URL: https://issues.apache.org/jira/browse/SPARK-25279
Project: Spark
Issue Type: Bug
Components: Spark Shell, SQL
Affects Versions: 2.2.1
Reporter: Zhichao Zhang
Hi dev:
I am using spark-shell to run the example from the section
[http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions]
of the SQL programming guide, and it throws this error:
{code:java}
Caused by: java.io.NotSerializableException:
org.apache.spark.sql.TypedColumn
Serialization stack:
- object not serializable (class: org.apache.spark.sql.TypedColumn, value:
myaverage() AS `average_salary`)
- field (class: $iw, name: averageSalary, type: class
org.apache.spark.sql.TypedColumn)
- object (class $iw, $iw@4b2f8ae9)
- field (class: MyAverage$, name: $outer, type: class $iw)
- object (class MyAverage$, MyAverage$@2be41d90)
- field (class:
org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression,
name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator)
- object (class
org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression,
MyAverage(Employee))
- field (class:
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression,
name: aggregateFunction, type: class
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction)
- object (class
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression,
partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)),
Some(class Employee), Some(StructType(StructField(name,StringType,true),
StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
Average, true])).count AS count#26L, newInstance(class Average), input[0,
double, false] AS value#24, DoubleType, false, 0, 0))
- writeObject data (class:
scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.List$SerializationProxy,
scala.collection.immutable.List$SerializationProxy@5e92c46f)
- writeReplace data (class:
scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.$colon$colon,
List(partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class
Employee)), Some(class Employee),
Some(StructType(StructField(name,StringType,true),
StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
Average, true])).count AS count#26L, newInstance(class Average), input[0,
double, false] AS value#24, DoubleType, false, 0, 0)))
- field (class:
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, name:
aggregateExpressions, type: interface scala.collection.Seq)
- object (class
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec,
ObjectHashAggregate(keys=[],
functions=[partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class
Employee)), Some(class Employee),
Some(StructType(StructField(name,StringType,true),
StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
Average, true])).count AS count#26L, newInstance(class Average), input[0,
double, false] AS value#24, DoubleType, false, 0, 0)], output=[buf#37])
+- *FileScan json [name#8,salary#9L] Batched: false, Format: JSON, Location:
InMemoryFileIndex[file:/opt/spark2/examples/src/main/resources/employees.json],
PartitionFilters: [], PushedFilters: [], ReadSchema:
struct<name:string,salary:bigint>
)
- field (class:
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
name: $outer, type: class
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec)
- object (class
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
<function0>)
- field (class:
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2,
name: $outer, type: class
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1)
- object (class
org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2,
<function1>)
- field (class: org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1,
name: f$23, type: interface scala.Function1)
- object (class org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1,
<function0>)
- field (class:
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25,
name: $outer, type: class
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1)
- object (class
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25,
<function3>)
- field (class: org.apache.spark.rdd.MapPartitionsRDD, name: f, type:
interface scala.Function3)
- object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[9]
at show at <console>:62)
- field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class
org.apache.spark.rdd.RDD)
- object (class org.apache.spark.OneToOneDependency,
org.apache.spark.OneToOneDependency@5bb7895)
- writeObject data (class:
scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.List$SerializationProxy,
scala.collection.immutable.List$SerializationProxy@6e81dca3)
- writeReplace data (class:
scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.$colon$colon,
List(org.apache.spark.OneToOneDependency@5bb7895))
- field (class: org.apache.spark.rdd.RDD, name:
org$apache$spark$rdd$RDD$$dependencies_, type: interface
scala.collection.Seq)
- object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[10]
at show at <console>:62)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (MapPartitionsRDD[10] at show at
<console>:62,org.apache.spark.ShuffleDependency@421cd28))
at
org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) {code}
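For reference, this is the example from that section of the guide, as I pasted it into spark-shell (the Employee/Average case classes and the file path match the serialization trace above):
{code:java}
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Employee(name: String, salary: Long)
case class Average(var sum: Long, var count: Long)

object MyAverage extends Aggregator[Employee, Average, Double] {
  // A zero value for this aggregation; should satisfy a + zero = a
  def zero: Average = Average(0L, 0L)
  // Combine two values to produce a new value
  def reduce(buffer: Average, employee: Employee): Average = {
    buffer.sum += employee.salary
    buffer.count += 1
    buffer
  }
  // Merge two intermediate values
  def merge(b1: Average, b2: Average): Average = {
    b1.sum += b2.sum
    b1.count += b2.count
    b1
  }
  // Transform the output of the reduction
  def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  // Encoders for the intermediate and final value types
  def bufferEncoder: Encoder[Average] = Encoders.product
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

val ds = spark.read.json("examples/src/main/resources/employees.json").as[Employee]
val averageSalary = MyAverage.toColumn.name("average_salary")
val result = ds.select(averageSalary)
result.show()
{code}
My guess, from the `field (class: $iw, name: averageSalary, ...)` and `field (class: MyAverage$, name: $outer, ...)` lines in the trace, is that the REPL wraps these definitions in `$iw` classes, so `MyAverage` captures a wrapper that also holds the non-serializable `averageSalary` TypedColumn. Perhaps defining `MyAverage` in its own `:paste` block before the val, or inlining `MyAverage.toColumn.name("average_salary")` directly into the `select`, would avoid the capture, but I have not verified this.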
But if I run the example directly from IDEA, it works. What is the difference
between the two? How can I run the example successfully in spark-shell?
Thanks.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org