Posted to issues@spark.apache.org by "Zhichao Zhang (JIRA)" <ji...@apache.org> on 2018/09/03 16:13:00 UTC
[jira] [Comment Edited] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
[ https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602308#comment-16602308 ]
Zhichao Zhang edited comment on SPARK-25279 at 9/3/18 4:12 PM:
----------------------------------------------------------------
[~dkbiswal], I followed your steps and ran the code successfully, but if I paste all of the code at once and then run it, the error occurs:
{code:java}
scala>:paste
...copy all code here
Ctrl + D{code}
What is the difference between these two modes?
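The stack trace above hints at the cause: in {{:paste}} mode the whole snippet is compiled into a single REPL wrapper object ({{$iw}}), so {{MyAverage$}} keeps an {{$outer}} reference to that wrapper, which in turn holds the non-serializable {{averageSalary}} TypedColumn val. A hedged workaround sketch (untested here; it assumes the {{Employee}} case class and {{MyAverage}} Aggregator from the Spark 2.2 docs are already defined) is to avoid storing the TypedColumn in a val at all:

```scala
// Sketch of a possible workaround, not a confirmed fix.
// Assumes `spark`, `Employee`, and the `MyAverage` Aggregator from the
// type-safe UDAF example in the Spark 2.2 docs are already in scope.
val ds = spark.read
  .json("examples/src/main/resources/employees.json")
  .as[Employee]

// Pass the typed column inline instead of binding it to a val, so the
// wrapper object never holds a TypedColumn field for MyAverage to capture.
ds.select(MyAverage.toColumn.name("average_salary")).show()
```

Another option along the same lines would be to paste {{MyAverage}} in its own {{:paste}} block first and run the query in a second block, so the Aggregator and the query land in different wrapper objects.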
> Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-25279
> URL: https://issues.apache.org/jira/browse/SPARK-25279
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell, SQL
> Affects Versions: 2.2.1
> Reporter: Zhichao Zhang
> Priority: Minor
>
> Hi dev:
> I am using Spark-Shell to run the example which is in section
> '[http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions]',
> and there is an error:
> {code:java}
> Caused by: java.io.NotSerializableException:
> org.apache.spark.sql.TypedColumn
> Serialization stack:
> - object not serializable (class: org.apache.spark.sql.TypedColumn, value:
> myaverage() AS `average_salary`)
> - field (class: $iw, name: averageSalary, type: class
> org.apache.spark.sql.TypedColumn)
> - object (class $iw, $iw@4b2f8ae9)
> - field (class: MyAverage$, name: $outer, type: class $iw)
> - object (class MyAverage$, MyAverage$@2be41d90)
> - field (class:
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression,
> name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator)
> - object (class
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression,
> MyAverage(Employee))
> - field (class:
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression,
> name: aggregateFunction, type: class
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction)
> - object (class
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression,
> partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)),
> Some(class Employee), Some(StructType(StructField(name,StringType,true),
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
> Average, true])).count AS count#26L, newInstance(class Average), input[0,
> double, false] AS value#24, DoubleType, false, 0, 0))
> - writeObject data (class:
> scala.collection.immutable.List$SerializationProxy)
> - object (class scala.collection.immutable.List$SerializationProxy,
> scala.collection.immutable.List$SerializationProxy@5e92c46f)
> - writeReplace data (class:
> scala.collection.immutable.List$SerializationProxy)
> - object (class scala.collection.immutable.$colon$colon,
> List(partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class
> Employee)), Some(class Employee),
> Some(StructType(StructField(name,StringType,true),
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
> Average, true])).count AS count#26L, newInstance(class Average), input[0,
> double, false] AS value#24, DoubleType, false, 0, 0)))
> - field (class:
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, name:
> aggregateExpressions, type: interface scala.collection.Seq)
> - object (class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec,
> ObjectHashAggregate(keys=[],
> functions=[partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class
> Employee)), Some(class Employee),
> Some(StructType(StructField(name,StringType,true),
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
> Average, true])).count AS count#26L, newInstance(class Average), input[0,
> double, false] AS value#24, DoubleType, false, 0, 0)], output=[buf#37])
> +- *FileScan json [name#8,salary#9L] Batched: false, Format: JSON, Location:
> InMemoryFileIndex[file:/opt/spark2/examples/src/main/resources/employees.json],
> PartitionFilters: [], PushedFilters: [], ReadSchema:
> struct<name:string,salary:bigint>
> )
> - field (class:
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
> name: $outer, type: class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec)
> - object (class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
> <function0>)
> - field (class:
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2,
> name: $outer, type: class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1)
> - object (class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2,
> <function1>)
> - field (class: org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1,
> name: f$23, type: interface scala.Function1)
> - object (class org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1,
> <function0>)
> - field (class:
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25,
> name: $outer, type: class
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1)
> - object (class
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25,
> <function3>)
> - field (class: org.apache.spark.rdd.MapPartitionsRDD, name: f, type:
> interface scala.Function3)
> - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[9]
> at show at <console>:62)
> - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class
> org.apache.spark.rdd.RDD)
> - object (class org.apache.spark.OneToOneDependency,
> org.apache.spark.OneToOneDependency@5bb7895)
> - writeObject data (class:
> scala.collection.immutable.List$SerializationProxy)
> - object (class scala.collection.immutable.List$SerializationProxy,
> scala.collection.immutable.List$SerializationProxy@6e81dca3)
> - writeReplace data (class:
> scala.collection.immutable.List$SerializationProxy)
> - object (class scala.collection.immutable.$colon$colon,
> List(org.apache.spark.OneToOneDependency@5bb7895))
> - field (class: org.apache.spark.rdd.RDD, name:
> org$apache$spark$rdd$RDD$$dependencies_, type: interface
> scala.collection.Seq)
> - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[10]
> at show at <console>:62)
> - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
> - object (class scala.Tuple2, (MapPartitionsRDD[10] at show at
> <console>:62,org.apache.spark.ShuffleDependency@421cd28))
> at
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) {code}
>
> But if I use IDEA to run the example directly, it works. What is the
> difference between them? How do I run the example successfully in Spark-Shell?
> Thanks.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)