Posted to issues@spark.apache.org by "Shivu Sondur (JIRA)" <ji...@apache.org> on 2019/08/13 04:21:00 UTC

[jira] [Commented] (SPARK-28702) Display useful error message (instead of NPE) for invalid Dataset operations (e.g. calling actions inside of transformations)

    [ https://issues.apache.org/jira/browse/SPARK-28702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905802#comment-16905802 ] 

Shivu Sondur commented on SPARK-28702:
--------------------------------------

I will check this issue.

> Display useful error message (instead of NPE) for invalid Dataset operations (e.g. calling actions inside of transformations)
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28702
>                 URL: https://issues.apache.org/jira/browse/SPARK-28702
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Priority: Major
>
> In Spark, SparkContext and SparkSession can only be used on the driver, not on executors. For example, this means that you cannot call {{someDataset.collect()}} inside of a Dataset or RDD transformation.
> When Spark serializes RDDs and Datasets, references to SparkContext and SparkSession are nulled out (by being marked as {{@transient}} or via the Closure Cleaner). As a result, RDD and Dataset methods that reference these driver-side-only objects (e.g. actions or transformations) will see {{null}} references and may fail with a {{NullPointerException}}. For example, in code which (via a chain of calls) tried to {{collect()}} a dataset inside of a Dataset.map operation:
> {code:java}Caused by: java.lang.NullPointerException
> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution$lzycompute(Dataset.scala:3027)
> at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution(Dataset.scala:3025)
> at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
> at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)
> [...] {code}
> The resulting NPE can be _very_ confusing to users.
> In SPARK-5063 I added some logic to throw clearer error messages when performing similar invalid actions on RDDs. This ticket's scope is to implement similar logic for Datasets.
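
The mechanism described in the issue can be reproduced without Spark. The sketch below is only an illustration, not Spark's implementation: DriverSession, MiniDataset, requireSession and TransientDemo are hypothetical names, and a plain Java-serialization round trip stands in for shipping a closure to an executor. It shows a {{@transient}} field coming back as null, and how a guard could replace the bare NPE with an explanatory error, in the spirit of the RDD check from SPARK-5063.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for SparkSession: driver-only and not serializable.
final class DriverSession {
  def runJob(data: Seq[Int]): Seq[Int] = data
}

// Hypothetical stand-in for Dataset. The @transient field is skipped during
// serialization, so a deserialized copy sees null in its place.
final class MiniDataset(@transient private val session: DriverSession,
                        private val data: Vector[Int]) extends Serializable {

  // Guard (an assumption about what a fix could look like): fail with an
  // explanatory message instead of letting a later dereference throw an NPE.
  private def requireSession(): DriverSession = {
    if (session == null) {
      throw new IllegalStateException(
        "This dataset has no session reference. It was probably deserialized " +
          "on an executor, e.g. because an action was called inside a transformation.")
    }
    session
  }

  // Dereferences the field directly: throws NullPointerException after a round trip.
  def collectUnchecked(): Seq[Int] = session.runJob(data)

  // Goes through the guard: throws IllegalStateException after a round trip.
  def collectChecked(): Seq[Int] = requireSession().runJob(data)
}

object TransientDemo {
  // Serialize and deserialize a value with plain Java serialization.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val onDriver = new MiniDataset(new DriverSession, Vector(1, 2, 3))
    println(onDriver.collectChecked()) // works: the session is still present

    val onExecutor = roundTrip(onDriver) // the transient session is now null
    try onExecutor.collectUnchecked()
    catch { case _: NullPointerException => println("unchecked: bare NullPointerException") }
    try onExecutor.collectChecked()
    catch { case e: IllegalStateException => println(s"checked: ${e.getMessage}") }
  }
}
```

In this sketch the check sits in a single accessor that every session-dependent method calls, so the message is produced at the first use of the missing reference rather than deep inside a lazy val initializer as in the stack trace above.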



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org