Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/20 02:39:15 UTC

[GitHub] [spark] joshrosen-stripe commented on a change in pull request #25503: [SPARK-28702][SQL] Display useful error message (instead of NPE) for invalid Dataset operations

URL: https://github.com/apache/spark/pull/25503#discussion_r315484406
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -184,11 +184,26 @@ private[sql] object Dataset {
  */
 @Stable
 class Dataset[T] private[sql](
-    @transient val sparkSession: SparkSession,
+    @transient val _sparkSession: SparkSession,
     @DeveloperApi @Unstable @transient val queryExecution: QueryExecution,
     @DeveloperApi @Unstable @transient val encoder: Encoder[T])
   extends Serializable {
 
+  def sparkSession: SparkSession = {
+    if (_sparkSession == null) {
+      throw new SparkException(
+      "This Dataset lacks a SparkSession. It could happen in the following cases: \n(1) Dataset " +
+      "transformations and actions are NOT invoked by the driver, but inside of other " +
+      "transformations; for example, dataset1.map(x => dataset2.values.count() * x) is invalid " +
+      "because the values transformation and count action cannot be performed inside of the " +
+      "dataset1.map transformation. For more information, see SPARK-28702.\n(2) When a Spark " +
 
 Review comment:
   We may want to either reword or remove bullet point (2): it discusses DStreams, and I think those are unlikely to be used with Datasets.
   
   (For reference, https://github.com/apache/spark/pull/11595 added this wording for the RDD version of this patch).
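The following is a minimal plain-Scala sketch, with no Spark dependency, of why an accessor like the proposed `sparkSession` can find its backing field null. It assumes that a `Dataset` captured in a closure reaches the executor via Java serialization, so its `@transient` session field comes back as null. `Session`, `Handle`, and `TransientGuardDemo` are hypothetical stand-ins for `SparkSession`, `Dataset`, and the closure-shipping machinery. They are not Spark APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical stand-in for SparkSession: a driver-only handle that is NOT serializable.
final class Session(val name: String)

// Hypothetical stand-in for Dataset. Because the field is @transient, Java
// serialization skips it, so it is null after deserialization.
class Handle(@transient val _session: Session) extends Serializable {
  // Same guard shape as the patch: fail with a descriptive message instead of an NPE.
  def session: Session = {
    if (_session == null) {
      throw new IllegalStateException(
        "This Handle lacks a Session. Handle operations must be invoked by the driver, " +
        "not inside of other transformations.")
    }
    _session
  }
}

object TransientGuardDemo {
  // Round-trips an object through Java serialization, approximating what happens
  // when a closure that captured the object is shipped to an executor.
  def roundTrip[A <: java.io.Serializable](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes with this code's own class loader, falling back to the default.
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

On the "driver" side the guard passes. After a serialization round trip, the field is null and the guard throws the descriptive error:

```scala
val driverSide = new Handle(new Session("driver"))
driverSide.session.name                          // "driver"
TransientGuardDemo.roundTrip(driverSide).session // throws IllegalStateException
```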
