Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/11/07 06:51:12 UTC

[PR] [WIP][SQL] Add a SQL config for extra traces in `Origin` [spark]

MaxGekk opened a new pull request, #43695:
URL: https://github.com/apache/spark/pull/43695

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on PR #43695:
URL: https://github.com/apache/spark/pull/43695#issuecomment-1826780514

   Merging to master. Thank you, @cloud-fan and @beliefer, for the review.



Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43695:
URL: https://github.com/apache/spark/pull/43695#discussion_r1387412665


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4531,6 +4531,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val EXTRA_ORIGIN_TRACES = buildConf("spark.sql.extraOriginTraces")
+    .doc("The number of additional non-Spark SQL traces in the captured DataFrame context. " +
+      "When it is set to 0, captured one Spark traces and a followed non-Spark trace.")
+    .version("4.0.0")
+    .intConf
+    .checkValue(_ >= 0, "The number of extra thread traces must be non-negative.")

Review Comment:
   I'm a bit confused.
   Intuitively, I feel that 0 should represent the absence of non-Spark trace.




Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43695:
URL: https://github.com/apache/spark/pull/43695#issuecomment-1825102919

   Can we show the impact to the real error message, instead of `val ctx = try { df.select(explode($"*")) } catch { case e: AnalysisException => e.context.head }`?



Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk closed pull request #43695: [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context
URL: https://github.com/apache/spark/pull/43695



Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43695:
URL: https://github.com/apache/spark/pull/43695#discussion_r1386785069


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4531,6 +4531,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val EXTRA_ORIGIN_TRACES = buildConf("spark.sql.extraOriginTraces")
+    .doc("The number of additional non-Spark SQL traces in the captured DataFrame context. " +
+      "When it is set to 0, captured one Spark traces and a followed non-Spark trace.")
+    .version("4.0.0")
+    .intConf
+    .checkValue(_ >= 0, "The number of extra thread traces must be non-negative.")

Review Comment:
   should it be `> 0`?




Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on code in PR #43695:
URL: https://github.com/apache/spark/pull/43695#discussion_r1387094720


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4531,6 +4531,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val EXTRA_ORIGIN_TRACES = buildConf("spark.sql.extraOriginTraces")
+    .doc("The number of additional non-Spark SQL traces in the captured DataFrame context. " +
+      "When it is set to 0, captured one Spark traces and a followed non-Spark trace.")
+    .version("4.0.0")
+    .intConf
+    .checkValue(_ >= 0, "The number of extra thread traces must be non-negative.")

Review Comment:
   These are extra traces; see the config description:
   ```
   When it is set to 0, captured one Spark traces and a followed non-Spark trace.
   ```
   For instance (screenshot "Screenshot 2023-11-01 at 21 29 18": https://github.com/apache/spark/assets/1580697/82692f23-ea71-40b8-9748-d83f7b97f1df):
   when it is 0, we return 4 and 5;
   when it is 1, we return 4, 5, and 6.




Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43695:
URL: https://github.com/apache/spark/pull/43695#issuecomment-1826686951

   @MaxGekk Not quite related to this PR, but what if the expression is created in a different place from the DataFrame? For example:
   ```
   val divCol = lit(1) / lit(0)
   spark.range(1).select(divCol).collect()
   ```



Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on PR #43695:
URL: https://github.com/apache/spark/pull/43695#issuecomment-1826715119

   @cloud-fan Quite the same:
   ```scala
   scala> val divCol = lit(1) / lit(0)
   val divCol: org.apache.spark.sql.Column = `/`(1, 0)
   
   scala> spark.range(1).select(divCol).collect()
   org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
   == DataFrame ==
   "div" was called from
   <init>(<console>:1)
   <init>(<console>:15)
   .<clinit>(<console>:1)
   ```
   
   but when I create it in an object:
   ```scala
   scala> object Obj1 {
        | val divCol = lit(1) / lit(0)
        | }
   object Obj1
   
   scala> spark.range(1).select(Obj1.divCol).collect()
   org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
   == DataFrame ==
   "div" was called from
   Obj1$.<init>(<console>:2)
   Obj1$lzycompute$1(<console>:1)
   Obj1(<console>:1)
   ```



Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on PR #43695:
URL: https://github.com/apache/spark/pull/43695#issuecomment-1826388775

   > Can we show the impact to the real error message
   
   @cloud-fan I added an example; please take a look at the PR.



Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #43695:
URL: https://github.com/apache/spark/pull/43695#discussion_r1387790724


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4531,6 +4531,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val EXTRA_ORIGIN_TRACES = buildConf("spark.sql.extraOriginTraces")
+    .doc("The number of additional non-Spark SQL traces in the captured DataFrame context. " +
+      "When it is set to 0, captured one Spark traces and a followed non-Spark trace.")
+    .version("4.0.0")
+    .intConf
+    .checkValue(_ >= 0, "The number of extra thread traces must be non-negative.")

Review Comment:
   I feel that the default should be 1, which is consistent with the previous behavior.
   The value of `extraOriginTraces` should be consistent with the number of non-Spark traces.
   Then we can use `> 0` as the constraint.




Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43695:
URL: https://github.com/apache/spark/pull/43695#issuecomment-1802095817

   Can we put a real example in the PR description?



Re: [PR] [SPARK-45826][SQL] Add a SQL config for extra traces in `Origin` [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on code in PR #43695:
URL: https://github.com/apache/spark/pull/43695#discussion_r1387598312


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4531,6 +4531,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val EXTRA_ORIGIN_TRACES = buildConf("spark.sql.extraOriginTraces")
+    .doc("The number of additional non-Spark SQL traces in the captured DataFrame context. " +
+      "When it is set to 0, captured one Spark traces and a followed non-Spark trace.")
+    .version("4.0.0")
+    .intConf
+    .checkValue(_ >= 0, "The number of extra thread traces must be non-negative.")

Review Comment:
   > Intuitively, I feel that 0 should represent the absence of non-Spark trace.
   
   Actually, it works this way. Let me update the config and the PR description. The `slice` method excludes the `until` index.
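
The effect of the exclusive `until` bound can be sketched in plain Scala. This is a minimal illustration and not the code from the PR: the `StackSliceSketch` object, the `capture` helper, the `isSpark` predicate, and all frame names are hypothetical, and the sketch assumes that the captured range starts at the last Spark frame and extends `extra` frames into the non-Spark frames that follow it.

```scala
object StackSliceSketch {
  // Hypothetical predicate: treat frames under org.apache.spark as Spark code.
  def isSpark(frame: String): Boolean = frame.startsWith("org.apache.spark.")

  // Assumed selection rule (an illustration, not taken from the PR):
  // keep the last Spark frame plus `extra` non-Spark frames after it.
  def capture(frames: Vector[String], extra: Int): Vector[String] = {
    require(extra >= 0, "The number of extra traces must be non-negative.")
    // Skip leading non-Spark frames (for example JDK internals).
    val firstSpark = frames.indexWhere(isSpark)
    if (firstSpark < 0) Vector.empty
    else {
      // Index of the first non-Spark frame after the run of Spark frames.
      val afterSpark = frames.indexWhere(f => !isSpark(f), firstSpark)
      val i = if (afterSpark < 0) frames.length else afterSpark
      // slice(from, until) excludes `until`, so extra = 0 keeps only frames(i - 1).
      frames.slice(i - 1, i + extra)
    }
  }

  // Made-up frames, ordered from the top of the stack downwards.
  val frames: Vector[String] = Vector(
    "java.lang.Thread.getStackTrace",
    "org.apache.spark.sql.InternalA.call",
    "org.apache.spark.sql.InternalB.call",
    "com.example.App.buildColumn",
    "com.example.App.main"
  )
}
```

With these made-up frames, `StackSliceSketch.capture(frames, 0)` returns only `org.apache.spark.sql.InternalB.call`, and `capture(frames, 1)` additionally returns `com.example.App.buildColumn`. Under this assumed rule, a value of 0 means that no non-Spark frame is captured.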


