Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/20 08:52:43 UTC

[GitHub] [spark] Ngone51 opened a new pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Ngone51 opened a new pull request #31600:
URL: https://github.com/apache/spark/pull/31600


   
   ### What changes were proposed in this pull request?
   
   Set the active SparkSession to `sparkSessionForStream`.
   
   ### Why are the changes needed?
   
   The active session should be `sparkSessionForStream`. Otherwise, settings like
   
   https://github.com/apache/spark/blob/6b34745cb9b294c91cd126c2ea44c039ee83cb84/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L332-L335
   
   would not take effect for callers that read them from the active SQLConf, e.g., the `InsertAdaptiveSparkPlan` rule. Moreover, unlike `InsertAdaptiveSparkPlan` (which skips streaming plans), `CostBasedJoinReorder` could, in theory, still take effect.
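
   To illustrate the problem, here is a minimal sketch that does not use Spark at all. `MiniSession`, `cloneSession`, `setActive`, and `activeConf` are hypothetical stand-ins for the session, its clone, and a rule reading from the active conf; only the key `spark.sql.adaptive.enabled` is a real Spark setting. It assumes the clone gets its own copy of the conf, so a value set on the clone is invisible through the original session.

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for a session; NOT Spark's SparkSession.
   final class MiniSession(val name: String) {
     val conf: mutable.Map[String, String] = mutable.Map.empty

     // The clone starts from a copy of this session's conf and then diverges.
     def cloneSession(): MiniSession = {
       val copy = new MiniSession(name + "-clone")
       copy.conf ++= conf
       copy
     }
   }

   object MiniSession {
     // Thread-local "active session" slot (an assumption of this sketch).
     private val active = new ThreadLocal[MiniSession]

     def setActive(s: MiniSession): Unit = active.set(s)

     // What a rule would see if it reads a setting from the active conf.
     def activeConf(key: String): Option[String] =
       Option(active.get()).flatMap(_.conf.get(key))
   }

   object ActiveConfDemo {
     val AqeKey = "spark.sql.adaptive.enabled"

     def aqeSeenByRule(useStreamSession: Boolean): Option[String] = {
       val session = new MiniSession("user")
       session.conf(AqeKey) = "true"
       val sessionForStream = session.cloneSession()
       // The stream disables AQE on the cloned session only.
       sessionForStream.conf(AqeKey) = "false"
       MiniSession.setActive(if (useStreamSession) sessionForStream else session)
       MiniSession.activeConf(AqeKey)
     }

     def main(args: Array[String]): Unit = {
       println(aqeSeenByRule(useStreamSession = false)) // Some(true): override is not visible
       println(aqeSeenByRule(useStreamSession = true))  // Some(false): override is visible
     }
   }
   ```

   With the original session active, the rule in this sketch still sees AQE as enabled; with the cloned session active, it sees the override.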
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Tested manually. Before the fix, `InsertAdaptiveSparkPlan` would try to apply AQE to the plan (although it would not take effect). After the fix, the rule returns immediately.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org




[GitHub] [spark] HeartSaVioR commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-783057670


   Given that we assume the code might have a bug, it may not be safe to answer the question based on the current code. We probably need to ask about the "historical" context: what sparkSessionForStream is for, and why sparkSession should still be used alongside it even after cloning the session.
   
   cc. @mukulmurthy @zsxwing @jose-torres who authored/reviewed SPARK-26586. Also @tdas, who likely has context.




[GitHub] [spark] AmplabJenkins commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787920832


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135587/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787771309


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40168/
   






[GitHub] [spark] Ngone51 commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782590065


   cc @viirya @xuanyuanking @HeartSaVioR Please take a look, thanks!




[GitHub] [spark] HeartSaVioR edited a comment on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-783057670


   Given that we assume the code might have a bug, it may not be safe to answer the question based on the current code. We probably need to ask about the "historical" context: what sparkSessionForStream is for, and why sparkSession should still be used alongside it even after cloning the session. Once we figure this out and make it clear, we should document it in StreamExecution for future reference.
   
   cc. @mukulmurthy @zsxwing @jose-torres who authored/reviewed SPARK-26586. Also @tdas, who likely has context.




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782659981


   **[Test build #135306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135306/testReport)** for PR 31600 at commit [`52a6999`](https://github.com/apache/spark/commit/52a6999d3c602f4968a3f1e2d137e1cb5d25cd4e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787919432


   **[Test build #135587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135587/testReport)** for PR 31600 at commit [`ddbeca7`](https://github.com/apache/spark/commit/ddbeca731c1219dbeb96de64840e80b7a4b715c7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787735324


   Looks reasonable to me. I'll merge it within a few days if there is no objection.




[GitHub] [spark] Ngone51 commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-783001883


   > Just wondering, does anyone know the reason the code uses both sparkSession and sparkSessionForStream? Are there any known conditions when we should still use sparkSession and when we should use sparkSessionForStream instead?
   
   
   That's a good question. After taking another look, IIUC, `sparkSessionForStream` is used only for `runActivatedStream()`, and `sparkSession` is used for everything outside of `runActivatedStream()`. Thus, I think the better fix would be:
   
   ```scala
   sparkSessionForStream.withActive {
    ...
    runActivatedStream(sparkSessionForStream)
    ...
   }
   ```
   
   What do you guys think?
   
   




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786068019


   **[Test build #135472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135472/testReport)** for PR 31600 at commit [`ddbeca7`](https://github.com/apache/spark/commit/ddbeca731c1219dbeb96de64840e80b7a4b715c7).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786081531


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40052/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31600:
URL: https://github.com/apache/spark/pull/31600#discussion_r579735375



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
##########
@@ -323,7 +323,7 @@ abstract class StreamExecution(
       startLatch.countDown()
 
       // While active, repeatedly attempt to run batches.
-      SparkSession.setActiveSession(sparkSession)
+      SparkSession.setActiveSession(sparkSessionForStream)

Review comment:
       Can we have a test case based on your PR description, @Ngone51?






[GitHub] [spark] Ngone51 commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787735717


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786071413


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135472/
   




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782594509


   **[Test build #135306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135306/testReport)** for PR 31600 at commit [`52a6999`](https://github.com/apache/spark/commit/52a6999d3c602f4968a3f1e2d137e1cb5d25cd4e).




[GitHub] [spark] Ngone51 commented on a change in pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31600:
URL: https://github.com/apache/spark/pull/31600#discussion_r582967118



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
##########
@@ -323,26 +323,28 @@ abstract class StreamExecution(
       startLatch.countDown()
 
       // While active, repeatedly attempt to run batches.
-      SparkSession.setActiveSession(sparkSession)
-
-      updateStatusMessage("Initializing sources")
-      // force initialization of the logical plan so that the sources can be created
-      logicalPlan
-
-      // Adaptive execution can change num shuffle partitions, disallow
-      sparkSessionForStream.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "false")
-      // Disable cost-based join optimization as we do not want stateful operations to be rearranged
-      sparkSessionForStream.conf.set(SQLConf.CBO_ENABLED.key, "false")
-      offsetSeqMetadata = OffsetSeqMetadata(
-        batchWatermarkMs = 0, batchTimestampMs = 0, sparkSessionForStream.conf)
-
-      if (state.compareAndSet(INITIALIZING, ACTIVE)) {
-        // Unblock `awaitInitialization`
-        initializationLatch.countDown()
-        runActivatedStream(sparkSessionForStream)
-        updateStatusMessage("Stopped")
-      } else {
-        // `stop()` is already called. Let `finally` finish the cleanup.
+      sparkSessionForStream.withActive {

Review comment:
       Using `withActive` here keeps `StreamExecution` from using the wrong active session.
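
       For readers unfamiliar with the pattern, the following is a rough, Spark-free sketch of what a `withActive`-style helper does: set the active session for the duration of a block and restore the previous one afterwards, even if the block throws. The `WithActiveSketch` object, its `ThreadLocal[String]` slot, and the `setActive`/`current` helpers are hypothetical and only model the idea; Spark's actual implementation may differ in detail.

       ```scala
       object WithActiveSketch {
         // Hypothetical thread-local slot standing in for the active session.
         private val active = new ThreadLocal[String]

         def setActive(session: String): Unit = active.set(session)

         def current: Option[String] = Option(active.get())

         // Run `block` with `session` active, then restore whatever was active before.
         def withActive[T](session: String)(block: => T): T = {
           val previous = active.get()
           active.set(session)
           try block
           finally active.set(previous)
         }

         def main(args: Array[String]): Unit = {
           setActive("sparkSession")
           val inside = withActive("sparkSessionForStream") { current }
           println(inside)  // Some(sparkSessionForStream)
           println(current) // Some(sparkSession)
         }
       }
       ```

       Compared with a bare `setActiveSession(...)` call, scoping the change to a block makes it harder for code after the block to keep running under the stream's session by accident.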






[GitHub] [spark] cloud-fan closed pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #31600:
URL: https://github.com/apache/spark/pull/31600


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782607576


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39886/
   




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782602186


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39886/
   




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787771268


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40168/
   




[GitHub] [spark] Ngone51 commented on a change in pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31600:
URL: https://github.com/apache/spark/pull/31600#discussion_r579941535



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
##########
@@ -323,7 +323,7 @@ abstract class StreamExecution(
       startLatch.countDown()
 
       // While active, repeatedly attempt to run batches.
-      SparkSession.setActiveSession(sparkSession)
+      SparkSession.setActiveSession(sparkSessionForStream)

Review comment:
       ok, let me try it.






[GitHub] [spark] AmplabJenkins commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782677205


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135306/
   




[GitHub] [spark] Ngone51 commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-791164369


   thanks all!




[GitHub] [spark] Ngone51 commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-784700619


   > @Ngone51 could you add a unit test for this?
   
   Yeah, I'm trying...




[GitHub] [spark] AmplabJenkins commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782607576


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39886/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786071413


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135472/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-790727101


   Thank you so much, @Ngone51 and all!




[GitHub] [spark] HeartSaVioR edited a comment on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-783057670


   Given that we assume the code might have a bug, it may not be safe to answer the question based on the current code. We probably need to ask about the "historical" context: what `sparkSessionForStream` is for, and why `sparkSession` should still be used alongside it even after the session is cloned. Once that is clear, we should document it in StreamExecution for future reference.
   
   cc @mukulmurthy @zsxwing @jose-torres, who authored/reviewed SPARK-26586, and @tdas, who likely has context.




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782598180


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39886/
   




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786061900


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40052/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787771309


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40168/
   






[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786081491


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40052/
   




[GitHub] [spark] HeartSaVioR commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-782603551


   Just wondering, does anyone know the reason the code uses both `sparkSession` and `sparkSessionForStream`? Are there any known conditions when we should still use `sparkSession` and when we should use `sparkSessionForStream` instead?






[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786045165


   **[Test build #135472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135472/testReport)** for PR 31600 at commit [`ddbeca7`](https://github.com/apache/spark/commit/ddbeca731c1219dbeb96de64840e80b7a4b715c7).




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787752768


   **[Test build #135587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135587/testReport)** for PR 31600 at commit [`ddbeca7`](https://github.com/apache/spark/commit/ddbeca731c1219dbeb96de64840e80b7a4b715c7).






[GitHub] [spark] zsxwing commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
zsxwing commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-784672695


   I don't remember why we didn't set the active session to `sparkSessionForStream`. It was probably just an oversight when we introduced the cloned session for streaming queries.
   
   @Ngone51 could you add a unit test for this?
   
   




[GitHub] [spark] Ngone51 commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for streaming query

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-786017148


   Hi all, I realized that the issue actually only affects `StreamExecution.logicalPlan`. (I have added a test for it.)
   
   For `runActivatedStream(sparkSessionForStream)`, the active session is already correctly set to `sparkSessionForStream`.
   
   I have updated the PR, so please take another look.
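
To illustrate why the active session matters for a lazily evaluated value such as a plan, here is a minimal sketch that does not use Spark at all. `Session`, `ActiveSession`, and `StreamRunner` are hypothetical stand-ins rather than Spark classes, and the sketch assumes only that the lazy value reads a thread-local "active" session at the moment it is first forced.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
final case class Session(name: String, conf: Map[String, String]) {
  // A cloned session gets its own identity and its own copy of the configuration.
  def cloneSession(): Session = copy(name = name + "-clone")
}

object ActiveSession {
  // Thread-local holder for the session considered "active" on the current thread.
  private val active = new ThreadLocal[Option[Session]] {
    override def initialValue(): Option[Session] = None
  }
  def set(s: Session): Unit = active.set(Some(s))
  def get: Option[Session] = active.get()
  def clear(): Unit = active.remove()
}

class StreamRunner(parent: Session) {
  // Session dedicated to this stream, isolated from the parent.
  val sessionForStream: Session = parent.cloneSession()

  // Evaluated once, on first access, using whichever session is active
  // on the calling thread at that moment.
  lazy val planSessionName: String =
    ActiveSession.get.map(_.name).getOrElse("<none>")

  def run(useClone: Boolean): String = {
    ActiveSession.set(if (useClone) sessionForStream else parent)
    try planSessionName
    finally ActiveSession.clear()
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val parent = Session("parent", Map.empty)
    // The parent is active when the lazy value is forced.
    println(new StreamRunner(parent).run(useClone = false)) // prints "parent"
    // The clone is active when the lazy value is forced.
    println(new StreamRunner(parent).run(useClone = true))  // prints "parent-clone"
  }
}
```

In this toy model, whichever session is active when the lazy value is first forced is the one it captures, so setting the clone as active beforehand is what ties the lazily built value to the stream's own session.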




[GitHub] [spark] SparkQA commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-787767660


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40168/
   




[GitHub] [spark] cloud-fan commented on pull request #31600: [SPARK-34482][SS] Correct the active SparkSession for StreamExecution.logicalPlan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31600:
URL: https://github.com/apache/spark/pull/31600#issuecomment-790666373


   thanks, merging to master/3.1!



