You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/16 02:10:10 UTC

[GitHub] [spark] Yaohua628 opened a new pull request, #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Yaohua628 opened a new pull request, #37905:
URL: https://github.com/apache/spark/pull/37905

### What changes were proposed in this pull request?
Streaming metrics report all 0 (`processedRowsPerSecond`, etc) when selecting `_metadata` column. Because the logical plan from the batch and the actual planned logical are mismatched: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L348. So, here we cannot find the plan and collect metrics correctly.

This PR fixes this by replacing the initial `LogicalPlan` with the `LogicalPlan` containing the metadata column

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing + New UTs

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972655128


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>
           val hasFileMetadata = output.exists {

Review Comment:
   looking at the code, seems the problem is we resolve the metadata columns in every micro-batch. Shouldn't we only resolve it once?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972646144


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala:
##########
@@ -524,10 +525,11 @@ class FileMetadataStructSuite extends QueryTest with SharedSparkSession {
         .select("*", "_metadata")
         .writeStream.format("json")
         .option("checkpointLocation", dir.getCanonicalPath + "/target/checkpoint")
+        .trigger(Trigger.Once())

Review Comment:
   nit: Please change to `Trigger.AvailableNow()` and see whether this breaks the test or not. We are going to produce deprecated warning Trigger.Once() from Spark 3.4.0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

Yaohua628 commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973236942


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   Thanks, I initially thought about that, but we need to know the `output` from `StreamingExecutionRelation(source, output, catalogTable)` to resolve `_metadata` right (L591 ~ L593)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yaohua628 commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

Yaohua628 commented on PR #37905:
URL: https://github.com/apache/spark/pull/37905#issuecomment-1251311583

   > There's conflict in branch-3.3. @Yaohua628 Could you please craft a PR for branch-3.3? Thanks in advance!
   
   Done! https://github.com/apache/spark/pull/37932 - Thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972722394


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>
           val hasFileMetadata = output.exists {

Review Comment:
   It will require Source to indicate the request of metadata column and produce the logical plan accordingly when getBatch is called. My understanding is that DSv1 source does not have an interface to receive the information of which columns will be referred in actual query.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973335538


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   Yeah, you're right. I missed that.
   
   Btw, looks like my change (tagging catalogTable into LogicalRelation) will also fall into this bug. Thanks for fixing this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

Yaohua628 commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973485968


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   Got it - could you share an example? In this case, does that mean the `leaf : source = 1 : N`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yaohua628 commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

Yaohua628 commented on PR #37905:
URL: https://github.com/apache/spark/pull/37905#issuecomment-1248925323

   Hi, @cloud-fan @HeartSaVioR could you please take a look whenever you have a chance? Thanks! Happy weekend!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973793512


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala:
##########
@@ -565,6 +569,34 @@ class FileMetadataStructSuite extends QueryTest with SharedSparkSession {
             sourceFileMetadata(METADATA_FILE_MODIFICATION_TIME))
         )
       )
+
+      // Verify self-union
+      val streamQuery1 = streamDf.union(streamDf)
+        .writeStream.format("json")
+        .option("checkpointLocation", dir.getCanonicalPath + "/target/checkpoint_union")
+        .trigger(Trigger.AvailableNow())
+        .start(dir.getCanonicalPath + "/target/new-streaming-data-union")
+      streamQuery1.awaitTermination()
+      val df1 = spark.read.format("json")
+        .load(dir.getCanonicalPath + "/target/new-streaming-data-union")
+      // Verify self-union results
+      assert(streamQuery0.lastProgress.numInputRows == 2L)

Review Comment:
   streamQuery1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972711729


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   While we are here, probably less intrusive change would be moving (L594 ~ L610) to L567.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973510971


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   The code comment actually doesn't say much and I'm speculating. Let's just try a best effort, self-union and self-join. `df = spark.readStream... -> df.union(df)` / `df = spark.readStream... -> df.join(df)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR closed pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR closed pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`
URL: https://github.com/apache/spark/pull/37905


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973345153


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   Maybe, we may want to check the case of self-union / self-join to verify we really didn't break things. This works only when this condition is true `leaf : source = 1 : 1` (otherwise we are overwriting the value in map), while the code comment of ProgressReporter tells there are counter cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972848982


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -553,7 +554,7 @@ class MicroBatchExecution(
     logDebug(s"Running batch $currentBatchId")
 
     // Request unprocessed data from all sources.
-    newData = reportTimeTaken("getBatch") {
+    val mutableNewData = mutable.Map() ++ reportTimeTaken("getBatch") {

Review Comment:
   ```suggestion
       val mutableNewData = mutable.Map.empty ++ reportTimeTaken("getBatch") {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

Yaohua628 commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973339562


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   Np - an unintentional fix :-)
   Thanks for helping!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Yaohua628 commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

Yaohua628 commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r973821514


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala:
##########
@@ -565,6 +569,34 @@ class FileMetadataStructSuite extends QueryTest with SharedSparkSession {
             sourceFileMetadata(METADATA_FILE_MODIFICATION_TIME))
         )
       )
+
+      // Verify self-union
+      val streamQuery1 = streamDf.union(streamDf)
+        .writeStream.format("json")
+        .option("checkpointLocation", dir.getCanonicalPath + "/target/checkpoint_union")
+        .trigger(Trigger.AvailableNow())
+        .start(dir.getCanonicalPath + "/target/new-streaming-data-union")
+      streamQuery1.awaitTermination()
+      val df1 = spark.read.format("json")
+        .load(dir.getCanonicalPath + "/target/new-streaming-data-union")
+      // Verify self-union results
+      assert(streamQuery0.lastProgress.numInputRows == 2L)

Review Comment:
   oops



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972722394


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>
           val hasFileMetadata = output.exists {

Review Comment:
   This requires Source to indicate the request of metadata column and produce the logical plan accordingly when getBatch is called. My understanding is that DSv1 source does not have an interface to receive the information of which columns will be referred in actual query.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on PR #37905:
URL: https://github.com/apache/spark/pull/37905#issuecomment-1250588730

   There's conflict in branch-3.3. @Yaohua628 Could you please craft a PR for branch-3.3? Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on PR #37905:
URL: https://github.com/apache/spark/pull/37905#issuecomment-1250586556

   Thanks! Merging to master / 3.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

HeartSaVioR commented on code in PR #37905:
URL: https://github.com/apache/spark/pull/37905#discussion_r972711729


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -590,7 +591,7 @@ class MicroBatchExecution(
     val newBatchesPlan = logicalPlan transform {
       // For v1 sources.
       case StreamingExecutionRelation(source, output, catalogTable) =>
-        newData.get(source).map { dataPlan =>
+        mutableNewData.get(source).map { dataPlan =>

Review Comment:
   While we are here, probably less intrusive change would be moving (L594 ~ L610) to L567. After the change we wouldn't need to make a change to newData here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #37905: [SPARK-40460][SS] Fix streaming metrics when selecting `_metadata`

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on PR #37905:
URL: https://github.com/apache/spark/pull/37905#issuecomment-1249515682

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org