Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/21 08:24:01 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

HeartSaVioR opened a new pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664
 
 
   ### What changes were proposed in this pull request?
   
   This patch adds a new method, `getLatestBatchId()`, to `CompactibleFileStreamLog` as a complement to `getLatest()`. Unlike `getLatest()`, it does not read the content of the latest batch's metadata log file. The patch applies the new method in `FileStreamSink.addBatch()` to avoid the unnecessary latency of reading the log file.
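
   To illustrate the idea only, here is a minimal standalone sketch, not the code in this patch. It assumes the metadata log directory holds one file per batch, named by its batch ID, with compacted batches carrying a `.compact` suffix. The names `LatestBatchIdSketch`, `parseBatchId`, `latestBatchId`, and `alreadyCommitted` are hypothetical. The point is that the latest batch ID can be derived from file names alone, without opening any file.

```scala
import java.io.File
import scala.util.Try

// Hypothetical sketch: derive the latest batch ID from file names only.
object LatestBatchIdSketch {
  // Assumed naming convention: "<batchId>" or "<batchId>.compact".
  private val CompactSuffix = ".compact"

  // Parse a batch ID from a log file name; unrelated files yield None.
  def parseBatchId(fileName: String): Option[Long] =
    Try(fileName.stripSuffix(CompactSuffix).toLong).toOption

  // List the directory and take the highest batch ID; no file content is read.
  def latestBatchId(logDir: File): Option[Long] = {
    val names = Option(logDir.list()).getOrElse(Array.empty[String])
    val ids = names.flatMap(n => parseBatchId(n))
    if (ids.isEmpty) None else Some(ids.max)
  }

  // Same shape as the check in addBatch: a batch at or below the latest ID is already committed.
  def alreadyCommitted(batchId: Long, logDir: File): Boolean =
    batchId <= latestBatchId(logDir).getOrElse(-1L)
}
```

   With this approach the cost of the check depends on the number of files in the directory, not on the size of the latest (possibly compacted) log file.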
   
   ### Why are the changes needed?
   
   Once the compacted metadata log file becomes huge, writing outputs for the batch right after a compact batch (the "compact + 1" batch) is also slowed down, because it unnecessarily reads the compacted metadata log file. This latency can easily be avoided.
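
   As a rough illustration of which batch is affected, the sketch below assumes batch IDs start at 0 and a batch is a compact batch when `batchId + 1` is a multiple of the compact interval. That convention, the interval value used in the usage note, and the names `CompactBatchSketch`, `isCompactBatch`, and `readsCompactedFileOnStart` are assumptions made for this example.

```scala
// Hypothetical sketch: identify the batch that pays for reading the compacted file.
object CompactBatchSketch {
  // Assumed convention: every `interval`-th batch (counting from batch 0) is compacted.
  def isCompactBatch(batchId: Long, interval: Int): Boolean =
    (batchId + 1) % interval == 0

  // When batch `batchId` starts, the latest log file belongs to batch `batchId - 1`.
  // If that file is a compacted one, reading its content is the expensive step.
  def readsCompactedFileOnStart(batchId: Long, interval: Int): Boolean =
    batchId > 0 && isCompactBatch(batchId - 1, interval)
}
```

   Under these assumptions and an interval of 10, batch 9 is a compact batch and batch 10 is the "compact + 1" batch that would read the compacted file if the latest batch ID were looked up by reading the latest log entry.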
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT. Also manually tested with a query that has a huge metadata log on the file stream sink:
   
   > before applying the patch
   
   ![Screen Shot 2020-02-21 at 4 20 19 PM](https://user-images.githubusercontent.com/1317309/75016223-d3ffb180-54cd-11ea-9063-49405943049d.png)
   
   > after applying the patch
   
   ![Screen Shot 2020-02-21 at 4 06 18 PM](https://user-images.githubusercontent.com/1317309/75016220-d235ee00-54cd-11ea-81a7-7c03a43c4db4.png)
   
   The peaks are compact batches; please compare the batch immediately following each compact batch.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r382452944
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
 
 Review comment:
   I didn't add an E2E test, to keep the test code simple, but if we prefer E2E then I'll try to add a new test to FileStreamSinkSuite.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608267829
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608267509
 
 
   **[Test build #120748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120748/testReport)** for PR 27664 at commit [`d270961`](https://github.com/apache/spark/commit/d270961519a4af5a9f8fa390125c567f56c07700).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589550926
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589550931
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23518/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615517890
 
 
   **[Test build #121434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121434/testReport)** for PR 27664 at commit [`ff9078e`](https://github.com/apache/spark/commit/ff9078e11b86311ba4cda4e63c46920057028200).



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608408700
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120764/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589655625
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118766/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589550336
 
 
   **[Test build #118766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118766/testReport)** for PR 27664 at commit [`0cd8cda`](https://github.com/apache/spark/commit/0cd8cda68e5371720521feb9d4adcdefc32b8aed).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615574196
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589550931
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23518/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613136970
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608387267
 
 
   **[Test build #120772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120772/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613856105
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25985/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608408692
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613135438
 
 
   retest this, please



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615298677
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613855674
 
 
   **[Test build #121301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121301/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).



[GitHub] [spark] xuanyuanking commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r410212174
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
 ##########
 @@ -142,7 +142,7 @@ class FileStreamSink(
   }
 
   override def addBatch(batchId: Long, data: DataFrame): Unit = {
-    if (batchId <= fileLog.getLatest().map(_._1).getOrElse(-1L)) {
 
 Review comment:
   Nice catch!
   
   https://github.com/apache/spark/blob/7ad6ba36f28b7a5ca548950dec6afcd61e5d68b9/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L196
   
   https://github.com/apache/spark/blob/7ad6ba36f28b7a5ca548950dec6afcd61e5d68b9/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSourceLog.scala#L99
   
   Can these two places also be optimized in this way?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589655618
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589655618
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615298685
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26099/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608387762
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211443
 
 
   **[Test build #120748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120748/testReport)** for PR 27664 at commit [`d270961`](https://github.com/apache/spark/commit/d270961519a4af5a9f8fa390125c567f56c07700).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613210910
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121229/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613136970
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613136978
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25917/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589550926
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615574196
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613136554
 
 
   **[Test build #121229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121229/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615302175
 
 
   **[Test build #121418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121418/testReport)** for PR 27664 at commit [`ff9078e`](https://github.com/apache/spark/commit/ff9078e11b86311ba4cda4e63c46920057028200).



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615332051
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121418/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211443
 
 
   **[Test build #120748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120748/testReport)** for PR 27664 at commit [`d270961`](https://github.com/apache/spark/commit/d270961519a4af5a9f8fa390125c567f56c07700).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613999419
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613999419
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608531764
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120772/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608267829
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608531755
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615298685
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26099/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589654656
 
 
   **[Test build #118766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118766/testReport)** for PR 27664 at commit [`0cd8cda`](https://github.com/apache/spark/commit/0cd8cda68e5371720521feb9d4adcdefc32b8aed).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608408700
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120764/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615517890
 
 
   **[Test build #121434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121434/testReport)** for PR 27664 at commit [`ff9078e`](https://github.com/apache/spark/commit/ff9078e11b86311ba4cda4e63c46920057028200).



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211752
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25447/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613999423
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121301/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589550336
 
 
   **[Test build #118766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118766/testReport)** for PR 27664 at commit [`0cd8cda`](https://github.com/apache/spark/commit/0cd8cda68e5371720521feb9d4adcdefc32b8aed).



[GitHub] [spark] gaborgsomogyi commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608282062
 
 
   retest this please



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615518136
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26118/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613210903
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613856105
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25985/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613210903
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615518130
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608387762
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608387267
 
 
   **[Test build #120772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120772/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402800750
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,44 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { dir =>
+          val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark,
+            s"$scheme:///${dir.getCanonicalPath}")
+          for (batchId <- 0 to 2) {
+            sinkLog.add(
+              batchId,
+              Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION)))
+          }
+
+          def getCountForOpenOnMetadataFile(batchId: Long): Long = {
+            val path = sinkLog.batchIdToPath(batchId).toUri.getPath
+            CountOpenLocalFileSystem.pathToNumOpenCalled.get(path).map(_.get()).getOrElse(0)
+          }
+
+          CountOpenLocalFileSystem.resetCount()
+
+          assert(sinkLog.getLatestBatchId() === Some(2L))
+          // getLatestBatchId doesn't open the latest metadata log file
+          (0L to 2L).foreach { batchId =>
+            assert(getCountForOpenOnMetadataFile(batchId) === 0)
 
 Review comment:
   Nit: Just to be consistent with the other parts, `s/0/0L/`. The same applies to the other 2 occurrences...
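
   To illustrate the nit with a sketch that is not taken from the patch: `OpenCounterSketch` below is a hypothetical stand-in for the `CountOpenLocalFileSystem` counter used by the test, assuming it keeps a map from path to an `AtomicLong`. Writing `0L` keeps every literal the same type as the `Long` result:

   ```scala
   import java.util.concurrent.atomic.AtomicLong
   import scala.collection.concurrent.TrieMap

   // Hypothetical stand-in for the test's open() counter; all names are illustrative.
   object OpenCounterSketch {
     private val pathToNumOpenCalled = TrieMap.empty[String, AtomicLong]

     // Record one open() call for the given path.
     def recordOpen(path: String): Unit = {
       pathToNumOpenCalled.getOrElseUpdate(path, new AtomicLong(0L)).incrementAndGet()
       ()
     }

     // Number of recorded opens; 0L (a Long literal) when the path was never opened.
     def countFor(path: String): Long =
       pathToNumOpenCalled.get(path).map(_.get()).getOrElse(0L)

     // Forget all recorded opens, as the test does before its assertions.
     def resetCount(): Unit = pathToNumOpenCalled.clear()
   }
   ```

   With this shape, an assertion such as `countFor(path) == 0L` compares `Long` with `Long` on both sides.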



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211741
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615573487
 
 
   **[Test build #121434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121434/testReport)** for PR 27664 at commit [`ff9078e`](https://github.com/apache/spark/commit/ff9078e11b86311ba4cda4e63c46920057028200).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402369944
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { file =>
+          val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark,
+            s"$scheme:///${file.getCanonicalPath}")
+          for (batchId <- 0 to 2) {
+            sinkLog.add(
+              batchId,
+              Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION)))
+          }
+
+          def getCountForOpenOnMetadataFile(batchId: Long): Long = {
+            val path = sinkLog.batchIdToPath(batchId).toUri.getPath
+            CountOpenLocalFileSystem.pathToNumOpenCalled
+              .get(path).map(_.get()).getOrElse(0)
+          }
+
+          val curCount = getCountForOpenOnMetadataFile(2)
 
 Review comment:
   Nit: s/2/2L/



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608267836
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120748/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608387768
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25470/
   Test PASSed.



[GitHub] [spark] HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589556837
 
 
   cc. @tdas @zsxwing @gaborgsomogyi



[GitHub] [spark] HeartSaVioR commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615517141
 
 
   retest this, please



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615574201
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121434/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608284930
 
 
   **[Test build #120764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120764/testReport)** for PR 27664 at commit [`d270961`](https://github.com/apache/spark/commit/d270961519a4af5a9f8fa390125c567f56c07700).



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615298677
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613852939
 
 
   retest this, please



[GitHub] [spark] HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211346
 
 
   Thanks for reviewing! Reflect review comments.



[GitHub] [spark] HeartSaVioR edited a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211346
 
 
   Thanks for reviewing! Reflected review comments.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608530333
 
 
   **[Test build #120772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120772/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613136978
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25917/
   Test PASSed.



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402362508
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { file =>
 
 Review comment:
   If it's a dir maybe we can call it `dir` or `path`.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615518130
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608531755
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613210910
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121229/
   Test FAILed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r410242868
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
 ##########
 @@ -142,7 +142,7 @@ class FileStreamSink(
   }
 
   override def addBatch(batchId: Long, data: DataFrame): Unit = {
-    if (batchId <= fileLog.getLatest().map(_._1).getOrElse(-1L)) {
 
 Review comment:
   Yeah I think so. Nice finding. Thanks!
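
   For context, a rough sketch of the idea behind the change, under stated assumptions rather than the actual `CompactibleFileStreamLog` code: the latest batch ID can be derived from file names alone, so no log file has to be opened. The file naming (`<batchId>` or `<batchId>.compact`) and every name in `LatestBatchIdSketch` are assumptions made for illustration.

   ```scala
   import scala.util.Try

   // Illustrative only; not the Spark implementation.
   object LatestBatchIdSketch {
     // Assumption: compacted batches carry this suffix after the batch ID.
     val CompactSuffix = ".compact"

     // Parse a batch ID from a file name; None for names that are not batch files.
     def parseBatchId(fileName: String): Option[Long] =
       Try(fileName.stripSuffix(CompactSuffix).toLong).toOption

     // Highest batch ID among the listed names, without reading any file content.
     def latestBatchId(fileNames: Seq[String]): Option[Long] =
       fileNames.flatMap(parseBatchId).reduceOption(_ max _)

     // Mirrors the guard in addBatch: skip a batch that is already committed.
     def shouldSkipBatch(batchId: Long, fileNames: Seq[String]): Boolean =
       batchId <= latestBatchId(fileNames).getOrElse(-1L)
   }
   ```

   The guard keeps the same `getOrElse(-1L)` default as the removed line, so an empty log directory never causes a batch to be skipped.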



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613136554
 
 
   **[Test build #121229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121229/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608407849
 
 
   **[Test build #120764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120764/testReport)** for PR 27664 at commit [`d270961`](https://github.com/apache/spark/commit/d270961519a4af5a9f8fa390125c567f56c07700).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613998546
 
 
   **[Test build #121301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121301/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402943806
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,44 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { dir =>
+          val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark,
+            s"$scheme:///${dir.getCanonicalPath}")
+          for (batchId <- 0 to 2) {
+            sinkLog.add(
+              batchId,
+              Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION)))
+          }
+
+          def getCountForOpenOnMetadataFile(batchId: Long): Long = {
+            val path = sinkLog.batchIdToPath(batchId).toUri.getPath
+            CountOpenLocalFileSystem.pathToNumOpenCalled.get(path).map(_.get()).getOrElse(0)
+          }
+
+          CountOpenLocalFileSystem.resetCount()
+
+          assert(sinkLog.getLatestBatchId() === Some(2L))
+          // getLatestBatchId doesn't open the latest metadata log file
+          (0L to 2L).foreach { batchId =>
+            assert(getCountForOpenOnMetadataFile(batchId) === 0)
 
 Review comment:
   Just replaced all constants where the type is Long. Thanks!



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613856090
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402362153
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { file =>
+          val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark,
+            s"$scheme:///${file.getCanonicalPath}")
+          for (batchId <- 0 to 2) {
+            sinkLog.add(
+              batchId,
+              Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION)))
+          }
+
+          def getCountForOpenOnMetadataFile(batchId: Long): Long = {
+            val path = sinkLog.batchIdToPath(batchId).toUri.getPath
+            CountOpenLocalFileSystem.pathToNumOpenCalled
+              .get(path).map(_.get()).getOrElse(0)
 
 Review comment:
   Nit: no linebreak needed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211752
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25447/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615302175
 
 
   **[Test build #121418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121418/testReport)** for PR 27664 at commit [`ff9078e`](https://github.com/apache/spark/commit/ff9078e11b86311ba4cda4e63c46920057028200).



[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608284930
 
 
   **[Test build #120764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120764/testReport)** for PR 27664 at commit [`d270961`](https://github.com/apache/spark/commit/d270961519a4af5a9f8fa390125c567f56c07700).



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613210452
 
 
   **[Test build #121229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121229/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615332038
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615574201
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121434/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608531764
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120772/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608267836
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120748/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608387768
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25470/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-589655625
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118766/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613856090
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608408692
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r382452367
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -267,4 +308,38 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     val log = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, input.toString)
     log.allFiles()
   }
+
+  private def withCountOpenLocalFileSystemAsLocalFileSystem(body: => Unit): Unit = {
 
 Review comment:
   The FileSystem-related code I add here is very similar to what I add in #27620. Once either one is merged, I'll rebase and deduplicate it.



[GitHub] [spark] SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615331950
 
 
   **[Test build #121418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121418/testReport)** for PR 27664 at commit [`ff9078e`](https://github.com/apache/spark/commit/ff9078e11b86311ba4cda4e63c46920057028200).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402386522
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -267,4 +308,38 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     val log = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark, input.toString)
     log.allFiles()
   }
+
+  private def withCountOpenLocalFileSystemAsLocalFileSystem(body: => Unit): Unit = {
+    val optionKey = s"fs.${CountOpenLocalFileSystem.scheme}.impl"
+    val originClassForLocalFileSystem = spark.conf.getOption(optionKey)
+    try {
+      spark.conf.set(optionKey, classOf[CountOpenLocalFileSystem].getName)
+      body
+    } finally {
+      originClassForLocalFileSystem match {
+        case Some(fsClazz) => spark.conf.set(optionKey, fsClazz)
+        case _ => spark.conf.unset(optionKey)
+      }
+    }
+  }
+}
+
+class CountOpenLocalFileSystem extends RawLocalFileSystem {
+  import CountOpenLocalFileSystem._
+
+  override def getUri: URI = {
+    URI.create(s"$scheme:///")
+  }
+
+  override def open(f: Path, bufferSize: Int): FSDataInputStream = {
+    val path = f.toUri.getPath
+    val curVal = pathToNumOpenCalled.getOrElseUpdate(path, new AtomicLong(0))
+    curVal.incrementAndGet()
+    super.open(f, bufferSize)
+  }
+}
+
+object CountOpenLocalFileSystem {
+  val scheme = s"FileStreamSinkLogSuite${math.abs(Random.nextInt)}fs"
+  val pathToNumOpenCalled = new mutable.HashMap[String, AtomicLong]
 
 Review comment:
   Some reset functionality would be good to make it reusable. It would also make `curCount` unnecessary.
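
A minimal sketch of what such reset functionality could look like, kept independent of Hadoop and Spark so it runs on its own. `OpenCounter`, `recordOpen`, and `count` are hypothetical names used only for illustration; only the idea of a per-path counter map with a reset comes from the code under review.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical stand-in for the counting state kept by the test file system.
object OpenCounter {
  private val pathToNumOpenCalled = new mutable.HashMap[String, AtomicLong]

  // Record one open() call against the given path.
  def recordOpen(path: String): Unit = synchronized {
    pathToNumOpenCalled.getOrElseUpdate(path, new AtomicLong(0)).incrementAndGet()
  }

  // Number of recorded opens for the path; 0 if the path was never opened.
  def count(path: String): Long = synchronized {
    pathToNumOpenCalled.get(path).map(_.get()).getOrElse(0L)
  }

  // Clear all counters so that each test starts from a known state.
  def resetCount(): Unit = synchronized {
    pathToNumOpenCalled.clear()
  }
}
```

With a reset available, a test can assert that a count is exactly 0 after the call under test instead of capturing a baseline value first and comparing against it.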



[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613999423
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121301/
   Test PASSed.



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402369581
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { file =>
+          val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark,
+            s"$scheme:///${file.getCanonicalPath}")
+          for (batchId <- 0 to 2) {
+            sinkLog.add(
+              batchId,
+              Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION)))
+          }
+
+          def getCountForOpenOnMetadataFile(batchId: Long): Long = {
+            val path = sinkLog.batchIdToPath(batchId).toUri.getPath
+            CountOpenLocalFileSystem.pathToNumOpenCalled
+              .get(path).map(_.get()).getOrElse(0)
+          }
+
+          val curCount = getCountForOpenOnMetadataFile(2)
+
+          assert(sinkLog.getLatestBatchId() === Some(2))
 
 Review comment:
   Nit: s/2/2L/



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r402388241
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLogSuite.scala
 ##########
 @@ -240,6 +247,40 @@ class FileStreamSinkLogSuite extends SparkFunSuite with SharedSparkSession {
     ))
   }
 
+  test("getLatestBatchId") {
+    withCountOpenLocalFileSystemAsLocalFileSystem {
+      val scheme = CountOpenLocalFileSystem.scheme
+      withSQLConf(SQLConf.FILE_SINK_LOG_COMPACT_INTERVAL.key -> "3") {
+        withTempDir { file =>
+          val sinkLog = new FileStreamSinkLog(FileStreamSinkLog.VERSION, spark,
+            s"$scheme:///${file.getCanonicalPath}")
+          for (batchId <- 0 to 2) {
+            sinkLog.add(
+              batchId,
+              Array(newFakeSinkFileStatus("/a/b/" + batchId, FileStreamSinkLog.ADD_ACTION)))
+          }
+
+          def getCountForOpenOnMetadataFile(batchId: Long): Long = {
+            val path = sinkLog.batchIdToPath(batchId).toUri.getPath
+            CountOpenLocalFileSystem.pathToNumOpenCalled
+              .get(path).map(_.get()).getOrElse(0)
+          }
+
+          val curCount = getCountForOpenOnMetadataFile(2)
+
+          assert(sinkLog.getLatestBatchId() === Some(2))
+          // getLatestBatchId doesn't open the latest metadata log file
+          assert(getCountForOpenOnMetadataFile(2L) === curCount)
 
 Review comment:
   It may be worth checking the other batches as well.



[GitHub] [spark] gaborgsomogyi commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608282010
 
 
   The test failure seems unrelated to this change.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#discussion_r382451768
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 ##########
 @@ -162,6 +162,26 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
     batchAdded
   }
 
+  /**
+   * Return the latest batch Id.
+   *
+   * This method is a complement of getLatest() - while metadata log file per batch tends to be
+   * small, it doesn't apply to the compacted log file. This method only checks for existence of
+   * file to avoid huge cost on reading and deserializing compacted log file.
+   */
+  def getLatestBatchId(): Option[Long] = {
+    val batchIds = fileManager.list(metadataPath, batchFilesFilter)
+      .map(f => pathToBatchId(f.getPath))
+      .sorted(Ordering.Long.reverse)
+    for (batchId <- batchIds) {
 
 Review comment:
   I simply removed the file read here, but since we already get the batch IDs from listing the files, we may not even need to check for existence. The existence check shouldn't add noticeable latency, though.
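
As a rough illustration of the "listing only" idea, the sketch below derives the latest batch ID from file names alone, without opening or checking any file. It is not the code in this PR: `LatestBatchId`, `parseBatchId`, and `latest` are hypothetical names, and the `.compact` suffix for compacted batch files is an assumption made for the example.

```scala
import scala.util.Try

// Hypothetical helper: derive the latest batch ID purely from listed file names.
object LatestBatchId {
  // Assumed suffix that marks a compacted batch file.
  private val CompactSuffix = ".compact"

  // Parse a batch ID from a name such as "7" or "9.compact";
  // returns None for names that are not batch files.
  def parseBatchId(fileName: String): Option[Long] =
    Try(fileName.stripSuffix(CompactSuffix).toLong).toOption

  // The highest batch ID among the listed names, if any.
  def latest(fileNames: Seq[String]): Option[Long] = {
    val ids = fileNames.flatMap(parseBatchId)
    if (ids.isEmpty) None else Some(ids.max)
  }
}
```

For example, `LatestBatchId.latest(Seq("0", "1", "2.compact"))` yields `Some(2L)`, and the cost does not depend on how large the compacted file is.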



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615332051
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121418/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615518136
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26118/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-615332038
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-608211741
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27664: [SPARK-30915][SS] FileStreamSink: Avoid reading the metadata log file when finding the latest batch ID
URL: https://github.com/apache/spark/pull/27664#issuecomment-613855674
 
 
   **[Test build #121301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121301/testReport)** for PR 27664 at commit [`30338fb`](https://github.com/apache/spark/commit/30338fb49dbe039fc264f4f5a867d5b7bbd7f711).
