Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/13 05:11:57 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #27557: [SPARK-30804] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

HeartSaVioR opened a new pull request #27557: [SPARK-30804] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557
 
 
   ### What changes were proposed in this pull request?
   
   This patch adds log messages that report the elapsed time of the "compact" operation in FileStreamSourceLog and FileStreamSinkLog (the change itself is made in CompactibleFileStreamLog), to help investigate otherwise unexplained latency spikes during a batch run.
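
As a rough illustration of the measurement pattern described above (not the patch's actual Scala code in CompactibleFileStreamLog), the following Python sketch runs an operation, measures elapsed wall-clock time with a monotonic clock, and logs it. The helper name `timed` and the message wording are assumptions made for this example.

```python
import logging
import time

logger = logging.getLogger("compact_timing_sketch")


def timed(description, operation):
    """Run `operation()`, log how long it took, and return (result, elapsed_ms).

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    start = time.monotonic()
    result = operation()
    elapsed_ms = int((time.monotonic() - start) * 1000)
    logger.info("It took %d ms to %s.", elapsed_ms, description)
    return result, elapsed_ms


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in for loading all entries that go into a compact batch.
    entries, load_ms = timed(
        "load entries for compact batch", lambda: list(range(100_000))
    )
    print(len(entries), load_ms >= 0)
```

Because the operation runs synchronously inside the helper, the logged time is exactly the time the caller was blocked, which is the quantity of interest for a synchronous "compact" step.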
   
   ### Why are the changes needed?
   
   Tracking latency is a critical aspect of operating a streaming query. Although the "compact" operation may add nontrivial latency (it is even synchronous, so all of its latency is added to the batch run), it is not measured, and end users have to guess.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   N/A for unit tests. Tested manually with a streaming query using the file source and file sink.
   
   > grep "for compact batch" <driver log>
   
   ```
   ...
   20/02/12 21:00:59 INFO FileStreamSourceLog: It took 527 ms to load 116003 entries (33948200 bytes) for compact batch 20199.
   20/02/12 21:00:59 INFO FileStreamSourceLog: It took 469 ms to write 116003 entries for compact batch 20199.
   20/02/12 21:01:16 INFO FileStreamSinkLog: It took 9523 ms to load 1010000 entries (368291864 bytes) for compact batch 20199.
   20/02/12 21:01:23 INFO FileStreamSinkLog: It took 6568 ms to write 1010000 entries for compact batch 20199.
   ...
   ```
   
   NOTE: The output may differ slightly from the code, as I used the patch while debugging on Spark 2.4.5 and have adjusted the message several times since.
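
For operators who want to pull these numbers out of a driver log, a minimal Python sketch follows. The regular expression is an assumption based only on the sample lines above; as the note says, the final message format in the code may differ, so the pattern would need adjusting.

```python
import re

# Pattern derived from the sample lines above; the "(N bytes)" part appears
# only on "load" lines, so it is optional here. This is an assumption, not
# the authoritative message format.
COMPACT_LINE = re.compile(
    r"INFO (?P<log>\w+): It took (?P<ms>\d+) ms to (?P<op>load|write) "
    r"(?P<entries>\d+) entries(?: \((?P<bytes>\d+) bytes\))? "
    r"for compact batch (?P<batch>\d+)\."
)


def parse_compact_line(line):
    """Return a dict of fields for a matching line, or None otherwise."""
    match = COMPACT_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "log": fields["log"],
        "op": fields["op"],
        "ms": int(fields["ms"]),
        "entries": int(fields["entries"]),
        "bytes": int(fields["bytes"]) if fields["bytes"] else None,
        "batch": int(fields["batch"]),
    }


if __name__ == "__main__":
    sample = (
        "20/02/12 21:01:16 INFO FileStreamSinkLog: It took 9523 ms to load "
        "1010000 entries (368291864 bytes) for compact batch 20199."
    )
    print(parse_compact_line(sample))
```

Records parsed this way can be fed into whatever alerting or time-series tooling is already in place.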

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589487056
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613137003
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25920/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-614066768
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589333368
 
 
   >> For streaming workloads, latency is the first class consideration.
   
   >  When the query is not running properly.
   
   OK, I admit most of my experience has been with "low-latency" systems, but even though Spark runs in "micro-batch" mode, that doesn't mean latency is unimportant. Latency is what "defines" whether a streaming query is running properly or not. Even Spark had to claim that a micro-batch could run in sub-second time, because latency has been one of the major downsides of Spark Streaming, and continuous processing had to be introduced.
   
   Higher latency doesn't only mean that output will be delayed. When you turn on "latestFirst" (together with maxFilesPerTrigger, since in this case we assume we can't process all the inputs) to start reading from the latest files, the latency of a batch defines the boundary of the inputs.
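
To make the option combination concrete, here is a minimal PySpark sketch of a file source that reads the newest files first with a per-batch file cap. `latestFirst` and `maxFilesPerTrigger` are file source options; the `text` format, the path argument, and the value 100 are placeholders chosen for this example.

```python
# Options for the file stream source; the values are illustrative only.
FILE_SOURCE_OPTIONS = {
    "latestFirst": "true",        # process the newest files first
    "maxFilesPerTrigger": "100",  # cap on new files per micro-batch (placeholder)
}


def build_file_stream(spark, input_path):
    """Build a streaming DataFrame over text files in `input_path`.

    `spark` is an existing SparkSession and `input_path` is a hypothetical
    directory; both are supplied by the caller.
    """
    return (
        spark.readStream
        .format("text")
        .options(**FILE_SOURCE_OPTIONS)
        .load(input_path)
    )
```

With these settings, the longer a batch takes, the more files arrive before the next batch picks its newest 100, which is why batch latency effectively sets the input boundary.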
   
   It's a critical aspect which operators should always observe via their monitoring approaches (alerts, time-series DB and dashboard, etc.), and find out what happens when the latency fluctuates a lot. 
   
   > I think it's debug information which helps developers to find out what's the issue and not users (INFO is more like to users in my understanding).
   
   I'm not sure whom you mean by "users". AFAIK, in many cases (certainly not all), users = developers = operators.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589486689
 
 
   **[Test build #118736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118736/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589487060
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118736/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585554601
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23093/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585601585
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613854671
 
 
   retest this, please



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585620598
 
 
   **[Test build #118345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118345/testReport)** for PR 27557 at commit [`619eee3`](https://github.com/apache/spark/commit/619eee3eb656aff82b5ab72ea70597a3e19e95a5).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589487056
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585554658
 
 
   cc. @tdas @zsxwing @gaborgsomogyi 
   
   The output I added simply demonstrates why the log message is needed. It came from reading and writing against the "local" filesystem.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613211357
 
 
   **[Test build #121232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121232/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] gaborgsomogyi commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-588279939
 
 
   I think the information that is printed is not necessary for users, but it is useful debug information. I've double-checked that the message is not printed often, but I would still lower the severity to debug.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585601585
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585757871
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613211839
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589656746
 
 
   > is it possible to add the load and write time in the UI chart?
   
   This comes from a specific data source, so we may need a way to generalize it. I roughly remember that we deferred supporting custom metrics on DSv2 (except `observe`, which is for users). Once we find time to design custom metrics, maybe we can surface this in the UI then.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613136994
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613136591
 
 
   **[Test build #121232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121232/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613915300
 
 
   **[Test build #121304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121304/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613926884
 
 
   **[Test build #121315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121315/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613915543
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585757880
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118345/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613926884
 
 
   **[Test build #121315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121315/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613927300
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25999/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-614065435
 
 
   **[Test build #121315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121315/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613925972
 
 
   retest this, please



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589410166
 
 
   **[Test build #118736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118736/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585601417
 
 
   **[Test build #118336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118336/testReport)** for PR 27557 at commit [`619eee3`](https://github.com/apache/spark/commit/619eee3eb656aff82b5ab72ea70597a3e19e95a5).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613856263
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585615663
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-614066768
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-614066782
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121315/
   Test PASSed.



[GitHub] [spark] xuanyuanking commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#discussion_r382560001
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 ##########
 @@ -177,16 +178,35 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
    * corresponding `batchId` file. It will delete expired files as well if enabled.
    */
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-    val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
-    val allLogs = validBatches.flatMap { id =>
-      super.get(id).getOrElse {
-        throw new IllegalStateException(
-          s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
-            s"(compactInterval: $compactInterval)")
-      }
-    } ++ logs
+    val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+      val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
+      validBatches.flatMap { id =>
+        super.get(id).getOrElse {
+          throw new IllegalStateException(
+            s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
+              s"(compactInterval: $compactInterval)")
+        }
+      } ++ logs
+    }
+    val compactedLogs = compactLogs(allLogs)
+
     // Return false as there is another writer.
-    super.add(batchId, compactLogs(allLogs).toArray)
+    val (writeSucceed, writeElapsedMs) = Utils.timeTakenMs {
+      super.add(batchId, compactedLogs.toArray)
+    }
+
+    val elapsedMs = loadElapsedMs + writeElapsedMs
+    if (elapsedMs >= COMPACT_LATENCY_WARN_THRESHOLD_MS) {
+      logWarning(s"Compacting took $elapsedMs ms (load: $loadElapsedMs ms," +
 
 Review comment:
   nit: it seems these two log messages could be combined into one?
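
   One possible reading of this suggestion, sketched outside Spark: time the load and write phases separately, then report both in a single message. The object `CompactTimingSketch`, its local `timeTakenMs` (a stand-in for the `Utils.timeTakenMs` call used in the diff), and the helper `compactMessage` are hypothetical names for illustration; the message wording is an assumption, not the text that was merged.

```scala
import java.util.concurrent.TimeUnit

object CompactTimingSketch {
  // Stand-in for the Utils.timeTakenMs call in the diff:
  // evaluates `body` once and returns (result, elapsed milliseconds).
  def timeTakenMs[T](body: => T): (T, Long) = {
    val startNs = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
    (result, elapsedMs)
  }

  // Hypothetical helper: one message that covers both phases.
  def compactMessage(batchId: Long, loadMs: Long, writeMs: Long, entries: Int): String =
    s"Compacting took ${loadMs + writeMs} ms (load: $loadMs ms, write: $writeMs ms) " +
      s"for compact batch $batchId ($entries entries)"
}
```

   A caller would wrap the load and the write in `timeTakenMs` separately and pass both durations to `compactMessage`, so a single line carries the total and the per-phase breakdown.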



[GitHub] [spark] HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589409973
 
 
   While I still think latency information is an important aspect, I agree that information about the size of the logs (length, memory usage) is not of interest unless there's a latency issue.
   
   I've adjusted the severity of the log messages to WARN and DEBUG, and differentiated the information provided according to the severity.
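
   A minimal sketch of what severity-dependent logging could look like, assuming the slow path (WARN) carries the size details and the normal path (DEBUG) carries only the elapsed time. The object `CompactLogLevelSketch`, the `Level` type, and the threshold value of 2000 ms are assumptions for illustration; the diff references `COMPACT_LATENCY_WARN_THRESHOLD_MS` but its value is not shown in this thread.

```scala
object CompactLogLevelSketch {
  sealed trait Level
  case object Warn extends Level
  case object Debug extends Level

  // Assumed value, for illustration only.
  val WarnThresholdMs: Long = 2000L

  // Returns the level and the message that would be logged for one compaction.
  def describe(batchId: Long, loadMs: Long, writeMs: Long, entries: Int): (Level, String) = {
    val elapsedMs = loadMs + writeMs
    if (elapsedMs >= WarnThresholdMs) {
      // Slow compaction: include the per-phase breakdown and the entry count.
      (Warn, s"Compacting took $elapsedMs ms (load: $loadMs ms, write: $writeMs ms) " +
        s"for compact batch $batchId with $entries entries")
    } else {
      // Normal compaction: elapsed time only.
      (Debug, s"Compacting took $elapsedMs ms for compact batch $batchId")
    }
  }
}
```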



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585615669
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23102/
   Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#discussion_r378654769
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 ##########
 @@ -177,16 +178,29 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
    * corresponding `batchId` file. It will delete expired files as well if enabled.
    */
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-    val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
-    val allLogs = validBatches.flatMap { id =>
-      super.get(id).getOrElse {
-        throw new IllegalStateException(
-          s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
-            s"(compactInterval: $compactInterval)")
-      }
-    } ++ logs
+    val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+      val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
+      validBatches.flatMap { id =>
+        super.get(id).getOrElse {
+          throw new IllegalStateException(
+            s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
+              s"(compactInterval: $compactInterval)")
+        }
+      } ++ logs
+    }
+    logInfo(s"It took $loadElapsedMs ms to load ${allLogs.size} entries " +
 
 Review comment:
   Personally, I think setting this to INFO won't be much of a bother, as it will only be printed once per compact interval. InMemoryFileIndex prints the latency information for finding files at INFO, which makes sense, and the same approach can be applied here as well.
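
   To make "once per compact interval" concrete, here is a small sketch of how often a compaction batch occurs. The rule `(batchId + 1) % compactInterval == 0` is written from memory of how CompactibleFileStreamLog decides this and should be treated as an assumption; the interval of 10 used in the note below is likewise assumed. The object name `CompactionIntervalSketch` is hypothetical.

```scala
object CompactionIntervalSketch {
  // Assumed rule: a batch is a compaction batch when (batchId + 1) is a
  // multiple of the compact interval.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Number of compaction batches among batch ids 0 until `batches`.
  def compactionCount(batches: Long, compactInterval: Int): Long =
    (0L until batches).count(id => isCompactionBatch(id, compactInterval)).toLong
}
```

   Under these assumptions, batch 20199 from the PR description is a compaction batch for an interval of 10, and 100 batches would produce 10 such log lines.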



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585554595
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585554595
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585615669
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23102/
   Test PASSed.



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585612635
 
 
   Retest this, please



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613136994
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585757871
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585554658
 
 
   cc. @tdas @zsxwing @gaborgsomogyi 
   
   The output I included in the description simply demonstrates why the log message is needed.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585556311
 
 
   **[Test build #118336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118336/testReport)** for PR 27557 at commit [`619eee3`](https://github.com/apache/spark/commit/619eee3eb656aff82b5ab72ea70597a3e19e95a5).



[GitHub] [spark] HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589333368
 
 
   >> For streaming workloads, latency is the first class consideration.
   >  When the query is not running properly.
   
   OK, I admit most of my experience has been with "low-latency" workloads, but even though Spark runs in "micro-batch" mode, that doesn't mean latency is unimportant. Latency is what "defines" whether a streaming query is running properly or not. Spark even had to claim that a micro-batch could run in sub-second time, because latency has been one of the major downsides of Spark Streaming, and continuous processing had to be introduced.
   
   Higher latency doesn't only mean that output will be delayed. When you turn on "latestFirst" (with maxFilesPerTrigger, since in this case we assume we can't process all the inputs) to start reading from the latest files, the latency of a batch defines the boundary of the inputs.
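
   The effect can be sketched with a simplified model of file selection: newest files first, capped at a per-batch limit. `LatestFirstSketch`, `FileEntry`, and `selectBatch` are hypothetical names, and the model is an assumption that ignores everything else the file source does. Its only purpose is to show that files arriving while a slow batch runs push older unread files further down the ordering.

```scala
object LatestFirstSketch {
  final case class FileEntry(path: String, modifiedMs: Long)

  // Picks the files for one micro-batch: newest first when latestFirst is
  // set, oldest first otherwise, capped at maxFilesPerTrigger.
  def selectBatch(
      unseen: Seq[FileEntry],
      latestFirst: Boolean,
      maxFilesPerTrigger: Int): Seq[FileEntry] = {
    val ordered =
      if (latestFirst) unseen.sortBy(f => -f.modifiedMs)
      else unseen.sortBy(_.modifiedMs)
    ordered.take(maxFilesPerTrigger)
  }
}
```

   In this model, the longer a batch takes, the more new files accumulate at the head of the ordering before the next selection, so the batch latency effectively decides which older files are never reached.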
   
   It's a critical aspect that operators should always observe through their monitoring (alerts, time-series DB and dashboards, etc.), and they should find out what happens when the latency fluctuates a lot.
   
   > I think it's debug information which helps developers to find out what's the issue and not users (INFO is more like to users in my understanding).
   
   I'm not sure whom you mean by "users". AFAIK, in many cases (not all cases, for sure), users = developers = operators.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585615663
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589410728
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23487/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585757063
 
 
   **[Test build #118345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118345/testReport)** for PR 27557 at commit [`619eee3`](https://github.com/apache/spark/commit/619eee3eb656aff82b5ab72ea70597a3e19e95a5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589487060
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118736/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589410707
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#discussion_r382586700
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 ##########
 @@ -177,16 +178,35 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
    * corresponding `batchId` file. It will delete expired files as well if enabled.
    */
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-    val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
-    val allLogs = validBatches.flatMap { id =>
-      super.get(id).getOrElse {
-        throw new IllegalStateException(
-          s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
-            s"(compactInterval: $compactInterval)")
-      }
-    } ++ logs
+    val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+      val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
+      validBatches.flatMap { id =>
+        super.get(id).getOrElse {
+          throw new IllegalStateException(
+            s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
+              s"(compactInterval: $compactInterval)")
+        }
+      } ++ logs
+    }
+    val compactedLogs = compactLogs(allLogs)
+
     // Return false as there is another writer.
-    super.add(batchId, compactLogs(allLogs).toArray)
+    val (writeSucceed, writeElapsedMs) = Utils.timeTakenMs {
+      super.add(batchId, compactedLogs.toArray)
+    }
+
+    val elapsedMs = loadElapsedMs + writeElapsedMs
+    if (elapsedMs >= COMPACT_LATENCY_WARN_THRESHOLD_MS) {
+      logWarning(s"Compacting took $elapsedMs ms (load: $loadElapsedMs ms," +
 
 Review comment:
   It was originally one line; I split it because it felt too long as a one-liner, and because the message on the second line is only for WARN level.
   But if a single line helps to correlate, I'll merge them. Let's hear more voices on this.
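
The timing pattern in the diff above can be illustrated with a small standalone sketch. The helper below is a hypothetical stand-in written for this example; it only assumes the shape visible in the diff (a by-name block goes in, a pair of result and elapsed milliseconds comes out) and is not Spark's own implementation.

```scala
import java.util.concurrent.TimeUnit.NANOSECONDS

object TimingSketch {
  // Hypothetical stand-in for the helper called in the diff: evaluates `body`
  // exactly once and returns its result with the elapsed wall-clock milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = NANOSECONDS.toMillis(System.nanoTime() - start)
    (result, math.max(elapsedMs, 0L))
  }

  def main(args: Array[String]): Unit = {
    // Two separately timed phases, summed afterwards as in the diff.
    val (sum, loadMs) = timeTakenMs { (1L to 1000000L).sum }
    val (positive, writeMs) = timeTakenMs { sum > 0 }
    println(s"positive: $positive, load: $loadMs ms, write: $writeMs ms, " +
      s"total: ${loadMs + writeMs} ms")
  }
}
```

Timing the load and write phases separately keeps both figures available for the log message while still allowing a single total to be compared against a threshold.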



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613855699
 
 
   **[Test build #121304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121304/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613856263
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613856275
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25988/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613211841
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121232/
   Test FAILed.



[GitHub] [spark] HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-588592849
 
 
   > I think the information which prints out is not necessary for the users 
   
   I'm not sure I can agree. The information is much like what InMemoryFileIndex logs when listing leaf files, which is logged at INFO level if I remember correctly.
   
   For streaming workloads, latency is a first-class consideration. End users would have no idea why the overall latency suddenly increases every N batches unless they know the details of the metadata in FileStreamSource / FileStreamSink. This is a completely different experience from what they would have with the Kafka streaming source and sink - they may struggle to find the root cause in other places, such as their query.
   
   But I'd agree that the information may not be necessary for users if the latency added here is not considerable. We could set a threshold (like 1s or 2s?) and only print the message when the latency exceeds it (still printing it at DEBUG level when it doesn't reach the threshold), but then it would deserve a higher severity, WARN.
   
   What do you think?



[GitHub] [spark] SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585556311
 
 
   **[Test build #118336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118336/testReport)** for PR 27557 at commit [`619eee3`](https://github.com/apache/spark/commit/619eee3eb656aff82b5ab72ea70597a3e19e95a5).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613927300
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25999/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589410728
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23487/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-614066782
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121315/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585620598
 
 
   **[Test build #118345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118345/testReport)** for PR 27557 at commit [`619eee3`](https://github.com/apache/spark/commit/619eee3eb656aff82b5ab72ea70597a3e19e95a5).



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589409973
 
 
   While I still think the latency information is an important aspect, I agree that the information about the size of the logs (length, memory usage) is not of interest unless there's a latency issue.
   
   I've adjusted the severity of the log messages to WARN and DEBUG, and differentiated the information provided according to the severity.
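
A minimal sketch of the severity split described here, for illustration only. The threshold value, the object and method names, and the message wording are all assumptions made for this example, not the code in the PR; the only parts taken from the thread are that a slow compaction is reported at WARN with the size detail, and a fast one at DEBUG without it.

```scala
object CompactLogLevelSketch {
  sealed trait Level
  case object Warn extends Level
  case object Debug extends Level

  // Assumed value: the thread floats "1s or 2s" without settling on one here.
  val WarnThresholdMs: Long = 2000L

  // Chooses the severity and message for one compaction. The entry count is
  // included only in the WARN message, where it helps explain the latency.
  def describe(batchId: Long, loadMs: Long, writeMs: Long, entries: Int): (Level, String) = {
    val totalMs = loadMs + writeMs
    if (totalMs >= WarnThresholdMs) {
      (Warn, s"Compacting took $totalMs ms (load: $loadMs ms, write: $writeMs ms) " +
        s"for compact batch $batchId, loaded $entries entries")
    } else {
      (Debug, s"Compacting took $totalMs ms for compact batch $batchId")
    }
  }
}
```

With the timings from the manual test in the PR description, the sink log compaction (9523 ms load plus 6568 ms write) would land on the WARN branch under this assumed threshold, while the source log compaction (527 ms plus 469 ms) would stay at DEBUG.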



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-588592849
 
 
   > I think the information which prints out is not necessary for the users 
   
   I'm not sure I can agree. The information is much like what InMemoryFileIndex logs when listing leaf files.
   
   For streaming workloads, latency is a first-class consideration. End users would have no idea why the overall latency suddenly increases every N batches unless they know the details of the metadata in FileStreamSource / FileStreamSink. This is a completely different experience from what they would have with the Kafka streaming source and sink - they may struggle to find the root cause in other places, such as their query.
   
   But I'd agree that the information may not be necessary for users if the latency added here is not considerable. We could set a threshold (like 1s or 2s?) and only print the message when the latency exceeds it (still printing it at DEBUG level when it doesn't reach the threshold), but then it would deserve a higher severity, WARN.
   
   What do you think?



[GitHub] [spark] gaborgsomogyi commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589171432
 
 
   > For streaming workloads, latency is the first class consideration.
   
   Only when the query is not running properly. Logging it all the time, which increases the log file size, is a different matter. I think it's debug information that helps developers, not users, find out what the issue is (`INFO` is more for users, in my understanding).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613211841
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121232/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613137003
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25920/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613915543
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585601591
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118336/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613211839
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613927286
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613915554
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121304/
   Test FAILed.



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613135216
 
 
   retest this, please



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613856275
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25988/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613927286
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589410707
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#discussion_r378655218
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 ##########
 @@ -177,16 +178,29 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
    * corresponding `batchId` file. It will delete expired files as well if enabled.
    */
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-    val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
-    val allLogs = validBatches.flatMap { id =>
-      super.get(id).getOrElse {
-        throw new IllegalStateException(
-          s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
-            s"(compactInterval: $compactInterval)")
-      }
-    } ++ logs
+    val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+      val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
+      validBatches.flatMap { id =>
+        super.get(id).getOrElse {
+          throw new IllegalStateException(
+            s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
+              s"(compactInterval: $compactInterval)")
+        }
+      } ++ logs
+    }
+    logInfo(s"It took $loadElapsedMs ms to load ${allLogs.size} entries " +
+      s"(${SizeEstimator.estimate(allLogs)} bytes in memory) for compact batch $batchId.")
 
 Review comment:
   I've added the estimated memory usage as well, because this operation adds to memory usage on the "driver".
   (https://issues.apache.org/jira/browse/SPARK-30462)
   
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585554601
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23093/
   Test PASSed.



[GitHub] [spark] HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#discussion_r378655218
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
 ##########
 @@ -177,16 +178,29 @@ abstract class CompactibleFileStreamLog[T <: AnyRef : ClassTag](
    * corresponding `batchId` file. It will delete expired files as well if enabled.
    */
   private def compact(batchId: Long, logs: Array[T]): Boolean = {
-    val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
-    val allLogs = validBatches.flatMap { id =>
-      super.get(id).getOrElse {
-        throw new IllegalStateException(
-          s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
-            s"(compactInterval: $compactInterval)")
-      }
-    } ++ logs
+    val (allLogs, loadElapsedMs) = Utils.timeTakenMs {
+      val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
+      validBatches.flatMap { id =>
+        super.get(id).getOrElse {
+          throw new IllegalStateException(
+            s"${batchIdToPath(id)} doesn't exist when compacting batch $batchId " +
+              s"(compactInterval: $compactInterval)")
+        }
+      } ++ logs
+    }
+    logInfo(s"It took $loadElapsedMs ms to load ${allLogs.size} entries " +
+      s"(${SizeEstimator.estimate(allLogs)} bytes in memory) for compact batch $batchId.")
 
 Review comment:
   I've added the estimated memory usage as well, because this operation adds to memory usage on the "driver", where it is easy to assume that not much memory is needed.
   (https://issues.apache.org/jira/browse/SPARK-30462)
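
   The diff above wraps the load step in a timing helper and logs the elapsed milliseconds together with the entry count. A minimal, Spark-free sketch of that pattern follows. The object name `CompactTimingSketch`, the local `timeTakenMs` helper, and `loadAll` are assumptions made for illustration only; they are not Spark's actual `Utils.timeTakenMs` or the real log-loading code, and the in-memory size estimate from the diff is left out because it depends on Spark's `SizeEstimator`.

   ```scala
   import java.util.concurrent.TimeUnit

   object CompactTimingSketch {
     // Hypothetical helper: evaluates `body` once and returns its result
     // paired with the elapsed wall-clock time in milliseconds.
     def timeTakenMs[T](body: => T): (T, Long) = {
       val startNs = System.nanoTime()
       val result = body
       val elapsedNs = math.max(System.nanoTime() - startNs, 0L)
       (result, TimeUnit.NANOSECONDS.toMillis(elapsedNs))
     }

     // Hypothetical stand-in for reading the entries of earlier batches
     // and appending the entries of the current batch.
     def loadAll(previous: Seq[Array[String]], current: Array[String]): Array[String] =
       previous.flatMap(_.toSeq).toArray ++ current

     def main(args: Array[String]): Unit = {
       val previous = Seq(Array("a", "b"), Array("c"))
       val batchId = 3L
       val (allLogs, loadElapsedMs) = timeTakenMs { loadAll(previous, Array("d")) }
       println(s"It took $loadElapsedMs ms to load ${allLogs.length} entries " +
         s"for compact batch $batchId.")
     }
   }
   ```

   Taking the block as a by-name parameter keeps the measured code inline at the call site, which is why the original body of `compact` could be moved inside the braces unchanged.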



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585757880
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118345/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613136591
 
 
   **[Test build #121232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121232/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589333368
 
 
   >> For streaming workloads, latency is the first class consideration.
   >  When the query is not running properly.
   
   OK, I admit most of my experience has been with "low-latency" workloads, but even though Spark runs in "micro-batch" mode, that doesn't mean latency is unimportant. Latency is what "defines" whether a streaming query is running properly or not. Even Spark had to claim that a micro-batch could run in sub-second time, because one of the major downsides of Spark Streaming has been latency, and continuous processing had to be introduced.
   
   Higher latency doesn't only mean the output will be late. When you turn on "latestFirst" (with maxFilesPerTrigger, since in this case we assume we can't process all the inputs) to start reading from the latest files, the latency of a batch defines the boundary of its inputs.
   
   It's a critical aspect that operators should always observe through their monitoring (alerts, time-series DB, dashboards, etc.), so that they can find out what happens when the latency fluctuates a lot.
   
   > I think it's debug information which helps developers to find out what's the issue and not users (INFO is more like to users in my understanding).
   
   I'm not sure whom you mean by "users". AFAIK, in many cases (though certainly not all), users = developers = operators.



[GitHub] [spark] SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613855699
 
 
   **[Test build #121304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121304/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589410166
 
 
   **[Test build #118736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118736/testReport)** for PR 27557 at commit [`b3b2e76`](https://github.com/apache/spark/commit/b3b2e763cf395e0579cf53820950c7ee55ea18a3).



[GitHub] [spark] AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-613915554
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121304/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-585601591
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118336/
   Test FAILed.
