You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/21 18:29:15 UTC

[GitHub] [spark] warrenzhu25 opened a new pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

warrenzhu25 opened a new pull request #28887:
URL: https://github.com/apache/spark/pull/28887


   ### What changes were proposed in this pull request?
   
   Use `Optional.orElseGet` instead of `Optional.orElse` to fix unnecessary kafka offset fetch and misleading info log
   
   ### Why are the changes needed?
   
   Fix mislead initial offsets log and unnecessary kafka offset fetch
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Currently, no test for KafkaContinuousReader. Also it's hard to test log.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647782078


   **[Test build #124370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124370/testReport)** for PR 28887 at commit [`a6000f9`](https://github.com/apache/spark/commit/a6000f96f6306ce97649c74825c6f69d2a6cccf0).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647782078


   **[Test build #124370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124370/testReport)** for PR 28887 at commit [`a6000f9`](https://github.com/apache/spark/commit/a6000f96f6306ce97649c74825c6f69d2a6cccf0).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647164637


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
warrenzhu25 commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647238451


   > Probably, it would be nice to explain the difference between Optional.orElse vs Optional.orElseGet for Java a bit in PR description. It's very confusing in Scala developers as we expect lazy evaluation natively.
   > 
   > And one more, please rebase the target branch to master. We normally receive patches against master branch. If the problem no longer exists in master branch, we may want to find and port back the commit which fixed the issue previously, instead of picking up newer commit.
   
   1. Updated description to explain more about Optional.orElse vs Optional.orElseGet
   2. In master branch, KafkaContinuousReader.scala is no longer existed. This class only in branch-2.4. Any suggestions about how should I do?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647793376


   **[Test build #124370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124370/testReport)** for PR 28887 at commit [`a6000f9`](https://github.com/apache/spark/commit/a6000f96f6306ce97649c74825c6f69d2a6cccf0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647782584






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #28887:
URL: https://github.com/apache/spark/pull/28887


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28887:
URL: https://github.com/apache/spark/pull/28887#discussion_r443275238



##########
File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousReader.scala
##########
@@ -70,15 +71,17 @@ class KafkaContinuousReader(
 
   private var offset: Offset = _
   override def setStartOffset(start: ju.Optional[Offset]): Unit = {
-    offset = start.orElse {
-      val offsets = initialOffsets match {
-        case EarliestOffsetRangeLimit => KafkaSourceOffset(offsetReader.fetchEarliestOffsets())
-        case LatestOffsetRangeLimit => KafkaSourceOffset(offsetReader.fetchLatestOffsets(None))
-        case SpecificOffsetRangeLimit(p) => offsetReader.fetchSpecificOffsets(p, reportDataLoss)
+    offset = start.orElseGet(new Supplier[Offset] {

Review comment:
       Nice finding! It's quite confusing on Scala-friendly developers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647164761


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647798585


   Merged to `branch-2.4`. SPARK-32044 is assigned to you, @warrenzhu25 .
   Also, thanks, @HeartSaVioR .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647164637


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647469379


   Oh OK. Let's hear committers' voice here. Before that, can we add `[2.4]` after `[SS]` so that the target version is clear?
   
   cc. @zsxwing 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647782584






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 edited a comment on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
warrenzhu25 edited a comment on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647238451


   > Probably, it would be nice to explain the difference between Optional.orElse vs Optional.orElseGet for Java a bit in PR description. It's very confusing in Scala developers as we expect lazy evaluation natively.
   > 
   > And one more, please rebase the target branch to master. We normally receive patches against master branch. If the problem no longer exists in master branch, we may want to find and port back the commit which fixed the issue previously, instead of picking up newer commit.
   
   1. Updated description to explain more about Optional.orElse vs Optional.orElseGet
   2. In master branch, KafkaContinuousReader.scala has been refactored out in e75488718. This file is only in branch-2.4. Any suggestions about how should I do?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647164761


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #28887: [SPARK-32044][SS] Kakfa continuous processing print mislead initial o…

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647209978


   Probably, it would be nice to explain the difference between Optional.orElse vs Optional.orElseGet for Java a bit in PR description. It's very confusing in Scala developers as we expect lazy evaluation natively.
   
   And one more, please rebase the target branch to master. We normally receive patches against master branch. If the problem no longer exists in master branch, we may want to find and port back the commit which fixed the issue previously, instead of picking up newer commit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647793520






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647793520






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #28887: [SPARK-32044][SS][2.4] Kakfa continuous processing print mislead initial offset

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28887:
URL: https://github.com/apache/spark/pull/28887#issuecomment-647779925


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org