Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/28 21:51:23 UTC

[GitHub] [spark] HeartSaVioR commented on pull request #31944: [SPARK-34854][SQL][SS] Expose source metrics via progress report and add Kafka use-case to report delay.

HeartSaVioR commented on pull request #31944:
URL: https://github.com/apache/spark/pull/31944#issuecomment-828804207


   > > > I've tested it on real cluster and works fine.
   > > > Just a question. How is this intended to be used for dynamic allocation?
   > > 
   > > Users can implement this interface in their customized SparkDataStream and learn how far behind they are falling through the progress listener. Maybe this can provide more useful information to guide/trigger auto scaling.
   
   > This is a valid use-case. But my point is that the current offsets in `SourceProgress` should already provide the information this use-case needs (consumed offset, available offset).
   
   That is what I understand as well - it is just a matter of "where" we want to put the calculation.
   
   I have mixed feelings about this:
   
   1) If the target persona is a human, then I'd rather not make them calculate it themselves. It would be helpful to have Spark calculate and provide the information instead.
   
   2) If the target persona is a "process" (maybe the Spark driver or some external app?), then it should not be that hard for it to do the calculation itself.
   
   Not sure which is the actual use case for this PR.
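
   To illustrate point 2, here is a minimal sketch of the calculation an external process could do from the `endOffset` and `latestOffset` strings in `SourceProgress`. It assumes the Kafka source's JSON offset format (a map of topic to partition-to-offset); the helper name `kafka_lag` is made up for the example:

   ```python
   import json

   def kafka_lag(end_offset_json, latest_offset_json):
       """Compute per-partition and total lag from two Kafka-source
       offset JSON strings shaped like {"topic": {"partition": offset}}.
       Assumes a missing partition in the consumed map means offset 0."""
       consumed = json.loads(end_offset_json)
       latest = json.loads(latest_offset_json)
       lag = {}
       for topic, partitions in latest.items():
           for partition, available in partitions.items():
               done = consumed.get(topic, {}).get(partition, 0)
               lag[(topic, partition)] = available - done
       return lag, sum(lag.values())

   # Example with made-up offsets:
   per_partition, total = kafka_lag(
       '{"topic-a": {"0": 90, "1": 195}}',
       '{"topic-a": {"0": 100, "1": 200}}',
   )
   # per_partition -> {("topic-a", "0"): 10, ("topic-a", "1"): 5}, total -> 15
   ```

   A driver-side `StreamingQueryListener` (or any consumer of the progress JSON) could run the same arithmetic on each progress event, which is why the "process" persona arguably does not need Spark to precompute the lag.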


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org