Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/24 21:38:36 UTC

[GitHub] [spark] yeskarthik edited a comment on pull request #35548: [SPARK-38234] [SQL] [SS] Added structured streaming monitoring APIs.

yeskarthik edited a comment on pull request #35548:
URL: https://github.com/apache/spark/pull/35548#issuecomment-1050219393


   @HeartSaVioR thank you for your response. 
   
   - I understand the concern that returning the store objects directly in the response would block evolving their structure between versions. However, these objects are already exposed through the existing programmatic monitoring APIs, as described here - https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reading-metrics-interactively. Wouldn't the versioning issue also affect those APIs? Or do the compatibility contracts apply only to REST APIs, and not to metrics objects or programmatic API responses?
   - On the other hand, what if we create another object with the same structure, so that we do not break the API response objects but can still evolve `StreamingQueryData` / `StreamingQueryProcess`?
   - We did have similar APIs for DStreams like `/streaming/statistics`, `/streaming/receivers` etc. How was this problem handled there?
   - I'm fine with marking this as a Developer API - can you point me to a precedent of this for a REST API, or did you mean we just mark the objects?
   - There are multiple use cases for these APIs - here are some that I can think of:
     1. The first is real-time monitoring: these APIs can be used to build more sophisticated UIs than the one under the Structured Streaming tab. For example, we could build a live version that plots the data on the client side via frequent refreshes, without reloading the page.
     2. Another is to plot similar graphs of streaming metrics for a specific streaming query in 'Notebooks', under the query cell, to enhance the experience.
     3. The simplest use case is detecting whether a job contains streaming code, which can be a variable in optimizing resource usage for long-running applications.
     4. Achieving the same via DropWizard / other alternatives would add the overhead of building one more REST layer, and we would lose the real-time experience.
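   To illustrate the decoupling idea in the second bullet: a thin, versioned response object can mirror only the fields the REST API promises, assembled from the internal object. This is a minimal sketch in plain Python, assuming hypothetical names (`InternalProgress`, `QueryProgressV1`) and an illustrative field set - Spark's actual `StreamingQueryProgress` carries many more fields.

   ```python
   from dataclasses import dataclass, asdict

   # Internal object: free to evolve between versions.
   # (Hypothetical stand-in for the internal store object; fields are illustrative.)
   class InternalProgress:
       def __init__(self, query_id, batch_id, input_rows_per_second, experimental_field):
           self.query_id = query_id
           self.batch_id = batch_id
           self.input_rows_per_second = input_rows_per_second
           self.experimental_field = experimental_field  # may change or disappear

   # Public REST payload: a versioned, stable contract.
   @dataclass(frozen=True)
   class QueryProgressV1:
       id: str
       batchId: int
       inputRowsPerSecond: float

       @classmethod
       def from_internal(cls, p: InternalProgress) -> "QueryProgressV1":
           # Copy only the promised fields; internal extras never leak out.
           return cls(id=p.query_id, batchId=p.batch_id,
                      inputRowsPerSecond=p.input_rows_per_second)

   internal = InternalProgress("q-1", 42, 123.4, experimental_field="unstable")
   payload = asdict(QueryProgressV1.from_internal(internal))
   print(payload)  # {'id': 'q-1', 'batchId': 42, 'inputRowsPerSecond': 123.4}
   ```

   With this shape, the internal object can gain, rename, or drop fields freely; only a deliberate change to `QueryProgressV1` (or the introduction of a `V2`) alters the REST response.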


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


