Posted to reviews@spark.apache.org by HeartSaVioR <gi...@git.apache.org> on 2018/09/01 23:11:02 UTC

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21721
  
    I spent a few more hours looking at how the SQL UI can update metrics information before a task ends, and I think I now understand @cloud-fan's concern here.
    
    This is different from how we handle custom metrics in StateStore. All SQL metrics, including the custom metrics in StateStore, are accumulators, which are carried by the executor heartbeat (honestly I hadn't noticed that, my bad), and the UI updates them from there. Custom metrics in StateStore are only updated when the state operation is about to finish for each partition, but they are exposed to the SQL UI anyway and get updated dynamically there (I mean the values can be updated even for a running batch).
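    
    To make the accumulator point concrete, here is a minimal sketch against Spark's internal `SQLMetrics` helper (illustrative only, not a supported public API):
    
    ```scala
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.execution.metric.SQLMetrics
    
    // SQLMetric extends AccumulatorV2, so values added on executors are
    // shipped back with the executor heartbeat and show up in the SQL UI
    // while the task is still running.
    val sc: SparkContext = SparkContext.getOrCreate()
    val numStateRows = SQLMetrics.createMetric(sc, "number of state rows")
    
    // On the executor side, e.g. when a state operation finishes for a partition:
    numStateRows += 42
    ```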
    
    With StreamingQueryProgress, we also expose information that is only calculated when it's needed, and right now that is when finishTrigger is called, so mostly at batch end. The custom metrics in this patch sit there: they're additional information for StreamingQueryProgress, hence intentionally updated per batch. They're not actually SQL metrics, but the name could lead someone to wonder why they don't behave like SQL metrics. Maybe the name matters?
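    
    For reference, this is the per-batch cadence I mean, using the public StreamingQuery API (`query` stands for any running query):
    
    ```scala
    import org.apache.spark.sql.streaming.StreamingQuery
    
    // lastProgress is replaced once per completed micro-batch (when
    // finishTrigger runs); it does not change while a batch is in flight.
    def dumpProgress(query: StreamingQuery): Unit = {
      val progress = query.lastProgress
      if (progress != null) println(progress.json)  // null before the first batch completes
    }
    ```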
    
    So there are two different desires for adding custom information:
    
    1. Metrics to be updated on every heartbeat: these would be exposed to the SQL UI, and could also be collected and added to StreamingQueryProgress, like the custom metrics in StateStore.
    2. Information to be updated once per batch: these would be exposed only through StreamingQueryProgress (a hypothetical shape is sketched below).
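    
    A hypothetical shape for the second kind, just to contrast with the accumulator route above (the trait name and signature are my invention, not this patch's actual API):
    
    ```scala
    // Hypothetical: a source reports a snapshot of extra information once per
    // batch, and the driver merges it into StreamingQueryProgress when
    // finishTrigger runs. Nothing here touches accumulators or heartbeats.
    trait ReportsBatchInfo {
      def currentBatchInfo(): Map[String, Long]
    }
    ```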
    
    And the target of this patch is the latter.
    
    But we know option 2 only applies to micro-batch mode, and the current StreamingQueryProgress is not suitable for continuous mode, for these reasons:
    
    1. Unless we stop processing or snapshot the metrics when an epoch ends, the metrics can't be correct for a specific epoch.
    2. Showing the information for the latest epoch (the one which all partitions have finished) no longer represents the most recent state.
    3. Some metrics are expected to be reset per batch, which doesn't happen in continuous mode. If we reset metrics per epoch, the metrics in the SQL tab of the UI will look really odd (because it just shows the current state of the metrics, not bound to an epoch).
    
    So IMHO it's likely that StreamingQueryProgress will not be available for continuous mode even later (not only for custom metrics), and we may want to rely on running SQL metrics instead. That's actually how other streaming frameworks provide metrics as of now, but they also show these metrics as aggregated values over a time window, or even as time series. Spark doesn't need such a feature for batch and micro-batch, but in continuous mode, without it these SQL metrics will be really hard to read after a long run (say, 1 month). That's the hard part of making the modes transparent: the requirements on metrics for batch/micro-batch and continuous mode are just different, and metrics may not be the only such issue.
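    
    To illustrate the "hard to read after a long run" point: with only ever-growing counters, you'd have to derive rates yourself, which is what windowed/time-series metrics in other frameworks do for you (a self-contained sketch, not tied to any Spark API):
    
    ```scala
    // Illustrative only: turning a cumulative counter sampled over time into a
    // per-window rate, the kind of view windowed metrics give you out of the box.
    case class Sample(timeMs: Long, cumulative: Long)
    
    def ratePerSecond(window: Seq[Sample]): Double = {
      require(window.size >= 2, "need at least two samples")
      val (first, last) = (window.head, window.last)
      (last.cumulative - first.cumulative) * 1000.0 / (last.timeMs - first.timeMs)
    }
    
    // After a month of running, the raw cumulative value is meaningless on its
    // own; a recent window of samples still yields a readable rate.
    val samples = Seq(Sample(0L, 0L), Sample(10000L, 5000L))
    println(ratePerSecond(samples))  // 500.0 rows/sec
    ```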

