Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2016/11/21 02:33:58 UTC
[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics
[ https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-18516:
-------------------------------------
Description:
There are two types of information that you want to be able to extract from a running query: instantaneous _status_ and metrics about performance as you make _progress_ in query processing.
Today, these are conflated in a single {{StreamingQueryStatus}} object. The downside to this approach is that a user needs to reason about what state the query is in any time they retrieve a status object. Fields like {{statusMessage}} don't appear in updates that come from the listener bus, and the inputRate/processingRate statistics are usually {{0}} when you retrieve a status object from the query itself.
I propose we make the following changes:
- Make {{status}} report only instantaneous things, such as whether data is available or a human-readable message about what phase we are currently in.
- Have a separate {{progress}} message that we report for each trigger, carrying the other performance information that lives in status today. You should be able to easily retrieve a configurable number of the most recent progress messages instead of just the most recent one.
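The split proposed in the two items above could look roughly like the following. This is only a minimal, Spark-independent sketch in Python: {{QueryStatus}}, {{QueryProgress}}, {{QueryMonitor}}, their fields, and the {{max_recent_progress}} parameter are all hypothetical names chosen for illustration, not the final API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class QueryStatus:
    """Instantaneous state only: what the query is doing right now."""
    message: str
    is_data_available: bool


@dataclass(frozen=True)
class QueryProgress:
    """Performance statistics for one completed trigger."""
    trigger_id: int
    input_rows_per_second: float
    processed_rows_per_second: float


class QueryMonitor:
    """Hypothetical holder that keeps status and progress apart."""

    def __init__(self, max_recent_progress: int = 100) -> None:
        self._status = QueryStatus("Initializing", False)
        # A deque with maxlen drops the oldest entry once it is full,
        # which gives the "configurable number of recent messages".
        self._recent: Deque[QueryProgress] = deque(maxlen=max_recent_progress)

    def update_status(self, message: str, is_data_available: bool) -> None:
        # Status is overwritten in place; no history is kept for it.
        self._status = QueryStatus(message, is_data_available)

    def report_progress(self, progress: QueryProgress) -> None:
        # Called once per trigger.
        self._recent.append(progress)

    @property
    def status(self) -> QueryStatus:
        return self._status

    @property
    def recent_progress(self) -> List[QueryProgress]:
        # Oldest first, newest last.
        return list(self._recent)

    @property
    def last_progress(self) -> Optional[QueryProgress]:
        return self._recent[-1] if self._recent else None
```

With this shape, reading {{status}} never returns stale rate statistics, and a caller who wants performance numbers reads {{recent_progress}} or {{last_progress}} instead.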
While we are making these changes, I propose that we also change {{id}} to be a globally unique identifier, rather than a JVM-unique one. Without this, it's hard to correlate performance across restarts.
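One way to picture an identifier that survives restarts is sketched below. It is an assumption for illustration only: the helper {{load_or_create_query_id}}, the {{query_id}} file name, and the idea of storing the id next to the checkpoint data are hypothetical and are not stated in this issue.

```python
import uuid
from pathlib import Path


def load_or_create_query_id(checkpoint_dir: str) -> str:
    """Return a stable, globally unique id for the query in checkpoint_dir.

    Hypothetical helper: the first start generates a random UUID and stores
    it; every later start (including in a new process) reads the same value
    back, unlike a per-JVM counter that starts over on each launch.
    """
    id_file = Path(checkpoint_dir) / "query_id"  # hypothetical file name
    if id_file.exists():
        return id_file.read_text().strip()
    query_id = str(uuid.uuid4())
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(query_id)
    return query_id
```

Because the id is read back from storage rather than regenerated, progress messages emitted before and after a restart carry the same id and can be joined on it.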
was:
There are two types of information that you want to be able to extract from a running query: instantaneous _status_ and metrics about performance as you make _progress_ in query processing.
Today, these are conflated in a single {{StreamingQueryStatus}} object. The downside to this approach is that a user needs to reason about what state the query is in any time they retrieve a status object. Fields like {{statusMessage}} don't appear in messages that come from the listener bus, and the inputRate/processingRate statistics are usually {{0}} when you retrieve a status object from the query itself.
I propose we make the following changes:
- Make {{status}} report only instantaneous things, such as whether data is available or a human-readable message about what phase we are currently in.
- Have a separate {{progress}} message that we report for each trigger, carrying the other performance information that lives in status today. You should be able to easily retrieve a configurable number of the most recent progress messages instead of just the most recent one.
While we are making these changes, I propose that we also change {{id}} to be a globally unique identifier, rather than a JVM-unique one. Without this, it's hard to correlate performance across restarts.
> Separate instantaneous state from progress performance statistics
> -----------------------------------------------------------------
>
> Key: SPARK-18516
> URL: https://issues.apache.org/jira/browse/SPARK-18516
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Reporter: Michael Armbrust
> Assignee: Michael Armbrust
> Priority: Blocker