Posted to dev@spark.apache.org by Jacek Laskowski <ja...@japila.pl> on 2017/10/27 10:40:08 UTC

[SS] Why does StreamingQueryManager.notifyQueryTermination use id and runId (not just id)?

Hi,

I'm wondering why StreamingQueryManager.notifyQueryTermination [1] uses the
query id to remove the query from the activeQueries internal registry [2]
while it notifies stateStoreCoordinator using runId [3]?

My understanding is that id is the same across different runs of a query, so
once StreamingQueryManager removes the query (by its id) it effectively
knows nothing about the query, yet stateStoreCoordinator may still have other
instances running (since we only deactivated a single run).

Why the "inconsistency"?

[1]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L325

[2]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L327

[3]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala#L335

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

Re: [SS] Why does StreamingQueryManager.notifyQueryTermination use id and runId (not just id)?

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
stateStoreCoordinator uses runId to handle the small chance that Spark
cannot stop a bad task. Please see
https://github.com/apache/spark/pull/18355
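
One way to read the reply above is with a toy model, sketched here in plain Python with no Spark dependency. ToyCoordinator, ToyQueryManager, the executor names, and the rule that only one run per id may be active are all assumptions made for illustration; none of this is Spark's actual code. The sketch only shows that a registry keyed by the stable id and a coordinator keyed by the per-run runId can coexist, and that a late report from a task of an old run stays separate from the restarted run.

```python
import uuid


class ToyCoordinator:
    """Toy stand-in for a state store coordinator; not Spark's class."""

    def __init__(self):
        # Maps (run_id, partition) -> executor that reported the instance.
        self.instances = {}

    def report_active_instance(self, run_id, partition, executor):
        self.instances[(run_id, partition)] = executor

    def deactivate_instances(self, run_id):
        # Drop only the instances that belong to the given run.
        self.instances = {
            key: executor
            for key, executor in self.instances.items()
            if key[0] != run_id
        }


class ToyQueryManager:
    """Toy stand-in for the query manager; not Spark's class."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.active_queries = {}  # query id -> current run id

    def start(self, query_id):
        # Assumed rule for this sketch: at most one active run per query id.
        if query_id in self.active_queries:
            raise ValueError("query is already active")
        run_id = uuid.uuid4()
        self.active_queries[query_id] = run_id
        return run_id

    def notify_query_termination(self, query_id, run_id):
        # The registry forgets the query by its stable id ...
        self.active_queries.pop(query_id, None)
        # ... while the coordinator forgets only this run's instances.
        self.coordinator.deactivate_instances(run_id)


coordinator = ToyCoordinator()
manager = ToyQueryManager(coordinator)
query_id = uuid.uuid4()  # stays the same across restarts

first_run = manager.start(query_id)
coordinator.report_active_instance(first_run, 0, "executor-1")
manager.notify_query_termination(query_id, first_run)

second_run = manager.start(query_id)  # same id, fresh run id
coordinator.report_active_instance(second_run, 0, "executor-2")
# A late report from a task of the first run that was not stopped in time:
coordinator.report_active_instance(first_run, 0, "executor-1")
```

In this model the stale entry is stored under (first_run, 0) and the restarted query's entry under (second_run, 0), so deactivating the first run again cannot touch the second run. Had the coordinator been keyed by id alone, the two entries would have collided.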
