
Posted to issues@spark.apache.org by "Thomas Prelle (Jira)" <ji...@apache.org> on 2022/12/21 12:58:00 UTC

[jira] [Created] (SPARK-41665) Spark streaming query scheduling synchronisation with Trigger Interval

Thomas Prelle created SPARK-41665:
-------------------------------------

             Summary: Spark streaming query scheduling synchronisation with Trigger Interval
                 Key: SPARK-41665
                 URL: https://issues.apache.org/jira/browse/SPARK-41665
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.3.1, 3.2.2, 3.1.2, 3.0.3, 2.4.8
            Reporter: Thomas Prelle
         Attachments: image-2022-12-21-07-57-18-679.png, image-2022-12-21-07-57-32-654.png

Hi,
We have noticed strange behavior in Spark streaming: when we set a trigger interval, for example 1 minute, every query fires at 0:00:00, 0:01:00, 0:02:00 and so on, no matter when the query was started.
As a result, all queries are synchronised, which can disturb a cluster and lead to spikes in utilisation.
!image-2022-12-21-07-51-54-157.png!



I would expect the behavior to look like this:

!image-2022-12-21-07-53-24-367.png!

 

This is caused by the following line:
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L98]

Since now and intervalMs are both Long values, now / intervalMs * intervalMs is an integer division followed by a multiplication, which in my case simply truncates the seconds. The code does this explicitly, so I do not know whether it is the intended behavior or whether the line has just been left unchanged for 6 years. Either way, it affects every version released in the last 6 years.
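
To make the arithmetic concrete, here is a minimal Scala sketch. It assumes the linked line computes the next trigger time as now / intervalMs * intervalMs + intervalMs. The object name, the method names and the start-offset alternative are hypothetical and are not part of Spark.

```scala
object TriggerAlignmentSketch {
  // Assumed to mirror the arithmetic described in this report: the Long
  // division truncates `nowMs` down to the previous multiple of `intervalMs`,
  // then one interval is added, so the result always lands on an interval boundary.
  def alignedNextBatchTime(nowMs: Long, intervalMs: Long): Long =
    if (intervalMs == 0) nowMs else nowMs / intervalMs * intervalMs + intervalMs

  // Hypothetical alternative (not Spark code): schedule relative to the
  // query's own start time, so each query keeps its own offset.
  def offsetNextBatchTime(nowMs: Long, startMs: Long, intervalMs: Long): Long =
    if (intervalMs == 0) nowMs
    else startMs + ((nowMs - startMs) / intervalMs + 1) * intervalMs

  def main(args: Array[String]): Unit = {
    val intervalMs = 60000L // 1 minute trigger interval
    // Three queries started 10 s, 25 s and 40 s after the same minute boundary.
    val startTimesMs = Seq(10000L, 25000L, 40000L)
    startTimesMs.foreach { startMs =>
      val aligned = alignedNextBatchTime(startMs, intervalMs)
      val offset = offsetNextBatchTime(startMs, startMs, intervalMs)
      println(s"start=$startMs aligned=$aligned offset=$offset")
    }
  }
}
```

With the aligned formula all three queries get the same next trigger time (60000 ms), whereas the start-relative variant spreads them out (70000, 85000 and 100000 ms).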

Regards

Thomas



 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
