Posted to user@spark.apache.org by Abhishek Gupta <ab...@gmail.com> on 2021/02/11 12:56:33 UTC

Trigger on GroupStateTimeout with no new data in group

Hi All,

I have a question about modeling a user-session analytics use case in
Spark Structured Streaming. Is there a way to model something like the
following using arbitrary stateful processing?

User session -> the user reads a few FAQs on a website and then decides
whether or not to create a ticket (issue)
FAQ Deflection Metrics:
i) Successful Deflection: No issue is created within 5 mins of reading the
last FAQ
ii) Failed Deflection: An issue is created within 5 mins of reading the
last FAQ

There are 3 cases here, 2 of which can be done using flatMapGroupsWithState;
I am not sure about the 3rd:
i) Maintain the user's last action as state. If an issue-create event
arrives and the last state is an FAQ view within the past 5 mins -> Failed
deflection
ii) Maintain the user's last state. If an issue-create event arrives and the
last state is an FAQ view more than 5 mins ago -> Successful deflection
iii) Maintain the user's last state, maybe with a processing-time timeout of
5 mins, i.e. the FAQ was viewed at T1, no issue-create event has arrived from
the user, but the time now is T1 + 5 mins, so we should increment Successful
deflection
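
To make the three cases concrete, here is a minimal sketch in plain Python
(not Spark code) of the per-user state machine described above. The class
name, the event kinds "faq_view" and "issue_create", and the on_tick method
are all hypothetical names chosen for illustration. on_tick only stands in
for "something checks the clock when no event arrives"; it makes no claim
about when or whether Spark invokes a timeout.

```python
from dataclasses import dataclass, field
from typing import Dict

WINDOW_SECS = 5 * 60  # the 5-minute deflection window from the cases above


@dataclass
class DeflectionTracker:
    """Hypothetical per-user tracker; illustrative only, not a Spark API."""
    last_faq_view: Dict[str, float] = field(default_factory=dict)
    successful: int = 0
    failed: int = 0

    def on_event(self, user: str, kind: str, ts: float) -> None:
        if kind == "faq_view":
            # A newer FAQ view restarts the 5-minute window for this user.
            self.last_faq_view[user] = ts
        elif kind == "issue_create":
            viewed_at = self.last_faq_view.pop(user, None)
            if viewed_at is None:
                return  # no pending FAQ view, nothing to count
            if ts - viewed_at < WINDOW_SECS:
                self.failed += 1       # case i
            else:
                self.successful += 1   # case ii

    def on_tick(self, now: float) -> None:
        # Case iii: no issue-create event arrived, but the window has elapsed.
        expired = [u for u, t in self.last_faq_view.items()
                   if now - t >= WINDOW_SECS]
        for user in expired:
            del self.last_faq_view[user]
            self.successful += 1


tracker = DeflectionTracker()
tracker.on_event("u1", "faq_view", 0)
tracker.on_event("u1", "issue_create", 120)    # case i: failed
tracker.on_event("u3", "faq_view", 0)
tracker.on_tick(300)                           # case iii: successful
tracker.on_event("u2", "faq_view", 1000)
tracker.on_event("u2", "issue_create", 1400)   # case ii: successful
print(tracker.successful, tracker.failed)      # prints: 2 1
```

Clearing the state when the window expires means that a late issue-create
event for the same user is not counted a second time under case ii.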

Can we do this using Spark's GroupStateTimeout? I am not sure whether a
timeout can fire for a group when no new data arrives for that group.