You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Hamish Whittal <ha...@cloud-fundis.co.za> on 2020/09/04 14:02:15 UTC

Keeping track of how long something has been in a queue

Hi folks,

I have a stream coming from Kafka.

It has this schema:
{
    "id": 4,
    "account_id": 1070998,
    "uid": "green.th",
    "last_activity_time": "2020-09-03 13:04:04.520129"
}

Another event arrives a few milliseconds/seconds later:

{
    "id": 9,
    "account_id": 1070998,
    "uid": "green.th",
    "last_activity_time": "2020-09-03 13:04:05.123456"
}

Later, say 12 seconds later, she's seen again as an event. Now her event is:

{
    "id": 119,
    "account_id": 1070998,
    "uid": "green.th",
    "last_activity_time": "2020-09-03 13:04:17.345678"
}

She's now been in the queue for 13s (and the total users in the queue are
now 43).
I need to keep track of two things:

(1) Since many of these are coming in from many users, I want to know, over
some time period, how many of them are in the queue at any point,

(2) If Ms green.th was first seen at 13:04:04 and then at 13:04:05, she's
been in the queue for 1 second (ignoring the ms).

How does one go about computing these sorts of more complex things in Spark
Streaming? Would one have to keep track of her first-seen-time in a column
and then do a diff the next time she's seen? With append / update mode, how
does one begin doing this sort of thing?

Any help would be most gratefully appreciated.
Hamish

Re: Keeping track of how long something has been in a queue

Posted by Jungtaek Lim <ka...@gmail.com>.
You may want to google around "session window" and "duration", and check
whether the concept fits your requirements. Probably adding some custom
logic on top of the session window would work for you, which requires you
to implement a custom function for flatMapGroupsWithState.

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Fri, Sep 4, 2020 at 11:21 PM Hamish Whittal <ha...@cloud-fundis.co.za>
wrote:

> Sorry, I moved a paragraph,
>
> (2) If Ms green.th was first seen at 13:04:04, then at 13:04:05 and
>> finally at 13:04:17, she's been in the queue for 13 seconds (ignoring the
>> ms).
>>
>

Re: Keeping track of how long something has been in a queue

Posted by Hamish Whittal <ha...@cloud-fundis.co.za>.
Sorry, I moved a paragraph,

(2) If Ms green.th was first seen at 13:04:04, then at 13:04:05 and finally
> at 13:04:17, she's been in the queue for 13 seconds (ignoring the ms).
>