Posted to user@flink.apache.org by HG <ha...@gmail.com> on 2022/01/26 18:56:43 UTC

Unbounded streaming with table API and large json as one of the columns

Hi,

I need to calculate the elapsed time between the steps of a transaction.
Each step is an event, every event has a handling time, and all steps
belonging to a single transaction share the same transaction id.
All of this information is part of a large JSON structure, but I can have
the incoming source supply transactionId and handlingTime as separate
columns. That would save me extracting the windowing key (transactionId)
and the handling time from the nested JSON.
Basically I want to use the SQL API to do:

select transactionId,
       handlingTime - previousHandlingTime as elapsedTime,
       largeJSON
from (
    select transactionId,
           handlingTime,
           lag(handlingTime) over (partition by transactionId order by handlingTime) as previousHandlingTime,
           largeJSON
    from source
)
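
For completeness, this is roughly how I would declare the source, with
transactionId and handlingTime as separate columns next to the raw JSON.
The Kafka connector and all names and settings below are just placeholders
for the actual setup:

create table source (
    transactionId string,
    handlingTime  timestamp(3),  -- the event's handling time
    largeJSON     string         -- the full JSON payload, kept as-is
) with (
    'connector' = 'kafka',                              -- placeholder source
    'topic' = 'transactions',                           -- placeholder topic
    'properties.bootstrap.servers' = 'localhost:9092',  -- placeholder broker
    'format' = 'json'
);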

The largeJSON can be about 100K.
Would this work?

Regards Hans-Peter

Re: Unbounded streaming with table API and large json as one of the columns

Posted by HG <ha...@gmail.com>.
Thanks


Re: Unbounded streaming with table API and large json as one of the columns

Posted by Caizhi Weng <ts...@gmail.com>.
Hi!

This job will work as long as your SQL statement is valid. Have you run
into any difficulties, or is there a specific concern? A 100K record is
somewhat large, but I've seen quite a lot of jobs with records of that
size, so it should be fine.
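
One thing to double-check: in streaming mode an OVER window must be
ordered by an ascending time attribute, so handlingTime needs a
watermark declared on it in your create table statement, along these
lines (the 5 second delay is only an example):

    -- makes handlingTime a time attribute usable in the OVER window
    watermark for handlingTime as handlingTime - interval '5' second

Also note that subtracting two timestamps directly may not be supported;
timestampdiff(second, previousHandlingTime, handlingTime) is a safer way
to get a numeric elapsed time.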

>