Posted to user@flink.apache.org by lec ssmi <sh...@gmail.com> on 2020/05/03 22:48:19 UTC

Re: multiple joins in one job

Thanks for your reply.
But as far as I know, if the time attribute is retained and the time
attribute fields of both streams are selected in the result after joining,
which one is the final time attribute?

Benchao Li <li...@gmail.com> wrote on Thu, Apr 30, 2020 at 8:25 PM:

> Hi lec,
>
> AFAIK, the time attribute will be preserved after a time interval join.
> Could you share your DDL and SQL queries with us?
>
> lec ssmi <sh...@gmail.com> wrote on Thu, Apr 30, 2020 at 5:48 PM:
>
>> Hi,
>>    I need to join multiple stream tables using time interval joins. The
>> problem is that the time attribute disappears after the join, and
>> pure SQL cannot declare the time attribute field again. So, to make it
>> work, I need to insert the result of the last join into Kafka, then consume
>> it and join it with another stream table in another Flink job. This seems
>> troublesome.
>> Any good ideas?
>>
>>
>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>
>

Re: multiple joins in one job

Posted by Fabian Hueske <fh...@gmail.com>.
You can in fact forward both time attributes because Flink makes sure that
the watermark is automatically adjusted to the "slower" of both input
streams.

You can run the following queries in the SQL CLI client (the example is
taken from a Flink SQL training [1]):

Flink SQL> CREATE VIEW ridesWithFare AS
> SELECT
>   *
> FROM
>   Rides r,
>   Fares f
> WHERE
>   r.rideId = f.rideId AND
>   NOT r.isStart AND
>   f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime;
[INFO] View has been created.

Flink SQL> DESCRIBE ridesWithFare;
root
 |-- rideId: BIGINT
 |-- taxiId: BIGINT
 |-- isStart: BOOLEAN
 |-- lon: FLOAT
 |-- lat: FLOAT
 |-- rideTime: TIMESTAMP(3) *ROWTIME*
 |-- psgCnt: INT
 |-- rideId0: BIGINT
 |-- payTime: TIMESTAMP(3) *ROWTIME*
 |-- payMethod: STRING
 |-- tip: FLOAT
 |-- toll: FLOAT
 |-- fare: FLOAT

As you see, both rideTime and payTime are of type TIMESTAMP(3) *ROWTIME*.
Hence, both can be used as time attributes later on. However, typically
you'll just select one of them, e.g., when defining a grouping window.
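
As a minimal sketch of that last point, assuming the ridesWithFare view
above exists in the same SQL CLI session, a grouping window can be defined
on just one of the two rowtime columns. The one-hour window size and the
output column names (windowStart, avgTip) are chosen only for illustration:

```sql
-- Hourly average tip, windowed on rideTime (one of the view's two rowtime columns).
-- payTime is simply not used as a time attribute here.
SELECT
  TUMBLE_START(rideTime, INTERVAL '1' HOUR) AS windowStart,
  AVG(tip) AS avgTip
FROM
  ridesWithFare
GROUP BY
  TUMBLE(rideTime, INTERVAL '1' HOUR);
```

Swapping rideTime for payTime in both TUMBLE calls should work the same
way, since both columns are reported as *ROWTIME* by DESCRIBE.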

Cheers,
Fabian

[1]
https://github.com/ververica/sql-training/wiki/Joining-Dynamic-Tables#average-tip-per-hour-of-day

On Wed, May 6, 2020 at 03:52, Benchao Li <li...@gmail.com> wrote:

> Yes. The watermark will be propagated correctly; it is the minimum of the
> two inputs' watermarks.

Re: multiple joins in one job

Posted by Benchao Li <li...@gmail.com>.
Yes. The watermark will be propagated correctly; it is the minimum of the
two inputs' watermarks.

lec ssmi <sh...@gmail.com> wrote on Wed, May 6, 2020 at 9:46 AM:

> Even if the time attribute field is retained, will the related watermark
> be retained?
> If not, and there is no SQL syntax to declare a watermark again, it is
> equivalent to not being able to do multiple joins in one job.

Re: multiple joins in one job

Posted by lec ssmi <sh...@gmail.com>.
Even if the time attribute field is retained, will the related watermark
be retained?
If not, and there is no SQL syntax to declare a watermark again, it is
equivalent to not being able to do multiple joins in one job.

Benchao Li <li...@gmail.com> wrote on Tue, May 5, 2020 at 9:23 PM:

> You cannot select more than one time attribute; the planner will throw
> an exception if you do.

Re: multiple joins in one job

Posted by Benchao Li <li...@gmail.com>.
You cannot select more than one time attribute; the planner will throw
an exception if you do.


lec ssmi <sh...@gmail.com> wrote on Tue, May 5, 2020 at 8:34 PM:

> As you said, if I select all the time attribute fields from both inputs,
> which will be the final one?

Re: multiple joins in one job

Posted by lec ssmi <sh...@gmail.com>.
As you said, if I select all the time attribute fields from both inputs,
which will be the final one?

Benchao Li <li...@gmail.com> wrote on Tue, May 5, 2020 at 17:26:

> Hi lec,
>
> You don't need to specify the time attribute again with something like
> `TUMBLE_ROWTIME`; just select the time attribute field from one of the
> inputs, and it will automatically be a time attribute.

Re: multiple joins in one job

Posted by Benchao Li <li...@gmail.com>.
Hi lec,

You don't need to specify the time attribute again with something like
`TUMBLE_ROWTIME`; just select the time attribute field from one of the
inputs, and it will automatically be a time attribute.
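
A minimal sketch of what that could look like, assuming the Rides and
Fares tables from the Flink SQL training used elsewhere in this thread.
The view name ridesWithFareOneTime is made up for illustration:

```sql
-- Interval join that forwards only one time attribute (r.rideTime).
-- No re-declaration is needed: the selected column is expected to stay a rowtime attribute.
CREATE VIEW ridesWithFareOneTime AS
SELECT
  r.rideId,
  r.taxiId,
  r.rideTime,   -- forwarded time attribute from the Rides input
  f.payMethod,
  f.tip,
  f.fare
FROM
  Rides r,
  Fares f
WHERE
  r.rideId = f.rideId AND
  NOT r.isStart AND
  f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime;
```

Running DESCRIBE on the view is a quick way to check whether rideTime is
still marked as *ROWTIME*.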

lec ssmi <sh...@gmail.com> wrote on Tue, May 5, 2020 at 4:42 PM:

> But I have not found any syntax to specify the time attribute field
> and watermark again in pure SQL.

Re: multiple joins in one job

Posted by lec ssmi <sh...@gmail.com>.
But I have not found any syntax to specify the time attribute field
and watermark again in pure SQL.

Fabian Hueske <fh...@gmail.com> wrote on Tue, May 5, 2020 at 15:47:

> Sure, you can write a SQL query with multiple interval joins that preserve
> event-time attributes and watermarks.
> There's no need to feed data back to Kafka just to inject it again to
> assign new watermarks.

Re: multiple joins in one job

Posted by Fabian Hueske <fh...@gmail.com>.
Sure, you can write a SQL query with multiple interval joins that preserve
event-time attributes and watermarks.
There's no need to feed data back to Kafka just to inject it again to
assign new watermarks.
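
A rough sketch of a second interval join in the same job, assuming a view
named ridesWithFare that is itself an interval join of Rides and Fares and
exposes payTime as a rowtime attribute. `Refunds` (with columns rideId,
refundTime as its rowtime attribute, and amount) is a hypothetical third
table invented for this example, as is the one-hour bound:

```sql
-- Second interval join: the first one is encapsulated in the ridesWithFare view.
-- Refunds is a hypothetical table; refundTime is assumed to be its rowtime attribute.
SELECT
  rf.rideId,
  rf.payTime,   -- time attribute forwarded by the first join
  rf.fare,
  x.amount
FROM
  ridesWithFare rf,
  Refunds x
WHERE
  rf.rideId = x.rideId AND
  x.refundTime BETWEEN rf.payTime AND rf.payTime + INTERVAL '1' HOUR;
```

Both joins run in one query plan, so nothing has to be written to Kafka
and read back between them.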

On Tue, May 5, 2020 at 01:45, lec ssmi <sh...@gmail.com> wrote:

> I mean doing it with a pure SQL statement. Is that possible?

Re: multiple joins in one job

Posted by lec ssmi <sh...@gmail.com>.
I mean doing it with a pure SQL statement. Is that possible?

Fabian Hueske <fh...@gmail.com> wrote on Mon, May 4, 2020 at 4:04 PM:

> Hi,
>
> If the interval join emits the time attributes of both its inputs, you can
> use either of them as a time attribute in a following operator because the
> join ensures that the watermark will be aligned with both of them.
>
> Best, Fabian

Re: multiple joins in one job

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

If the interval join emits the time attributes of both its inputs, you can
use either of them as a time attribute in a following operator because the
join ensures that the watermark will be aligned with both of them.

Best, Fabian

On Mon, May 4, 2020 at 00:48, lec ssmi <sh...@gmail.com> wrote:

> Thanks for your reply.
> But as far as I know, if the time attribute is retained and the time
> attribute fields of both streams are selected in the result after joining,
> which one is the final time attribute?