Posted to user@flink.apache.org by lec ssmi <sh...@gmail.com> on 2020/05/03 22:48:19 UTC
Re: multiple joins in one job
Thanks for your reply.
But as far as I know, if the time attribute is retained and the time
attribute fields of both streams are selected in the join result,
which one becomes the final time attribute?
On Thu, Apr 30, 2020 at 8:25 PM, Benchao Li <li...@gmail.com> wrote:
> Hi lec,
>
> AFAIK, the time attribute is preserved after a time interval join.
> Could you share your DDL and SQL queries with us?
>
> On Thu, Apr 30, 2020 at 5:48 PM, lec ssmi <sh...@gmail.com> wrote:
>
>> Hi,
>> I need to join multiple stream tables using time interval joins. The
>> problem is that the time attribute disappears after the join, and
>> pure SQL cannot declare the time attribute field again. So, to make it
>> work, I need to write the result of the join to Kafka, then consume
>> it and join it with another stream table in another Flink job. This seems
>> troublesome.
>> Any good ideas?
>>
>>
>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>
>
Re: multiple joins in one job
Posted by Fabian Hueske <fh...@gmail.com>.
You can in fact forward both time attributes because Flink makes sure that
the watermark is automatically adjusted to the "slower" of both input
streams.
You can run the following queries in the SQL CLI client (the example is
taken from a Flink SQL training [1]):
Flink SQL> CREATE VIEW ridesWithFare AS
> SELECT
> *
> FROM
> Rides r,
> Fares f
> WHERE
> r.rideId = f.rideId AND
> NOT r.isStart AND
> f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime;
[INFO] View has been created.
Flink SQL> DESCRIBE ridesWithFare;
root
|-- rideId: BIGINT
|-- taxiId: BIGINT
|-- isStart: BOOLEAN
|-- lon: FLOAT
|-- lat: FLOAT
|-- rideTime: TIMESTAMP(3) *ROWTIME*
|-- psgCnt: INT
|-- rideId0: BIGINT
|-- payTime: TIMESTAMP(3) *ROWTIME*
|-- payMethod: STRING
|-- tip: FLOAT
|-- toll: FLOAT
|-- fare: FLOAT
As you can see, both rideTime and payTime are of type TIMESTAMP(3) *ROWTIME*.
Hence, both can be used as time attributes later on. However, typically
you'll just select one of them, e.g., when defining a grouping window.
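As a rough sketch of the grouping-window case, the query below aggregates over the ridesWithFare view and uses only payTime as the time attribute. The one-hour window size and the windowStart/avgTip aliases are illustrative choices, not part of the original example, and the TUMBLE group-window syntax is assumed to be available in the Flink version in use.

```sql
-- Sketch: hourly average tip over the joined view.
-- payTime is still a ROWTIME attribute after the interval join, so it can
-- drive an event-time window; rideTime is simply not referenced here.
SELECT
  TUMBLE_START(payTime, INTERVAL '1' HOUR) AS windowStart,
  AVG(tip) AS avgTip
FROM ridesWithFare
GROUP BY
  TUMBLE(payTime, INTERVAL '1' HOUR);
```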
Cheers,
Fabian
[1]
https://github.com/ververica/sql-training/wiki/Joining-Dynamic-Tables#average-tip-per-hour-of-day
Re: multiple joins in one job
Posted by Benchao Li <li...@gmail.com>.
Yes. The watermark is propagated correctly: the output watermark is the
minimum of the watermarks of the two inputs.
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenchao@gmail.com; libenchao@pku.edu.cn
Re: multiple joins in one job
Posted by lec ssmi <sh...@gmail.com>.
Even if the time attribute field is retained, will the related watermark
be retained as well?
If not, and there is no SQL syntax to declare the watermark again, that is
equivalent to not being able to do multiple joins in one job.
Re: multiple joins in one job
Posted by Benchao Li <li...@gmail.com>.
You cannot select more than one time attribute; the planner will throw
an exception if you do.
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenchao@gmail.com; libenchao@pku.edu.cn
Re: multiple joins in one job
Posted by lec ssmi <sh...@gmail.com>.
As you said, if I select the time attribute fields from
both inputs, which one will be the final one?
Re: multiple joins in one job
Posted by Benchao Li <li...@gmail.com>.
Hi lec,
You don't need to specify the time attribute again with something like
`TUMBLE_ROWTIME`; just select the time attribute field
from one of the inputs, and it will automatically be a time attribute.
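To illustrate, here is a minimal sketch that projects exactly one time attribute out of an interval join. It reuses the Rides and Fares tables from the Flink SQL training example quoted elsewhere in this thread; the view name ridesWithFareByPayTime is made up for this illustration.

```sql
-- Sketch: keep exactly one time attribute in the join result.
-- f.payTime is selected as-is, so it remains a ROWTIME attribute.
CREATE VIEW ridesWithFareByPayTime AS
SELECT
  r.rideId,
  r.taxiId,
  f.payTime,
  f.fare,
  f.tip
FROM
  Rides r,
  Fares f
WHERE
  r.rideId = f.rideId AND
  NOT r.isStart AND
  f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime;
```

Note that the field should be selected unchanged: once a time attribute is used in a computation (a cast or arithmetic, for example), Flink materializes it into a regular timestamp and it can no longer be used as a time attribute.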
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenchao@gmail.com; libenchao@pku.edu.cn
Re: multiple joins in one job
Posted by lec ssmi <sh...@gmail.com>.
But I have not found any syntax to specify the time
attribute field and watermark again in pure SQL.
Re: multiple joins in one job
Posted by Fabian Hueske <fh...@gmail.com>.
Sure, you can write a SQL query with multiple interval joins that preserve
event-time attributes and watermarks.
There's no need to feed data back to Kafka just to inject it again to
assign new watermarks.
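A sketch of what such a query might look like is shown below. The tables A, B and C and their columns are hypothetical; each is assumed to declare an event-time attribute (aTime, bTime, cTime) with a watermark in its own definition. Whether the planner translates each step into an interval join depends on the join conditions and the Flink version, so treat this as an illustration rather than a verified plan.

```sql
-- Sketch: two interval joins in a single statement.
-- A, B, C and all column names are hypothetical.
SELECT
  ab.id,
  ab.aTime,
  ab.bTime,
  c.cTime
FROM (
  -- First interval join: the time attributes of A and B are forwarded unchanged.
  SELECT a.id, a.aTime, b.bTime
  FROM A a, B b
  WHERE a.id = b.id
    AND b.bTime BETWEEN a.aTime - INTERVAL '5' MINUTE AND a.aTime
) ab, C c
WHERE ab.id = c.id
  -- Second interval join: bTime from the first join serves as the time attribute.
  AND c.cTime BETWEEN ab.bTime AND ab.bTime + INTERVAL '10' MINUTE;
```

The inner query could equally be defined as a view and joined against C in a second statement; the point is only that no intermediate sink such as Kafka is involved.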
Re: multiple joins in one job
Posted by lec ssmi <sh...@gmail.com>.
I mean doing this with pure SQL statements. Is that possible?
Re: multiple joins in one job
Posted by Fabian Hueske <fh...@gmail.com>.
Hi,
If the interval join emits the time attributes of both its inputs, you can
use either of them as a time attribute in a following operator because the
join ensures that the watermark will be aligned with both of them.
Best, Fabian