Posted to user@flink.apache.org by Gregory Fee <gf...@lyft.com> on 2018/05/22 21:52:14 UTC

Limitations with Retract Streams on SQL

I'm trying to get a stream of data from a Table I've formed with roughly
this SQL:

SELECT
    user_id,
    count(msg),
    HOP_END(rowtime, INTERVAL '1' second, INTERVAL '1' minute)
FROM (SELECT rowtime, user_id, action_name AS msg FROM
          event_client_action
        WHERE /* various clauses */
        UNION SELECT rowtime, user_id, action_type AS msg FROM
           event_server_action
           WHERE /* various clauses */
      )
GROUP BY
HOP(rowtime, INTERVAL '1' second, INTERVAL '1' minute), user_id

If I try to get an append stream it tells me the table is not append only.
If I try to get a retract stream it tells me:

Retraction on windowed GroupBy aggregation is not supported yet. Note:
Windowed GroupBy aggregation should not follow a non-windowed GroupBy
aggregation.

Note: The same query without the union clause works just fine.

The error message doesn't make sense to me because I do not think I'm doing
a non-windowed GroupBy anywhere. Can anyone help me?

--
Gregory Fee
Engineer

Re: Limitations with Retract Streams on SQL

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Gregory,

Rong's analysis is correct. The UNION with duplicate elimination is
translated into a UNION ALL and a subsequent grouping operator on all
attributes without an aggregation function.
Flink assumes that all grouping operators can produce retractions (updates),
and window-grouped aggregates cannot handle retractions (yet). That grouping
operator is the non-windowed GroupBy the error message refers to.
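
The equivalence described above can be checked on any SQL engine. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of Flink; the table names `client` and `server` and their rows are made up for illustration. It only shows that UNION returns the same rows as UNION ALL followed by a GROUP BY on all columns without an aggregation function. It says nothing about how Flink's planner executes either form.

```python
import sqlite3

# In-memory database with two small, hypothetical input tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (user_id INTEGER, msg TEXT);
    CREATE TABLE server (user_id INTEGER, msg TEXT);
    INSERT INTO client VALUES (1, 'click'), (1, 'click'), (2, 'scroll');
    INSERT INTO server VALUES (1, 'click'), (3, 'login');
""")

# UNION with duplicate elimination.
union_rows = sorted(conn.execute(
    "SELECT user_id, msg FROM client UNION SELECT user_id, msg FROM server"
).fetchall())

# UNION ALL followed by a grouping on all columns, no aggregate function.
grouped_rows = sorted(conn.execute("""
    SELECT user_id, msg FROM (
        SELECT user_id, msg FROM client
        UNION ALL
        SELECT user_id, msg FROM server
    ) GROUP BY user_id, msg
""").fetchall())

# Plain UNION ALL keeps every input row, duplicates included.
union_all_rows = sorted(conn.execute(
    "SELECT user_id, msg FROM client UNION ALL SELECT user_id, msg FROM server"
).fetchall())

print(union_rows == grouped_rows)
print(len(union_all_rows), len(union_rows))
```

The two deduplicating forms return the same three rows, while UNION ALL returns all five input rows. Whether UNION ALL is an acceptable substitute in the original query depends on whether duplicate rows across the two sources should be counted.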

However, there are a few things to add here:
1) UNION ALL does not produce retractions, i.e., we never have to retract a
row that we previously forwarded (unless the operator receives a
retraction; however, that is a different case because the operator only
forwards the retraction and does not produce one).
2) We can implement a special UNION ALL operator for tables with
timestamp attributes that is more efficient and able to automatically clean
its state based on the progress of watermarks.
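
To illustrate the first point, here is a toy sketch in plain Python (not Flink code) of a deduplicating operator over insert-only input. The function name `dedup_union` and the `("+", row)` message shape are assumptions made for this example. Each distinct row is forwarded the first time it is seen and later copies are dropped, so nothing that was already emitted ever has to be retracted.

```python
from typing import Hashable, Iterable, Iterator, Tuple

Row = Tuple[Hashable, ...]

def dedup_union(*inputs: Iterable[Row]) -> Iterator[Tuple[str, Row]]:
    """Forward each distinct row once as an insert message ("+", row).

    Inputs are consumed one after another for simplicity; a real streaming
    operator would see them interleaved, but the argument is the same.
    """
    seen = set()  # state: every row forwarded so far
    for stream in inputs:
        for row in stream:
            if row not in seen:
                seen.add(row)
                yield ("+", row)  # only inserts, never a retraction

# Hypothetical insert-only inputs with duplicates within and across streams.
client = [(1, "click"), (1, "click"), (2, "scroll")]
server = [(1, "click"), (3, "login")]

out = list(dedup_union(client, server))
print(out)
```

In this sketch the `seen` set grows without bound. That is the kind of state the second point is about: with timestamp attributes, entries older than the current watermark could in principle be dropped.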

I've added JIRAs for both issues I've described above [1] [2].

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-9419
[2] https://issues.apache.org/jira/browse/FLINK-9422

2018-05-23 2:00 GMT+02:00 Rong Rong <wa...@gmail.com>:

> The SQL UNION is the cause of both (a) the table not being append-only
> and (b) the inner GroupBy.
>
> If you check the definition of the UNION operator [1], it states: "Any
> duplicate records are automatically removed unless UNION ALL is used".
> So: (1) it is definitely not an append-only operation, because you need to
> revise the result when duplicate records are generated, and
> (2) I think the Calcite optimizer translates the whole query into two
> individual projection operations followed by an all-column GroupBy to
> deduplicate the messages.
>
> Thanks,
> Rong
>
> Reference:
> [1] https://en.wikipedia.org/wiki/Set_operations_(SQL)#UNION_operator

Re: Limitations with Retract Streams on SQL

Posted by Rong Rong <wa...@gmail.com>.
The SQL UNION is the cause of both (a) the table not being append-only
and (b) the inner GroupBy.

If you check the definition of the UNION operator [1], it states: "Any
duplicate records are automatically removed unless UNION ALL is used".
So: (1) it is definitely not an append-only operation, because you need to
revise the result when duplicate records are generated, and
(2) I think the Calcite optimizer translates the whole query into two
individual projection operations followed by an all-column GroupBy to
deduplicate the messages.

Thanks,
Rong

Reference:
[1] https://en.wikipedia.org/wiki/Set_operations_(SQL)#UNION_operator
