Posted to issues@calcite.apache.org by "Rui Wang (Jira)" <ji...@apache.org> on 2020/01/23 22:16:00 UTC
[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

    [ https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022551#comment-17022551 ] 

Rui Wang edited comment on CALCITE-3737 at 1/23/20 10:15 PM:
-------------------------------------------------------------

I addressed your comments, and I have responses to two of them:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and implement only the windowing part (computing window_start and window_end) separately. I later gave that up because hopping needs to call a function that returns a list of window_start/window_end pairs, and we don't know the size of that list in advance, so we cannot really write a for loop in Java. (Note that I need to build a list of linq4j expressions; see the discussion here: [link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E].)
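
To illustrate why the number of windows per row is not fixed, here is a minimal standalone sketch. It is not the code in the pull request: the class and method names are hypothetical, timestamps are plain epoch milliseconds, and windows are assumed to be aligned to epoch 0 with an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Calcite's HOP implementation. */
final class HopWindowSketch {
  private HopWindowSketch() {}

  /**
   * Returns {windowStart, windowEnd} pairs (end exclusive) for every hopping
   * window that contains ts. Assumes durMillis > 0, hopMillis > 0, and
   * windows aligned to epoch 0.
   */
  static List<long[]> windowsFor(long ts, long durMillis, long hopMillis) {
    List<long[]> windows = new ArrayList<>();
    // Latest window start that is <= ts.
    long lastStart = ts - Math.floorMod(ts, hopMillis);
    // Walk back one hop at a time while the window still covers ts.
    // The number of iterations depends on dur, hop, and ts.
    for (long start = lastStart; start > ts - durMillis; start -= hopMillis) {
      // Insert at the front so the result is ordered by window start.
      windows.add(0, new long[] {start, start + durMillis});
    }
    return windows;
  }
}
```

With dur = 10 minutes and hopsize = 5 minutes, a row at 8:07 yields the windows starting at 8:00 and 8:05. With hopsize > dur, a row that falls into a gap yields an empty list.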

Also, I plan to add per-key sessionization and bucket_gap_filling table functions later. Their code will be even more complicated and even less sharable. For example, per-key sessionization needs to see all the data first and then sort it to find each window start and window end. So I would prefer to implement those the same way hopping is implemented (e.g. by providing an AbstractEnumerable<Object[]> implementation).
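
As a rough sketch of why sessionization needs all of the data up front, the following standalone example groups timestamps by key, sorts them, and only then derives the session bounds. Everything here is an assumption made for illustration: the class and method names are hypothetical, a session is taken to end at its last timestamp plus a fixed gap, and a row starts a new session when it arrives at or after that end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not a Calcite API. */
final class SessionWindowSketch {
  private SessionWindowSketch() {}

  /**
   * Returns, for each key, the {sessionStart, sessionEnd} pairs of its
   * sessions. keys[i] and timestamps[i] describe one input row.
   */
  static Map<String, List<long[]>> sessionize(
      String[] keys, long[] timestamps, long gapMillis) {
    // Collect every timestamp per key; no window can be emitted earlier,
    // because a later row may extend or split a session.
    Map<String, List<Long>> byKey = new TreeMap<>();
    for (int i = 0; i < keys.length; i++) {
      byKey.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(timestamps[i]);
    }
    Map<String, List<long[]>> result = new TreeMap<>();
    for (Map.Entry<String, List<Long>> entry : byKey.entrySet()) {
      List<Long> sorted = entry.getValue();
      Collections.sort(sorted);
      List<long[]> sessions = new ArrayList<>();
      long start = sorted.get(0);
      long last = start;
      for (int i = 1; i < sorted.size(); i++) {
        long t = sorted.get(i);
        if (t - last >= gapMillis) {
          // The gap was exceeded: close the current session, open a new one.
          sessions.add(new long[] {start, last + gapMillis});
          start = t;
        }
        last = t;
      }
      sessions.add(new long[] {start, last + gapMillis});
      result.put(entry.getKey(), sessions);
    }
    return result;
  }
}
```

Because the grouping and sorting must finish before the first window is known, this logic does not fit the row-at-a-time shape of the tumbling window code.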

As I build more table functions and add support for streaming SQL, if I find a better way to unify the table function implementations, I will add patches for that.

> Changes to reference.md need some copy-editing.
I reviewed the changes in reference.md and made some edits. However, I am not a native English speaker, so I may not have fixed what you had in mind.




was (Author: amaliujia):
I addressed your comments, and I have responses to two of them:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and implement only the windowing part (computing window_start and window_end) separately. I later gave that up because hopping needs to call a function that returns a list of window_start/window_end pairs, and we don't know the size of that list in advance, so we cannot really write a for loop in Java. (Note that I need to build a list of linq4j expressions; see the discussion here: [link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E].)

Also, I plan to add per-key sessionization and bucket_gap_filling table functions later. Their code will be even more complicated, so I would prefer to implement those the same way hopping is implemented (e.g. by providing an AbstractEnumerable<Object[]> implementation).


> Changes to reference.md need some copy-editing.
I reviewed the changes in reference.md and made some edits. However, I am not a native English speaker, so I may not have fixed what you had in mind.



> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Hopping windows place intervals of a fixed size evenly spaced across event time. Most importantly, in the most common use a given event time timestamp will generally fall into more than one window.
> The table-valued function Hop may produce zero, one, or multiple rows corresponding to each row of input.  Hop takes four required parameters and one optional parameter. All parameters are analogous to those for Tumble except for hopsize, which specifies the duration between the starting points (and endpoints) of the hopping windows, allowing for overlapping windows (hopsize < dur, common) or gaps in the data (hopsize > dur, rarely useful).
> {code:java}
> Hop (data , timecol , dur, hopsize)
> {code}
> The return value of Hop is a relation that includes all columns of data as well as additional event time columns wstart and wend. Here is an example (from https://s.apache.org/streaming-beam-sql ):
> {code:sql}
> SELECT *
>       FROM Hop (
>         data    => TABLE Bids ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES ,
>         hopsize => INTERVAL '5' MINUTES );
> ------------------------------------------
> | wstart | wend | bidtime | price | item |
> ------------------------------------------
> | 8:00   | 8:10 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:11    | $3    | B    |
> | 8:10   | 8:20 | 8:11    | $3    | B    |
> | 8:00   | 8:10 | 8:05    | $4    | C    |
> | 8:05   | 8:15 | 8:05    | $4    | C    |
> | 8:00   | 8:10 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:17    | $6    | F    |
> | 8:15   | 8:25 | 8:17    | $6    | F    |
> ------------------------------------------
> {code}


