You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/28 10:59:06 UTC

[GitHub] [doris] FreeOnePlus opened a new issue, #10473: [Feature] About supporting sequenceCount function and retention function

FreeOnePlus opened a new issue, #10473:
URL: https://github.com/apache/doris/issues/10473

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   In ClickHouse, support for sequenceCount function and retention function is provided, but Doris currently does not have these two functions, and hopes to gain support
   
   ### Use case
   
   > The following are the explanations and use cases of the two functions in ClickHouse
   
   ## sequenceCount(pattern)(time, cond1, cond2, …)
   Counts the number of event chains that matched the pattern. The function searches event chains that do not overlap. It starts to search for the next chain after the current chain is matched.
   
   WARNING
   Events that occur at the same second may lay in the sequence in an undefined order affecting the result.
   
   sequenceCount(pattern)(timestamp, cond1, cond2, ...)
   
   Arguments
   
   timestamp — Column considered to contain time data. Typical data types are Date and DateTime. You can also use any of the supported [UInt](https://clickhouse.com/docs/en/sql-reference/data-types/int-uint) data types.
   
   cond1, cond2 — Conditions that describe the chain of events. Data type: UInt8. You can pass up to 32 condition arguments. The function takes only the events described in these conditions into account. If the sequence contains data that isn’t described in a condition, the function skips them.
   
   Parameters
   
   pattern — Pattern string. See [Pattern syntax](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/parametric-functions/#sequence-function-pattern-syntax).
   Returned values
   
   Number of non-overlapping event chains that are matched.
   Type: UInt64.
   
   Example
   
   Consider data in the t table:
   
   ┌─time─┬─number─┐
   │    1 │      1 │
   │    2 │      3 │
   │    3 │      2 │
   │    4 │      1 │
   │    5 │      3 │
   │    6 │      2 │
   └──────┴────────┘
   
   Count how many times the number 2 occurs after the number 1 with any amount of other numbers between them:
   
   SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t
   
   ┌─sequenceCount('(?1).*(?2)')(time, equals(number, 1), equals(number, 2))─┐
   │                                                                       2 │
   └─────────────────────────────────────────────────────────────────────────┘
   
   
   
   ## retention function
   The function takes as arguments a set of conditions from 1 to 32 arguments of type UInt8 that indicate whether a certain condition was met for the event. Any condition can be specified as an argument (as in [WHERE](https://clickhouse.com/docs/en/sql-reference/statements/select/where#select-where)).
   
   The conditions, except the first, apply in pairs: the result of the second will be true if the first and second are true, of the third if the first and third are true, etc.
   
   Syntax
   
   retention(cond1, cond2, ..., cond32);
   
   Arguments
   
   cond — An expression that returns a UInt8 result (1 or 0).
   Returned value
   
   The array of 1 or 0.
   
   1 — Condition was met for the event.
   0 — Condition wasn’t met for the event.
   Type: UInt8.
   
   Example
   
   Let’s consider an example of calculating the retention function to determine site traffic.
   
   1. Сreate a table to illustrate an example.
   
   CREATE TABLE retention_test(date Date, uid Int32) ENGINE = Memory;
   
   INSERT INTO retention_test SELECT '2020-01-01', number FROM numbers(5);
   INSERT INTO retention_test SELECT '2020-01-02', number FROM numbers(10);
   INSERT INTO retention_test SELECT '2020-01-03', number FROM numbers(15);
   
   Input table:
   
   Query:
   
   SELECT * FROM retention_test
   
   Result:
   
   ┌───────date─┬─uid─┐
   │ 2020-01-01 │   0 │
   │ 2020-01-01 │   1 │
   │ 2020-01-01 │   2 │
   │ 2020-01-01 │   3 │
   │ 2020-01-01 │   4 │
   └────────────┴─────┘
   ┌───────date─┬─uid─┐
   │ 2020-01-02 │   0 │
   │ 2020-01-02 │   1 │
   │ 2020-01-02 │   2 │
   │ 2020-01-02 │   3 │
   │ 2020-01-02 │   4 │
   │ 2020-01-02 │   5 │
   │ 2020-01-02 │   6 │
   │ 2020-01-02 │   7 │
   │ 2020-01-02 │   8 │
   │ 2020-01-02 │   9 │
   └────────────┴─────┘
   ┌───────date─┬─uid─┐
   │ 2020-01-03 │   0 │
   │ 2020-01-03 │   1 │
   │ 2020-01-03 │   2 │
   │ 2020-01-03 │   3 │
   │ 2020-01-03 │   4 │
   │ 2020-01-03 │   5 │
   │ 2020-01-03 │   6 │
   │ 2020-01-03 │   7 │
   │ 2020-01-03 │   8 │
   │ 2020-01-03 │   9 │
   │ 2020-01-03 │  10 │
   │ 2020-01-03 │  11 │
   │ 2020-01-03 │  12 │
   │ 2020-01-03 │  13 │
   │ 2020-01-03 │  14 │
   └────────────┴─────┘
   
   2. Group users by unique ID uid using the retention function.
   
   Query:
   
   SELECT
       uid,
       retention(date = '2020-01-01', date = '2020-01-02', date = '2020-01-03') AS r
   FROM retention_test
   WHERE date IN ('2020-01-01', '2020-01-02', '2020-01-03')
   GROUP BY uid
   ORDER BY uid ASC
   
   Result:
   
   ┌─uid─┬─r───────┐
   │   0 │ [1,1,1] │
   │   1 │ [1,1,1] │
   │   2 │ [1,1,1] │
   │   3 │ [1,1,1] │
   │   4 │ [1,1,1] │
   │   5 │ [0,0,0] │
   │   6 │ [0,0,0] │
   │   7 │ [0,0,0] │
   │   8 │ [0,0,0] │
   │   9 │ [0,0,0] │
   │  10 │ [0,0,0] │
   │  11 │ [0,0,0] │
   │  12 │ [0,0,0] │
   │  13 │ [0,0,0] │
   │  14 │ [0,0,0] │
   └─────┴─────────┘
   
   3. Calculate the total number of site visits per day.
   
   Query:
   
   SELECT
       sum(r[1]) AS r1,
       sum(r[2]) AS r2,
       sum(r[3]) AS r3
   FROM
   (
       SELECT
           uid,
           retention(date = '2020-01-01', date = '2020-01-02', date = '2020-01-03') AS r
       FROM retention_test
       WHERE date IN ('2020-01-01', '2020-01-02', '2020-01-03')
       GROUP BY uid
   )
   
   
   Result:
   
   ┌─r1─┬─r2─┬─r3─┐
   │  5 │  5 │  5 │
   └────┴────┴────┘
   
   Where:
   
   r1- the number of unique visitors who visited the site during 2020-01-01 (the cond1 condition).
   r2- the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-02 (cond1 and cond2 conditions).
   r3- the number of unique visitors who visited the site during a specific time period between 2020-01-01 and 2020-01-03 (cond1 and cond3 conditions).
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] isHuangXin commented on issue #10473: [Feature] About supporting sequenceCount function and retention function

Posted by GitBox <gi...@apache.org>.
isHuangXin commented on issue #10473:
URL: https://github.com/apache/doris/issues/10473#issuecomment-1253156361

   Plz unassign me. Because doris does not support building the project in the M1 ARM programming environment. At that time, my computer is an M1 chip.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] isHuangXin commented on issue #10473: [Feature] About supporting sequenceCount function and retention function

Posted by GitBox <gi...@apache.org>.
isHuangXin commented on issue #10473:
URL: https://github.com/apache/doris/issues/10473#issuecomment-1231566121

   I want to try it. please assign this issue to me~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] isHuangXin commented on issue #10473: [Feature] About supporting sequenceCount function and retention function

Posted by GitBox <gi...@apache.org>.
isHuangXin commented on issue #10473:
URL: https://github.com/apache/doris/issues/10473#issuecomment-1253172378

   > Thank you very much! In fact, I am also using m1 macbook. I usually develop on macbook and compile on Alibaba Cloud server. This is my suggestion and I hope it can help you. 😄
   
   Get it. Thanks for your advice and I'll try that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Yukang-LIAN commented on issue #10473: [Feature] About supporting sequenceCount function and retention function

Posted by GitBox <gi...@apache.org>.
Yukang-LIAN commented on issue #10473:
URL: https://github.com/apache/doris/issues/10473#issuecomment-1253153297

   > [WeOpen-Star] I want to try it. please assign this issue to me~
   
   Hello handsome, my mentor asked me to try to implement the retention function, and I see thst you are trying to implement it. I think if you haven't implemented it yet, could you please give me this task (just Retention function)? thank you very much!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring commented on issue #10473: [Feature] About supporting sequenceCount function and retention function

Posted by GitBox <gi...@apache.org>.
dataroaring commented on issue #10473:
URL: https://github.com/apache/doris/issues/10473#issuecomment-1255832542

   retention is picked by a guy. https://github.com/apache/doris/issues/12800.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Yukang-LIAN commented on issue #10473: [Feature] About supporting sequenceCount function and retention function

Posted by GitBox <gi...@apache.org>.
Yukang-LIAN commented on issue #10473:
URL: https://github.com/apache/doris/issues/10473#issuecomment-1253169815

   Thank you very much! In fact, I am also using m1 macbook. I usually develop on macbook and compile on Alibaba Cloud server. This is my suggestion and I hope it can help you. 😄


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org