Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2016/03/03 19:29:18 UTC

[jira] [Comment Edited] (MADLIB-917) Path - window functions (multiple matches per partition, 1 window per match)

    [ https://issues.apache.org/jira/browse/MADLIB-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988666#comment-14988666 ] 

Frank McQuillan edited comment on MADLIB-917 at 3/3/16 6:28 PM:
----------------------------------------------------------------

3) Ecommerce data set for path test 3.csv

theme for test:
* multiple pattern matches per partition

Pattern match:  IMPR->CLICK->CONV
i.e., exactly 1 of each in succession

notes:
* an IMPR is a row with neither a CLICK nor a CONV
* every CONV row also has a CLICK, but such a row counts as a CONV event only
* pattern matches do not cross day boundaries
* “Path Match” column indicates a path that matches the above pattern
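
The notes above can be read as a symbol-assignment rule followed by a regular-expression match over the ordered symbols. A minimal Python sketch of that reading, assuming hypothetical boolean click/conv flags per row (the actual column names in the attached CSV are not shown here) and single-letter symbols I, C, V chosen only for illustration. It is not the MADlib implementation:

```python
import re

def to_symbol(click: bool, conv: bool) -> str:
    """Map one event row to a one-letter symbol (letters are illustrative)."""
    if conv:
        return "V"  # CONV wins even though CLICK is also set on the row
    if click:
        return "C"
    return "I"      # IMPR: neither CLICK nor CONV

def find_matches(rows):
    """Return (start, end) index pairs of IMPR->CLICK->CONV runs.

    rows: list of (click, conv) tuples, already ordered by time
    within a single partition. end is exclusive.
    """
    symbols = "".join(to_symbol(click, conv) for click, conv in rows)
    return [(m.start(), m.end()) for m in re.finditer("ICV", symbols)]

# Hypothetical partition, not taken from the attached CSV.
rows = [
    (False, False),  # I
    (True, False),   # C
    (True, True),    # V  closes the first match (rows 0..2)
    (False, False),  # I  unmatched
    (False, False),  # I
    (True, False),   # C
    (True, True),    # V  closes the second match (rows 4..6)
]
matches = find_matches(rows)
```

Checking conv before click is what makes a row with both flags set count as a CONV only, as the second note requires.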

output:

a) partition by user ID
total revenue and total margin by user ID
101121
rev = 131
margin = 28

101331
rev = 568
margin = 113


b) partition by user ID, by day
total revenue and total margin by user ID by calendar day
101121
4/16/2012
rev = 131
margin = 28

101331
4/15/2012
rev = 112
margin = 36

101331
4/16/2012
rev = 456
margin = 77

c) partition by day
total revenue and total margin by calendar day
4/15/2012
no matches (remember that we are partitioning by date, not by user ID)

4/16/2012
rev = 456
margin = 77
i.e., only the last 3 rows of the file match the pattern
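
To illustrate why the partition key changes which rows match, here is a small Python sketch. The events, user labels, and field names (user, day, ts, sym) are hypothetical and are not taken from the attached CSV; the point is only that rows from different users interleave when partitioning by day alone, which can break a run that matches within each user's own partition:

```python
import re
from collections import defaultdict

def count_matches(events, key):
    """Count IMPR->CLICK->CONV matches per partition.

    events: list of dicts with 'user', 'day', 'ts', 'sym' (I, C or V).
    key: function mapping an event to its partition key.
    """
    parts = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        parts[key(e)].append(e["sym"])
    return {k: len(re.findall("ICV", "".join(v))) for k, v in parts.items()}

# Hypothetical events: two users whose paths interleave on the same day.
events = [
    {"user": "A", "day": "d1", "ts": 1, "sym": "I"},
    {"user": "B", "day": "d1", "ts": 2, "sym": "I"},
    {"user": "A", "day": "d1", "ts": 3, "sym": "C"},
    {"user": "B", "day": "d1", "ts": 4, "sym": "C"},
    {"user": "A", "day": "d1", "ts": 5, "sym": "V"},
    {"user": "B", "day": "d1", "ts": 6, "sym": "V"},
]
by_user = count_matches(events, key=lambda e: e["user"])  # one match per user
by_day = count_matches(events, key=lambda e: e["day"])    # "IICCVV": no match
```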


was (Author: fmcquillan):
3) Ecommerce data set for path test 3.csv

theme for test:
* multiple pattern matches per partition

Pattern match:  IMPR->CLICK->CONV
i.e., exactly 1 of each in succession

notes:
* an IMPR is defined by no CLICK and no CONV
* all CONV events also have a CLICK event, but they are considered to be only CONV events
* pattern matches do not cross day boundaries
* “Path Match” column indicates a path that matches the above pattern

output:

a) partition by user ID
total revenue and total margin by user ID
101121
rev = 131
margin = 28

101331
rev = 568
margin = 113


b) partition by user ID, by day
total revenue and total margin by user ID by calendar day
101121
4/16/2012
rev = 131
margin = 28

101331
4/15/2012
rev = 112
margin = 36

101331
4/16/2012
rev = 456
margin = 77

c) partition by day
total revenue and total margin by calendar day
4/15/2012
rev = 112
margin = 36

4/16/2012
rev = 587
margin = 105

> Path - window functions (multiple matches per partition, 1 window per match)
> ----------------------------------------------------------------------------
>
>                 Key: MADLIB-917
>                 URL: https://issues.apache.org/jira/browse/MADLIB-917
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>    Affects Versions: v1.9
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>             Fix For: v1.9
>
>         Attachments: Ecommerce data set for path test 3.csv, path query3.sql
>
>
> Story
> As a user, I want to define symbols so that I can define a regular expression of symbols to identify sequences of events that I care about.
> Partition:
> 1) Multiple matches per partition in this story.
> 2) Note that a match might not span the whole partition; that is, the matched rows could be just a subset of the rows in the partition.
> Window:
> 1) Limited to 1 window per partition.
> Other:
> 1) When computing aggregate/window functions, club rows together if there are multiple matches per partition.  E.g., when summing a revenue column, sum all rows from all matches (as opposed to computing a separate sum for each match).
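
The "club rows together" behaviour described in the story can be sketched in Python as follows. This is an illustration under assumptions, not the MADlib implementation: the symbol string, the revenue values, and the helper name clubbed_sum are all hypothetical.

```python
import re

def clubbed_sum(symbols: str, revenue: list, pattern: str = "ICV") -> int:
    """Sum revenue over every row that belongs to any match in one partition."""
    total = 0
    for m in re.finditer(pattern, symbols):
        total += sum(revenue[m.start():m.end()])
    return total

# Hypothetical partition: two matches separated by an unmatched CLICK, CONV pair.
symbols = "ICVCVICV"
revenue = [0, 0, 40, 0, 25, 0, 0, 60]
total = clubbed_sum(symbols, revenue)  # a single total, not one per match
```

Here the unmatched conversion (revenue 25) is left out, and the two matched conversions are added into one figure for the partition.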