Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2016/08/15 23:55:20 UTC

[jira] [Closed] (MADLIB-995) Path - overlapping partitions

     [ https://issues.apache.org/jira/browse/MADLIB-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan closed MADLIB-995.
----------------------------------

> Path - overlapping partitions
> -----------------------------
>
>                 Key: MADLIB-995
>                 URL: https://issues.apache.org/jira/browse/MADLIB-995
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.1
>
>         Attachments: Ecommerce data set for path test 3.csv, path-overlapping-patterns.ipynb
>
>
> Story
> As a data scientist, I want to be able to define multiple symbols that result in overlapping partitions.
> See
> http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
> for a description of what a symbol is.
> Currently, in 1.9, overlapping partitions are not supported. Matching is non-overlapping: the path algorithm begins the next pattern search at the row that follows the last pattern match (similar to how grep works in UNIX).
> In the overlapping case, the path algorithm needs to find every occurrence of the pattern in the partition, regardless of whether its rows were part of a previously found match. This means one row can match multiple symbols in a given matched pattern, so there is a dependency on https://issues.apache.org/jira/browse/MADLIB-943 . There is a (small) chance that this story is a no-op once https://issues.apache.org/jira/browse/MADLIB-943 is done.
> Need to add a new optional BOOLEAN parameter to the interface called "overlapping_patterns".  Default is FALSE.
> (While you are at it please fix the docs to indicate that the "persist_rows" param is optional with default FALSE.)
> Acceptance
> The attached data set and query should produce the following output:
> Event Timestamp	User ID	Age Group	Income Group	Gender	Region	Household Size	Click Event	Purchase Event	Revenue	Margin	Match ID
> 4/15/12 7:02	100821	1	4	Unknown	West	3	1	1	118	39	1
> 4/15/12 8:51	102201	3	3	Female	East	3	0	0	0	0	1
> 4/15/12 9:28	101121	2	2	Unknown	West	4	1	1	103	32	1,2
> 4/15/12 10:19	103711	4	3	Female	Central	5	0	0	0	0	2
> 4/15/12 11:40	100821	1	4	Unknown	West	3	0	0	0	0	2
> 4/16/12 2:12	100821	1	4	Unknown	West	3	1	1	153	26	3
> 4/16/12 4:20	102201	3	3	Female	East	3	0	0	0	0	3
> 4/16/12 5:38	101121	2	2	Unknown	West	4	1	0	0	0	3
> 4/16/12 20:46	101121	2	2	Unknown	West	4	1	1	131	28	4
> 4/16/12 21:11	101331	2	4	Female	East	5	1	1	127	27	4
> 4/16/12 22:35	101121	2	2	Unknown	West	4	0	0	0	0	4
> There are 4 pattern matches.  The 1st and the 2nd overlap.
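
The difference between the two search modes described in the story can be illustrated with a minimal Python sketch. This is not MADlib code and does not reflect how the path function is implemented. It assumes each row has already been reduced to a single-character symbol, and the names find_matches and match_ids_per_row are hypothetical. Overlapping search is approximated here with a regex lookahead, which reports at most one match per starting row.

```python
import re


def find_matches(symbols, pattern, overlapping=False):
    """Return (start, end) row-index spans for each match of `pattern`.

    `symbols` is a string with one character per row (an assumption made
    for this sketch). `end` is exclusive.
    """
    if overlapping:
        # A zero-width lookahead consumes no rows, so the search resumes
        # at the very next row and can reuse rows from earlier matches.
        regex = re.compile("(?=(" + pattern + "))")
        return [(m.start(1), m.end(1)) for m in regex.finditer(symbols)]
    # Default behaviour: the next search starts after the previous match.
    regex = re.compile(pattern)
    return [(m.start(), m.end()) for m in regex.finditer(symbols)]


def match_ids_per_row(n_rows, spans):
    """Label each row with the comma-separated IDs of matches covering it."""
    ids = [[] for _ in range(n_rows)]
    for match_id, (start, end) in enumerate(spans, start=1):
        for row in range(start, end):
            ids[row].append(str(match_id))
    return [",".join(row_ids) for row_ids in ids]


rows = "ABABA"  # hypothetical symbol sequence, one symbol per row
non_overlapping = find_matches(rows, "ABA")
overlapping = find_matches(rows, "ABA", overlapping=True)
labels = match_ids_per_row(len(rows), overlapping)
```

In this toy sequence the non-overlapping search finds one match (rows 0 to 2), while the overlapping search also finds a second match starting at row 2, so that row is labelled "1,2", in the same way the third row of the acceptance output belongs to matches 1 and 2.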


