Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2016/03/26 19:34:25 UTC

[jira] [Closed] (MADLIB-903) Path functions (phase 1)

     [ https://issues.apache.org/jira/browse/MADLIB-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan closed MADLIB-903.
----------------------------------

> Path functions (phase 1)
> ------------------------
>
>                 Key: MADLIB-903
>                 URL: https://issues.apache.org/jira/browse/MADLIB-903
>             Project: Apache MADlib
>          Issue Type: Epic
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>             Fix For: v1.9
>
>
> Story
> The goal of the MADlib path function is to perform regular-expression pattern matching over a sequence of rows and to extract useful information about the matches. That information could be a simple count of matches, or something more involved such as window functions and aggregations computed over the matched rows.
> Put another way, the problem statement is:
> “Given a set of rows of interest, we want to divide it up into one or more sequences (ordered lists of rows), and then search each sequence for a given pattern. We want to produce a result row per sequence (or per match, in case there are multiple matches). A result row may simply indicate that a match occurred or it may return interesting information about the match.”
> So the progression is:
> 1) Identify rows of interest from a raw table or view (using symbols).
> 2) Pattern match across rows (using regex on symbols).
> 3) Define one or more windows on the matches.
> 4) Apply standard PostgreSQL window functions or aggregations on the windows.
> Terminology that we will use:
> * Partition
> * Pattern match (within a partition)
> * Window (within a pattern match)
> * Function (calculated on a window)
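The progression and terminology above can be illustrated with a small Python sketch. This is not the MADlib path function or its SQL interface. The clickstream rows, the one-character symbol mapping, the pattern HS+C, and the helper name path_matches are all hypothetical, chosen only to show how partition, pattern match, window, and function relate to each other.

```python
import re
from collections import defaultdict

# Hypothetical clickstream rows: (user_id, event_time, page, revenue).
ROWS = [
    ("u1", 1, "home", 0.0),
    ("u1", 2, "search", 0.0),
    ("u1", 3, "search", 0.0),
    ("u1", 4, "checkout", 25.0),
    ("u2", 1, "home", 0.0),
    ("u2", 2, "search", 0.0),
    ("u2", 3, "home", 0.0),
    ("u1", 5, "home", 0.0),
    ("u1", 6, "search", 0.0),
    ("u1", 7, "checkout", 40.0),
]

# Step 1: rows of interest are tagged with one-character symbols
# (an assumed mapping, so that a string index equals a row index).
SYMBOLS = {"home": "H", "search": "S", "checkout": "C"}


def path_matches(rows, pattern):
    """Return one result row per pattern match within each partition."""
    # Partition: one ordered sequence of rows per user.
    partitions = defaultdict(list)
    for row in rows:
        if row[2] in SYMBOLS:
            partitions[row[0]].append(row)

    results = []
    for user in sorted(partitions):
        seq = sorted(partitions[user], key=lambda r: r[1])
        symbol_string = "".join(SYMBOLS[r[2]] for r in seq)

        # Step 2: pattern match with a regex over the symbol string.
        for m in re.finditer(pattern, symbol_string):
            # Step 3: the window here is simply all rows in the match.
            window = seq[m.start():m.end()]
            # Step 4: functions (aggregates) calculated on the window.
            results.append({
                "user": user,
                "match": m.group(0),
                "n_rows": len(window),
                "revenue": sum(r[3] for r in window),
                "start_time": window[0][1],
            })
    return results


# Sessions that start at home, search one or more times, then check out.
RESULTS = path_matches(ROWS, r"HS+C")
for result in RESULTS:
    print(result)
```

In this sketch the window is the whole match. A narrower window (for example, only the search rows inside a match) would slice a subset of the matched rows before aggregating.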
> Use cases
> * Web analytics (clickstream)
> * Marketing revenue attribution
> * Telephone calling patterns
> * Stock market trading sequences
> * Predictive maintenance
> * Genomics sequencing
> References
> [1] Time series blog on window functions #1
> http://blog.pivotal.io/data-science-pivotal/products/time-series-analysis-1-introduction-to-window-functions
> [2] Time series blog on window functions #2
> http://blog.pivotal.io/data-science-pivotal/products/time-series-analysis-2-recognizing-patterns-within-a-time-series
> [3] GPDB window functions
> http://gpdb.docs.pivotal.io/4320/admin_guide/query.html#topic30
> [4] PostgreSQL pattern matching
> http://www.postgresql.org/docs/9.4/static/functions-matching.html
> [5] DFA
> https://en.wikipedia.org/wiki/Deterministic_finite_automaton
> [6] PostgreSQL aggregate functions
> http://www.postgresql.org/docs/9.4/static/functions-aggregate.html


