Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2016/03/09 04:11:41 UTC

[jira] [Created] (MADLIB-977) Path functions (phase 2)

Frank McQuillan created MADLIB-977:
--------------------------------------

             Summary: Path functions (phase 2)
                 Key: MADLIB-977
                 URL: https://issues.apache.org/jira/browse/MADLIB-977
             Project: Apache MADlib
          Issue Type: Epic
          Components: Module: Utilities
            Reporter: Frank McQuillan
            Assignee: Rahul Iyer
             Fix For: v1.9


Story

The goal of the MADlib path function is to perform regular expression pattern matching over a sequence of rows, and to extract useful information about the matches.  The useful information could be a simple count of matches, or something more involved such as the results of window functions and aggregations computed over the matched rows.

Put another way, the problem statement is:

“Given a set of rows of interest, we want to divide it up into one or more sequences (ordered lists of rows), and then search each sequence for a given pattern. We want to produce a result row per sequence (or per match, in case there are multiple matches). A result row may simply indicate that a match occurred or it may return interesting information about the match.”
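
To make the problem statement concrete, here is a minimal Python sketch of the idea. It is not MADlib code and does not reflect the eventual SQL interface. The event tuples, the user ids, the one-letter symbols, and the helper names (sequences, per_sequence, per_match) are all assumptions invented for illustration. Rows are divided into one sequence per user, each sequence is ordered by timestamp, and a regular expression is then searched in each sequence, giving either one result row per sequence or one result row per match.

```python
import re
from collections import defaultdict

# Hypothetical rows of interest: (user_id, timestamp, symbol).
events = [
    ("u1", 3, "B"), ("u2", 1, "A"), ("u1", 1, "A"),
    ("u1", 2, "A"), ("u2", 2, "C"), ("u1", 4, "A"), ("u1", 5, "B"),
]

def sequences(events):
    """Divide rows into one ordered symbol string per user."""
    parts = defaultdict(list)
    for user, ts, sym in events:
        parts[user].append((ts, sym))
    # Sorting the (timestamp, symbol) pairs orders each sequence by time.
    return {u: "".join(s for _, s in sorted(seq)) for u, seq in parts.items()}

def per_sequence(events, pattern):
    """One result row per sequence: did the pattern occur at all?"""
    return {u: re.search(pattern, s) is not None
            for u, s in sequences(events).items()}

def per_match(events, pattern):
    """One result row per match: (user, start offset, matched symbols)."""
    return [(u, m.start(), m.group())
            for u, s in sorted(sequences(events).items())
            for m in re.finditer(pattern, s)]

by_sequence = per_sequence(events, r"A+B")
by_match = per_match(events, r"A+B")
```

In this sketch, user u1 produces two matches of the pattern and user u2 produces none, which is why the per-match output can have more or fewer rows than there are sequences.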

So the progression is:

1) Identify rows of interest from a raw table or view (using symbols).
2) Pattern match across rows (using regex on symbols).
3) Define one or more windows on the matches.
4) Apply standard PostgreSQL window functions or aggregations on the windows.
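
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MADlib implementation: the clickstream rows, the page-to-symbol mapping, the pattern, and the helper names (symbolize, path_matches) are hypothetical, and a plain count and sum stand in for PostgreSQL window functions and aggregates. The window here is simply all rows of a match.

```python
import re

# Hypothetical clickstream rows for one session, already ordered by time.
rows = [
    {"page": "home",     "revenue": 0.0},
    {"page": "search",   "revenue": 0.0},
    {"page": "product",  "revenue": 0.0},
    {"page": "checkout", "revenue": 25.0},
    {"page": "home",     "revenue": 0.0},
    {"page": "product",  "revenue": 0.0},
    {"page": "checkout", "revenue": 40.0},
]

# Step 1: rows of interest are those that map to a one-character symbol.
SYMBOLS = {"search": "S", "product": "P", "checkout": "C"}

def symbolize(rows):
    """Return (symbol string, kept rows); rows without a symbol are dropped."""
    kept = [r for r in rows if r["page"] in SYMBOLS]
    return "".join(SYMBOLS[r["page"]] for r in kept), kept

def path_matches(rows, pattern):
    symbols, kept = symbolize(rows)
    results = []
    # Step 2: regular expression match over the symbol string.
    for m in re.finditer(pattern, symbols):
        # Step 3: the window is the slice of rows covered by this match.
        window = kept[m.start():m.end()]
        # Step 4: aggregate over the window (count and sum as stand-ins).
        results.append({
            "match": m.group(),
            "row_count": len(window),
            "revenue": sum(r["revenue"] for r in window),
        })
    return results

# An optional search, one or more product views, then a checkout.
results = path_matches(rows, r"S?P+C")
```

Because each kept row contributes exactly one character, match offsets in the symbol string index directly into the list of kept rows, which is what lets the window be recovered from the match.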

Terminology that we will use:
* Partition
* Pattern match (within a partition)
* Window (within a pattern match)
* Function (calculated on a window)

Use cases

* Web analytics (clickstream)
* Marketing revenue attribution
* Telephone calling patterns
* Stock market trading sequences
* Predictive maintenance
* Genomics sequencing

References

[1] Time series blog on window functions #1
http://blog.pivotal.io/data-science-pivotal/products/time-series-analysis-1-introduction-to-window-functions

[2] Time series blog on window functions #2
http://blog.pivotal.io/data-science-pivotal/products/time-series-analysis-2-recognizing-patterns-within-a-time-series

[3] GPDB window functions
http://gpdb.docs.pivotal.io/4320/admin_guide/query.html#topic30

[4] PostgreSQL pattern matching
http://www.postgresql.org/docs/9.4/static/functions-matching.html

[5] DFA
https://en.wikipedia.org/wiki/Deterministic_finite_automaton

[6] PostgreSQL aggregate functions
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html



