You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-commits@db.apache.org by Apache Wiki <wi...@apache.org> on 2008/02/05 13:02:49 UTC

[Db-derby Wiki] Update of "OLAPRowNumber" by ThomasNielsen

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by ThomasNielsen:
http://wiki.apache.org/db-derby/OLAPRowNumber

------------------------------------------------------------------------------
  The ROW_NUMBER() window function is currently not implemented, and thereby not supported.
  
  == Proposed Changes ==
+ 
+ 
  ''DRAFT''
  
+ === Overview ===
+ The changes involved to implement ROW_NUMBER() will touch several areas of the Derbys engine. This work includes getting some of the framework needed for future window function extensions in place.
- The steps we propose taking are
-  * The new syntax must be added to the SQL grammar 
-  * A {{{RowNumberColumnNode}}} is added to the querytree for every instance of {{{the ROW_NUMBER()}}} in the query during compilation
-  * During code generation in {{{ResultColumnList.generateCore()}}} we trap instances of {{{RowNumberColumnNode}}} in the querytree and generate code to virtually invoke {{{BaseActivation.getSetRowNumberValue()}}}
-  * During execution of the generated code {{{BaseActivation.getSetRowNumberValue()}}} handles the incrementation and caching of rownumber values based for a given column.
-  * Optimization should cause the execution to stop once possible {{{WHERE}}} clauses are satisfied.
  
+ The largest changes will likely be:
+  * The new syntax must be added to the SQL grammar.
+  * New query tree node subclasses must be added for the window specification and window function columns.
+  * Optimization should avoid flattening and enable materialization.
+  * A new result set class implementing the window functions must be added.
+  * Statistics gathering must be implemented.
+ 
+ Implementation of the ROW_NUMBER() window function is likely to affect other areas as well but to a lesser extent, often only as helper methods.
+ 
+ For the first increment we will only implement support for empty, unnamed windows. This means a window spanning the complete result set.
+ 
+ === Query processing ===
+ 
+ The new ROW_NUMBER() syntax is added to the SQL grammar in {{{sqlgrammar.jj}}} to support empty, unnamed windows.
+ 
+ While walking the ResultColumns of a SelectNode, we add a RowNumberColumnNode to the ResultColumnList in the AST. The window specification is attached as a WindowNode below the RowNumberColumnNode. This is analogous to how a FROM list, a ORDER BY clause, etc, is done for the SelectNode itself. The WindowNode takes a set of parameters including the name, partition definition, ordering info, and the window frame definition. The WindowNode should serve as a basis for future support of named windows.
+ 
+ The interesting part of the AST looks like this for a SelectNode with a window function once parsing is completed:
+ 
+ {{{
+ ...
+  |
+ SelectNode
+  * ...
+  * orderByList -> ...
+  * resultColumnList -> * ResultColumn : table T, column A
+  |                     * ...
+  |                     * RowNumberColumnNode -> WindowNode
+  |                     * ...
+  | 
+ ...
+ }}}
+ 
+ === Query Optimization ===
+ 
+ ==== Avoid flattening ====
+ We avoid flattening queries with window functions during optimization.
+ The main rationales behind this are
+  * avoid rewrite of the query
+ 
+ {{{
+    SELECT * FROM (
+           SELECT row_number() over () as r, t.* FROM T
+    ) AS tmp WHERE r <= 3;
+ }}}
+ 
+ into
+ 
+ {{{
+    SELECT row_number() over () as r, t.* FROM T WHERE R <= 3;
+ }}}
+ 
+ which will fail as '''r''' is not a column in table '''T'''. 
+ 
+  * window function results that span multiple rows, like a moving average, will benefit from materialization. For the nested select query above we materialize the subquery select result, and have the outer SELECT pull rows from the materialized result.
+ 
+ ==== Modification of access paths ====
+ At the last stages of optimization the AST is made into a queryplan by modification of access paths. 
+ 
+ At this stage a SelectNode is exchanged for a ProjectRestrictNode (PRN) that handles projection and restriction, and possible index use is considered etc. 
+ The SelectNode clauses ORDER BY, GROUP BY, and FROM are made into their respective nodes in the queryplan. See example.
+ 
+ Given this select node in the AST
+ 
+ {{{
+ ...
+  |
+ SelectNode
+  * ...
+  * From -> ...
+  * OrderBy -> ...
+  * GroupBy -> ...
+  * Distict
+  * Where -> ...
+  * resultColumnList -> * ResultColumn: table T, column A
+  |                     * ...
+  |                     * RowNumberColumnNode -> WindowNode
+  |                     * ...
+  | 
+ ...
+ }}}
+ 
+ this is made into the following queryplan:
+ 
+ {{{
+ ProjectRestrictNode
+  |
+ WindowNode -> ...
+  |
+ OrderByNode -> ...
+  |
+ DistinctNode -> ...
+  |
+ GroupByNode -> ...
+  |
+ ProjectRestrictNode
+  |
+ <from table T>
+ }}}
+ 
+ A PRN is inserted on top of whatever is in the select FROM clause. This PRN does projection of the non-window function columns, and applies any restriction (where clause) on these columns. To the top of this PRN we add any GroupByNode, DistinctNode, and OrderByNode as needed.
+ 
+ In the case of a window function column we then pull the WindowNode up from under the RowNumberColumnNode, and merge this with the result we get from below. The WindowNode needs to go on top so that ordering, grouping etc are performed properly. We add one WindowNode for each window function column, and they are evaluated left to right in the result column list, with the right most column being the top WindowNode in the query plan.
+ 
+ Yet another PRN is used to top off the upper WindowNode. This PRN is effectively a no-op, and is eliminated during code generation. The PRN is mainly there due to other parts of the code expecting the top node to be a PRN once we are done with the access path modification.
+ 
+ === Code generation === 
+ 
+ During code generation, we lay out code to generate a WindowResultSet, and pass the source resultset information as well as column info to it. The logic for ROW_NUMBER() over the empty window specification is simply a counter in the WindowResultSet, so this change should be seen as part of the window function framework. The intended use of this result set is mainly other window functions that may need row caching to calculate moving averages over a window or other features and functionallity needed for the window function being implemented.
+ 
+ === Code execution ===
+ 
+ During code execution the WindowResult behaves as a regular part of the ResultSet chain in the engine. Rows are fetched from this ResultSet with a call to getNextRowCore() which in turn fetches rows from the lower/child ResultSet in similar fashion. The window function column is added, and the row passed up the chain.
+