Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2020/02/25 15:57:14 UTC

[GitHub] [calcite] rubenada opened a new pull request #1826: [CALCITE-3820] EnumerableDefaults#orderBy should be lazily computed + support enumerator re-initialization

rubenada opened a new pull request #1826: [CALCITE-3820] EnumerableDefaults#orderBy should be lazily computed + support enumerator re-initialization
URL: https://github.com/apache/calcite/pull/1826
 
 
   Jira ticket: [CALCITE-3820](https://issues.apache.org/jira/browse/CALCITE-3820)
   
   The current implementation of EnumerableDefaults#orderBy has the following disadvantages:
   
   1) The TreeMap that performs the actual sorting is computed eagerly. This can be expensive for a very large source, and it hurts performance whenever the sort is executed but its result is never consumed. For example, in a NestedLoopJoin, if the outer enumerator returns no results, the inner one is never accessed. With a huge orderBy as the inner input, the sort is still computed today even though it is not required. For performance reasons, it is therefore preferable to delay the sort operation as long as possible.
   
   2) The Enumerable / Enumerator returned by EnumerableDefaults#orderBy cannot be re-evaluated. Since the map (and the subsequent LookupImpl) that the Enumerable relies on is eagerly computed, every call to enumerator() returns the same values, even if the source Enumerable has changed. This is a corner case, but it shows up if, for example, we have a MergeJoin with an EnumerableSort (i.e. using EnumerableDefaults#orderBy) inside a RepeatUnion (a.k.a. recursive union). There, the RepeatUnion re-evaluates its right child, so that the output of iteration N-1 is used as input for iteration N. In this scenario EnumerableDefaults#orderBy does not work as expected, since it is not actually re-computed (see EnumerableMethods#repeatUnion).
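
   To illustrate both points, here is a minimal sketch in plain Java. It is not the Calcite implementation and does not use linq4j: `LazyOrderBySketch`, `lazyOrderBy`, `toList` and the `sortCounter` parameter are hypothetical names introduced only for this example, and `Iterable` / `Iterator` stand in for `Enumerable` / `Enumerator`. The idea is that the sort runs inside `iterator()`, so nothing is sorted until a consumer asks for results, and each new iterator re-reads the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyOrderBySketch {

  /**
   * Hypothetical helper (not a Calcite API): returns a sorted view of
   * {@code source}. The sort is deferred until iterator() is called and is
   * repeated on every call, so changes in the source are picked up.
   * {@code sortCounter} exists only to make the number of sorts observable.
   */
  public static <T> Iterable<T> lazyOrderBy(Iterable<T> source,
      Comparator<? super T> comparator, AtomicInteger sortCounter) {
    return () -> {
      // Runs once per iterator() call, never at construction time.
      List<T> sorted = new ArrayList<>();
      for (T element : source) {
        sorted.add(element);
      }
      sorted.sort(comparator);
      sortCounter.incrementAndGet();
      return sorted.iterator();
    };
  }

  /** Drains an Iterable into a list; each call requests a fresh iterator. */
  public static <T> List<T> toList(Iterable<T> iterable) {
    List<T> result = new ArrayList<>();
    for (T element : iterable) {
      result.add(element);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> source = new ArrayList<>(Arrays.asList(3, 1, 2));
    AtomicInteger sorts = new AtomicInteger();

    Iterable<Integer> ordered =
        lazyOrderBy(source, Comparator.<Integer>naturalOrder(), sorts);
    System.out.println(sorts.get());     // 0: nothing has been sorted yet

    System.out.println(toList(ordered)); // [1, 2, 3]

    // The source changes between evaluations, as it would between two
    // iterations of a recursive union.
    source.add(0);
    System.out.println(toList(ordered)); // [0, 1, 2, 3]
    System.out.println(sorts.get());     // 2: one sort per evaluation
  }
}
```

   The trade-off in this sketch is that every evaluation pays for a full sort. That is the price of supporting re-initialization, and it is only paid when the result is actually requested.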

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [calcite] rubenada merged pull request #1826: [CALCITE-3820] EnumerableDefaults#orderBy should be lazily computed + support enumerator re-initialization

Posted by GitBox <gi...@apache.org>.
rubenada merged pull request #1826: [CALCITE-3820] EnumerableDefaults#orderBy should be lazily computed + support enumerator re-initialization
URL: https://github.com/apache/calcite/pull/1826
 
 
   
