Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2019/03/21 21:00:43 UTC

[GitHub] [calcite] julianhyde commented on a change in pull request #1101: [CALCITE-2909] Optimize Enumerable SemiJoin with lazy computation of innerLookup (WIP, blocked by CALCITE-2937) (Ruben Quesada Lopez)

julianhyde commented on a change in pull request #1101: [CALCITE-2909] Optimize Enumerable SemiJoin with lazy computation of innerLookup (WIP, blocked by CALCITE-2937) (Ruben Quesada Lopez)
URL: https://github.com/apache/calcite/pull/1101#discussion_r267954959
 
 

 ##########
 File path: linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java
 ##########
 @@ -1298,11 +1298,11 @@ private void closeInner() {
       final EqualityComparer<TKey> comparer) {
     return new AbstractEnumerable<TSource>() {
       public Enumerator<TSource> enumerator() {
-        final Enumerable<TKey> innerLookup =
-            comparer == null
-                ? inner.select(innerKeySelector).distinct()
-                : inner.select(innerKeySelector).distinct(comparer);
-
+        // CALCITE-2909 Delay the computation of the innerLookup until the moment when we are sure
+        // that it will be really needed, i.e. when the first outer enumerator item is processed
+        final Enumerable<TKey> innerLookup = comparer == null
+                ? Linq4j.lazyEnumerable(() -> inner.select(innerKeySelector).distinct())
+                : Linq4j.lazyEnumerable(() -> inner.select(innerKeySelector).distinct(comparer));
 
 Review comment:
   Rather than inventing `LazyEnumerable`, could you change the type of `innerLookup` from `Enumerable<TKey>` to `Supplier<Enumerable<TKey>>` or (better) `Supplier<Enumerator<TKey>>`, and use Guava's memoizing supplier (`Suppliers.memoize`)?
   
   The goal, after all, is that the hash table is computed 0 times if not needed, and 1 time if needed 1 or more times. A memoizing supplier achieves that very well.
   
   I don't see anything in the code that requires `innerLookup` to be an Enumerable.
   
   My concern with `LazyEnumerable` is that `Enumerable` is, by design, a lazy data structure. When you invent another layer of laziness, it becomes difficult to figure out which of the two lazy layers new features should go in. It is already confusing enough.
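
   A minimal sketch of the "0 times if not needed, 1 time if needed" behavior, under some loud assumptions: this is not the Calcite code, the `memoize` helper is a hand-written stand-in for Guava's `Suppliers.memoize` (so the example needs no dependencies), a plain `Set` stands in for the distinct inner keys, and `MemoizedLookupSketch`, `semiJoin` and `BUILD_COUNT` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizedLookupSketch {
  /** Hypothetical counter: how many times the inner lookup was built. */
  public static final AtomicInteger BUILD_COUNT = new AtomicInteger();

  /** Stand-in for Guava's Suppliers.memoize: invokes the delegate at most once. */
  public static <T> Supplier<T> memoize(final Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean computed;
      private T value;

      @Override public synchronized T get() {
        if (!computed) {
          value = delegate.get();
          computed = true;
        }
        return value;
      }
    };
  }

  /** Toy semi-join: keeps the outer items that also occur in inner. */
  public static List<Integer> semiJoin(List<Integer> outer,
      final List<Integer> inner) {
    // Nothing is computed here; the set is built on the first get() only.
    final Supplier<Set<Integer>> innerLookup = memoize(() -> {
      BUILD_COUNT.incrementAndGet();
      return new LinkedHashSet<>(inner);
    });
    final List<Integer> result = new ArrayList<>();
    for (Integer o : outer) {
      if (innerLookup.get().contains(o)) {
        result.add(o);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Empty outer: the lookup is never requested, so it is never built.
    semiJoin(new ArrayList<Integer>(), Arrays.asList(1, 2));
    System.out.println(BUILD_COUNT.get()); // prints 0

    // Four outer items: the lookup is requested four times but built once.
    System.out.println(
        semiJoin(Arrays.asList(1, 2, 3, 2), Arrays.asList(2, 3, 4))); // prints [2, 3, 2]
    System.out.println(BUILD_COUNT.get()); // prints 1
  }
}
```

   With this shape there is only one place where laziness is introduced (the supplier), and the lookup itself stays an ordinary value.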
