You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2020/07/02 04:40:43 UTC

[GitHub] [calcite] liyafan82 opened a new pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

liyafan82 opened a new pull request #2053:
URL: https://github.com/apache/calcite/pull/2053


   1. Improve the use of data structures
   2. Extract common methods
   3. Avoid expensive operations


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] chunweilei commented on a change in pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
chunweilei commented on a change in pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#discussion_r454061401



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Aggregate.java
##########
@@ -548,19 +538,22 @@ public static boolean isRollup(ImmutableBitSet groupSet,
      *
      * @see #isRollup(ImmutableBitSet, List) */
     public static List<Integer> getRollup(List<ImmutableBitSet> groupSets) {
-      final Set<Integer> set = new LinkedHashSet<>();
+      final List<Integer> rollUpBits = new ArrayList<>(groupSets.size() - 1);
       ImmutableBitSet g = null;
       for (ImmutableBitSet bitSet : groupSets) {
         if (g == null) {
           // First item must equal groupSet
         } else {
           // Each subsequent items must be a subset with one fewer bit than the
           // previous item
-          set.addAll(g.except(bitSet).toList());
+          ImmutableBitSet diff = g.except(bitSet);

Review comment:
       Would there be duplicate values since Set is changed to List?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-655878557


   > FWIW, I intentionally used `HashMap` rather than `HashSet`. `HashSet` uses a `HashMap` under the covers. I figured I'd save an object allocation. Yours is a bit clearer, a bit less efficient.
   
   Thanks a lot for your feedback. It is clever to `HashMap` to avoid object allocation.
   Fortunately, the value object for `HashSet` is declared static, so it is allocated once, and shared by all `HashSet` objects. 
   
   > `ret` is not a great name for a variable; also declare it as `List<Integer>` not `ArrayList<Integer>`.
   
   Good suggestion. I have renamed `ret` to `rollUpBits`.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] zinking commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
zinking commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-652972245


   looks all equivalent rewrites for the refactoring, 
   any explanation for the improvements


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-659118142


   > > > FWIW, I intentionally used `HashMap` rather than `HashSet`. `HashSet` uses a `HashMap` under the covers. I figured I'd save an object allocation. Yours is a bit clearer, a bit less efficient.
   > 
   > > Thanks a lot for your feedback. It is clever to `HashMap` to avoid object allocation.
   > > Fortunately, the value object for `HashSet` is declared static, so it is allocated once, and shared by all `HashSet` objects.
   > 
   > I think what @julianhyde means is that the HashSet holds a pointer to HashMap, with HashSet object header, it will use 16 bytes more than bare HashMap.
   
   @hsyuan Thanks for your kind reminder. I have reverted the code to use `HashMap`, and made it clear in the comment that this is by design. BTW, why it is 16 more bytes?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-658871226


   > > FWIW, I intentionally used `HashMap` rather than `HashSet`. `HashSet` uses a `HashMap` under the covers. I figured I'd save an object allocation. Yours is a bit clearer, a bit less efficient.
   
   > Thanks a lot for your feedback. It is clever to `HashMap` to avoid object allocation.
   Fortunately, the value object for `HashSet` is declared static, so it is allocated once, and shared by all `HashSet` objects. 
   
   I think what @julianhyde means is that the HashSet holds a pointer to HashMap, with HashSet object header, it will use 16 bytes more than bare HashMap. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-653371215


   > looks all equivalent rewrites for the refactoring,
   > any explanation for the improvements
   
   Sure. Thanks a lot for your feedback. Below are some explanations:
   
   1. We extract a common method for ImmutableBitSet, which is repeated multiple times in the code base. We also find it useful in our code base. This makes the code simpler and easier to maintain. 
   2. In Aggregate#getRollup, we replace the LinkedHashSet with an ArrayList, because we can make sure that the newly added element must be distinct (according to the definition of a rollup). This makes adding data faster (LinkedHashSet takes O(logn) time, whereas ArrayList takes O(1)).
   3. When checking if a group set is a roll up, we replace the except operation with cardinality minus, which is much faster. We can make this optimization, because when a set contains another, the cardinality of the difference is the difference of the cardiality.
   4. In Util#firstDuplicate method, we replace the HashMap with a HashSet, because the value in the HashMap is not really used. This makes the logic clearer. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-655471902


   > make sense.
   > Can you update the commit log to describe the content of the PR shortly,
   > and add your name in the end.
   > 
   > See the guide in https://calcite.apache.org/develop/#contributing
   
   @yanlin-Lynn Thanks a lot for your kind reminder. 
   I have updated the commit message. Please check it. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on a change in pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#discussion_r449390948



##########
File path: core/src/main/java/org/apache/calcite/util/ImmutableBitSet.java
##########
@@ -901,6 +902,19 @@ public ImmutableBitSet shift(int offset) {
     return builder.build();
   }
 
+  /**
+   * Checks if all bit sets contain a particular bit.
+   */
+  public static boolean allContain(Collection<ImmutableBitSet> bitSets,
+                                   int bit) {

Review comment:
       Revised. Thank you for your kind reminder. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] yanlin-Lynn commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
yanlin-Lynn commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-654743183


   make sense.
   Can you update the commit log to describe the content of the PR shortly,
   and add your name in the end.
   
   See the guide in https://calcite.apache.org/develop/#contributing


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan merged pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
hsyuan merged pull request #2053:
URL: https://github.com/apache/calcite/pull/2053


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] julianhyde commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
julianhyde commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-655853211


   FWIW, I intentionally used `HashMap` rather than `HashSet`. `HashSet` uses a `HashMap` under the covers. I figured I'd save an object allocation. Yours is a bit clearer, a bit less efficient.
   
   `ret` is not a great name for a variable; also declare it as `List<Integer>` not `ArrayList<Integer>`.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] yanlin-Lynn commented on a change in pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
yanlin-Lynn commented on a change in pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#discussion_r449334906



##########
File path: core/src/main/java/org/apache/calcite/util/ImmutableBitSet.java
##########
@@ -901,6 +902,19 @@ public ImmutableBitSet shift(int offset) {
     return builder.build();
   }
 
+  /**
+   * Checks if all bit sets contain a particular bit.
+   */
+  public static boolean allContain(Collection<ImmutableBitSet> bitSets,
+                                   int bit) {

Review comment:
       we do not format like this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-659119855


   > > I have reverted the code to use HashMap, and made it clear in the comment that this is by design. BTW, why it is 16 more bytes?
   > 
   > Suppose in Hotspot 64-bit VM, with COOPS enabled, the object header is 12 bytes, the pointer to HashMap uses 4 bytes. So it is 16 bytes.
   
   Awesome! Thanks for your clarification.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] hsyuan commented on pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
hsyuan commented on pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#issuecomment-659119236


   > I have reverted the code to use HashMap, and made it clear in the comment that this is by design. BTW, why it is 16 more bytes?
   
   Suppose in Hotspot 64-bit VM, with COOPS enabled, the object header is 12 bytes, the pointer to HashMap uses 4 bytes. So it is 16 bytes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [calcite] liyafan82 commented on a change in pull request #2053: [CALCITE-4102] Some improvements to aggregate related operations

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #2053:
URL: https://github.com/apache/calcite/pull/2053#discussion_r454123317



##########
File path: core/src/main/java/org/apache/calcite/rel/core/Aggregate.java
##########
@@ -548,19 +538,22 @@ public static boolean isRollup(ImmutableBitSet groupSet,
      *
      * @see #isRollup(ImmutableBitSet, List) */
     public static List<Integer> getRollup(List<ImmutableBitSet> groupSets) {
-      final Set<Integer> set = new LinkedHashSet<>();
+      final List<Integer> rollUpBits = new ArrayList<>(groupSets.size() - 1);
       ImmutableBitSet g = null;
       for (ImmutableBitSet bitSet : groupSets) {
         if (g == null) {
           // First item must equal groupSet
         } else {
           // Each subsequent items must be a subset with one fewer bit than the
           // previous item
-          set.addAll(g.except(bitSet).toList());
+          ImmutableBitSet diff = g.except(bitSet);

Review comment:
       @chunweilei Thanks a lot for your feedback.
   According to the definition of Rollup (https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java#L508-L513), there can be no duplicate. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org