You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "mkaravel (via GitHub)" <gi...@apache.org> on 2023/05/02 21:45:46 UTC

[GitHub] [spark] mkaravel opened a new pull request, #41021: [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches

mkaravel opened a new pull request, #41021:
URL: https://github.com/apache/spark/pull/41021

   ### What changes were proposed in this pull request?
   
   This is a follow-up PR to #40615.
   
   In this PR we nake sure that all sketches produces by the new expressions `hll_sketch_agg`, `hll_union_agg` and `hll_union` are actually HLL DataSketches represented with 8-nit registers.
   
   Using 8-bit registers improves the performance of the expressions at the expense of using more memory(the current implementation uses 4-bit registers by default).
   
   ### Why are the changes needed?
   
   To improve the performance of the new HLL-related expressions.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Existing tests suffice.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a diff in pull request #41021: [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches

Posted by "mkaravel (via GitHub)" <gi...@apache.org>.
mkaravel commented on code in PR #41021:
URL: https://github.com/apache/spark/pull/41021#discussion_r1183099184


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala:
##########
@@ -234,6 +237,9 @@ case class HllUnionAgg(
     right.eval().asInstanceOf[Boolean]
   }
 
+  // The default target type (register size) to use.
+  private val targetType = TgtHllType.HLL_8

Review Comment:
   Let's do it in a follow up PR. I prefer that we merge this change ASAP and then figure out how to refactor code properly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #41021: [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #41021: [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches
URL: https://github.com/apache/spark/pull/41021


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #41021: [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #41021:
URL: https://github.com/apache/spark/pull/41021#issuecomment-1532656290

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dtenedor commented on a diff in pull request #41021: [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #41021:
URL: https://github.com/apache/spark/pull/41021#discussion_r1183097205


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala:
##########
@@ -234,6 +237,9 @@ case class HllUnionAgg(
     right.eval().asInstanceOf[Boolean]
   }
 
+  // The default target type (register size) to use.
+  private val targetType = TgtHllType.HLL_8

Review Comment:
   this is duplicated with L71 above, can we dedup to one place? Same for the change in `datasketchesExpressions.scala`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org