Posted to issues-all@impala.apache.org by "Fucun Chu (Jira)" <ji...@apache.org> on 2021/08/16 06:37:00 UTC

[jira] [Created] (IMPALA-10864) Optimize ds_hll_sketch() function

Fucun Chu created IMPALA-10864:
----------------------------------

             Summary: Optimize ds_hll_sketch() function
                 Key: IMPALA-10864
                 URL: https://issues.apache.org/jira/browse/IMPALA-10864
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Fucun Chu


[https://lists.apache.org/thread.html/r2591247f7ca0813a0c80fb0d06b4b8fd614298ea44ea730f6ed7cfe1%40%3Cdev.impala.apache.org%3E]

 

??Regarding deserialization: in some cases a sketch is constructed only to be replaced immediately by a deserialized instance. This extra construction seems unnecessary.??

??[https://github.com/apache/impala/blob/3d365276ea00f349df3629944b731eb4408d2c4f/be/src/exprs/aggregate-functions-ir.cc#L1819]??

??Looking at this DsHllMerge function??

??[https://github.com/apache/impala/blob/3d365276ea00f349df3629944b731eb4408d2c4f/be/src/exprs/aggregate-functions-ir.cc#L1759]??

??it seems that the merge is done pairwise. Is it possible to arrange this process as init, multiple merges, and a single finalize (serialize) at the end? It is quite costly to initialize a union, update it with two sketches, and then call get_result(). If many such merges happen, the overhead of initializing a fresh union and finalizing it for each pair can be substantial.??
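
To make the difference between the two merge strategies concrete, here is a minimal sketch. ToySketch and ToyUnion are hypothetical stand-ins invented for this illustration (a fixed array of registers merged by per-register maximum); they are not the DataSketches classes and not Impala's DsHllMerge code. The example only counts how many union objects each strategy constructs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an HLL sketch: a fixed array of registers.
// This is NOT the DataSketches format or API.
constexpr std::size_t kRegisters = 16;
using ToySketch = std::array<std::uint8_t, kRegisters>;

// Hypothetical stand-in for a union object. It counts constructions so
// the two merge strategies can be compared.
class ToyUnion {
 public:
  static int constructed;

  ToyUnion() {
    registers_.fill(0);
    ++constructed;
  }

  // Merging HLL-style registers keeps the per-register maximum.
  void Update(const ToySketch& s) {
    for (std::size_t i = 0; i < kRegisters; ++i) {
      registers_[i] = std::max(registers_[i], s[i]);
    }
  }

  // "Finalize": hand back the merged state as a sketch.
  ToySketch GetResult() const { return registers_; }

 private:
  ToySketch registers_;
};

int ToyUnion::constructed = 0;

// Pairwise strategy: a fresh union is built and finalized for every input.
ToySketch MergePairwise(const std::vector<ToySketch>& inputs) {
  ToySketch acc{};  // all registers zero
  for (const ToySketch& s : inputs) {
    ToyUnion u;
    u.Update(acc);
    u.Update(s);
    acc = u.GetResult();
  }
  return acc;
}

// Suggested strategy: init once, merge many times, finalize once.
ToySketch MergeOnce(const std::vector<ToySketch>& inputs) {
  ToyUnion u;
  for (const ToySketch& s : inputs) {
    u.Update(s);
  }
  return u.GetResult();
}

struct Counts {
  int pairwise_unions;
  int single_unions;
};

// Runs both strategies on the same inputs and reports how many unions
// each one constructed. Both must produce the same merged sketch.
Counts CompareStrategies(const std::vector<ToySketch>& inputs) {
  ToyUnion::constructed = 0;
  const ToySketch a = MergePairwise(inputs);
  const int pairwise = ToyUnion::constructed;

  ToyUnion::constructed = 0;
  const ToySketch b = MergeOnce(inputs);
  const int single = ToyUnion::constructed;

  assert(a == b);
  return {pairwise, single};
}
```

With N input sketches, the pairwise version constructs and finalizes N unions, while the single-union version constructs and finalizes one. In a real union the construction and get_result() steps are presumably far more expensive than in this toy, which is the overhead the quoted message is pointing at.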



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
