Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/03/12 10:08:45 UTC

[GitHub] [incubator-doris] whatsgh opened a new issue #5509: Poor performance of the bitmap aggregate type in the Doris aggregate model

whatsgh opened a new issue #5509:
URL: https://github.com/apache/incubator-doris/issues/5509


   Doris version: 0.13
   Table schema:
   ![image](https://user-images.githubusercontent.com/18162700/110923751-312a0780-835c-11eb-9c72-85ec117bce38.png)
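   The base table receives roughly 100 million rows per day; after aggregation, the rollup table holds about 10,000 rows per day.
   The partition column is dayStr, the bucketing columns are channel and pageId, and the bucket count is 10.
   Problem: when both queries hit the same rollup table, the one with bitmap aggregation takes roughly 50-100 times as long as the one without it. It is not clear which part makes bitmap performance this poor.
   The query with bitmap aggregation and its elapsed time: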
   select dayStr dt,hourStr hour,channel,pageId,eventId,sum(pv) pv,bitmap_count(bitmap_union(uv)) uv   from aggr_user_action_event_rt_v2  where dayStr=20210312 and hourStr=9 and channel != '-'  group by dayStr,   hourStr,   channel,   pageId,   eventId;
   ![image](https://user-images.githubusercontent.com/18162700/110924957-96322d00-835d-11eb-80ac-2de7f188a88f.png)
   The query without bitmap aggregation and its elapsed time:
   select dayStr dt,hourStr hour,channel,pageId,eventId,sum(pv) pv   from aggr_user_action_event_rt_v2  where dayStr=20210312 and hourStr=9 and channel != '-'  group by dayStr,   hourStr,   channel,   pageId,   eventId;
   ![image](https://user-images.githubusercontent.com/18162700/110925078-b8c44600-835d-11eb-8b64-f605f507da1c.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] whatsgh closed issue #5509: Poor performance of the bitmap aggregate type in the Doris aggregate model

Posted by GitBox <gi...@apache.org>.
whatsgh closed issue #5509:
URL: https://github.com/apache/incubator-doris/issues/5509


   




[GitHub] [incubator-doris] EmmyMiao87 commented on issue #5509: Poor performance of the bitmap aggregate type in the Doris aggregate model

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on issue #5509:
URL: https://github.com/apache/incubator-doris/issues/5509#issuecomment-797417753


   The bitmap algorithm itself does not compute efficiently when the cardinality is large and the values are sparsely distributed. Performance tuning has to take the data model and the real data distribution into account.
   1. Use a global dictionary to change the bitmap's value distribution from sparse to compact.
   2. Examine your own data distribution to see whether you can use the UDAF orthogonal bitmap functions.
   How to use: http://doris.apache.org/master/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.html
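
To make the "sparse versus compact" point concrete, here is a minimal sketch in Python. It assumes a Roaring-style layout in which 32-bit values are grouped into containers by their high 16 bits. This is a simplified model for illustration, not Doris's actual implementation, and the ID ranges and the `container_count` helper are hypothetical.

```python
import random


def container_count(values):
    """Count the distinct 2**16-wide chunks touched by the given 32-bit values."""
    return len({v >> 16 for v in values})


random.seed(0)
n = 100_000

# Hypothetical sparse IDs: scattered across the whole 32-bit range (e.g. hashed user IDs).
sparse_ids = random.sample(range(2**32), n)
# The same number of IDs after remapping to the compact range 0..n-1.
dense_ids = range(n)

sparse_chunks = container_count(sparse_ids)
dense_chunks = container_count(dense_ids)
print("containers for sparse IDs:", sparse_chunks)
print("containers for dense IDs:", dense_chunks)
```

Under this model the same number of distinct values touches tens of thousands of containers when the IDs are scattered, but only two when they are compact. That gives an intuition for why unions and cardinality counts over sparse bitmaps can cost much more.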
   
   By the way, I would like to recommend an article: a performance tuning case study of a mini program that uses bitmap for precision marketing.
   https://blog.csdn.net/weixin_47452131/article/details/113393764




[GitHub] [incubator-doris] wangbo commented on issue #5509: Poor performance of the bitmap aggregate type in the Doris aggregate model

Posted by GitBox <gi...@apache.org>.
wangbo commented on issue #5509:
URL: https://github.com/apache/incubator-doris/issues/5509#issuecomment-797394279


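   You can collect the query profile and look at its metrics.
   Even when the rollup is hit, at high cardinality the main computational cost of bitmap currently lies in deserializing the bitmap column and computing the cardinality.
   Part of the overhead also goes to allocating and releasing memory in the memory pool.
   For ordinary users, the quickest ways to optimize bitmap at present are:
   1. Increase parallelism, both the query concurrency on a single node and the number of BE nodes.
   2. Optimize the distribution of the input data. A bitmap has better storage footprint and query performance when the stored values are fairly contiguous and small; this can only be achieved by the user adding an encoding/mapping layer over the input data.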
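
The encoding/mapping layer in point 2 can be sketched as below. This is a minimal in-memory illustration in Python, not a Doris feature; the `build_dictionary` helper and the raw IDs are hypothetical. A real pipeline would need to keep the dictionary persistent and shared across loads so that the same raw ID always maps to the same integer.

```python
def build_dictionary(raw_ids):
    """Assign each distinct raw ID a dense integer in first-seen order, starting at 0."""
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = len(mapping)
    return mapping


# Hypothetical raw user identifiers as they might arrive in an event stream.
raw_events = ["u-9f3a", "u-0041", "u-9f3a", "u-77c2", "u-0041"]

dictionary = build_dictionary(raw_events)
# Replace each raw ID with its dense code before loading it into the bitmap column.
encoded = [dictionary[r] for r in raw_events]
print(encoded)  # [0, 1, 0, 2, 1]
```

The encoded values are small and contiguous, which is the distribution described above as favorable for bitmap storage and query performance.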
   

