Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/14 08:13:16 UTC

[GitHub] [doris] ShawshankLin opened a new issue, #15927: [Feature] bitmap int uses the built-in dictionary table

ShawshankLin opened a new issue, #15927:
URL: https://github.com/apache/doris/issues/15927

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   We need to use bitmaps, but Doris bitmaps currently only support integer columns. Our userid column is of varchar type, so we need to map each string userid to an int value. To do this, we maintain a dictionary table of userids and join it with our real-time data to convert each userid into its int value.
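   The mapping described above is essentially dictionary encoding: each distinct string id gets a stable, dense integer, and the bitmap is built over those integers. Below is a minimal Python sketch, assuming an in-memory dictionary. The class and function names are hypothetical, and a plain Python int stands in for Doris's own BITMAP type.

```python
from typing import Dict, Iterable


class UserIdDictionary:
    """Hypothetical dictionary: assigns each distinct string user id a dense int id."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, user_id: str) -> int:
        # Reuse the existing id, or append the next one (0, 1, 2, ...).
        if user_id not in self._ids:
            self._ids[user_id] = len(self._ids)
        return self._ids[user_id]


def to_bitmap(dictionary: UserIdDictionary, user_ids: Iterable[str]) -> int:
    """Set bit k for every user whose encoded id is k (a Python int used as a bitmap)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << dictionary.encode(uid)
    return bitmap


if __name__ == "__main__":
    d = UserIdDictionary()
    day1 = to_bitmap(d, ["u_alice", "u_bob", "u_alice"])
    day2 = to_bitmap(d, ["u_bob", "u_carol"])
    # Union of the two bitmaps counts distinct users across both days.
    print(bin(day1 | day2).count("1"))  # 3
```

   The dictionary must stay consistent across all loads. The same userid must always receive the same int, or bitmaps built at different times cannot be combined. This is why the dictionary is maintained as a separate table.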
   
   ![image](https://user-images.githubusercontent.com/16248686/212461964-3234437c-3d98-4b11-9675-c2055c5ee645.png)
   
   But this approach has several problems:
   1. Flink performs a dimension-table (lookup) join; the dimension table is relatively large, so the lookup is slow.
   2. With multiple bitmaps, multiple dictionary tables must be maintained, and refreshing the dimension tables introduces delay.
   
   
   So we wonder whether Doris could support special functions for using dictionaries in queries. Combining dictionaries with functions is simpler and more efficient than combining JOIN operations with reference tables. Examples are the ClickHouse (ck) dictionary functions:
   
   - `dictGet('dict_name', attr_names, id_expr)`
   - `dictGetOrDefault('dict_name', attr_names, id_expr, default_value_expr)`
   - `dictGetOrNull('dict_name', attr_name, id_expr)`

   https://clickhouse.com/docs/en/sql-reference/functions/ext-dict-functions/
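   Below is a minimal Python sketch of the three lookup behaviours requested. It follows our reading of the linked ClickHouse documentation: on a missing key, `dictGet` returns the default configured for the attribute, `dictGetOrDefault` returns the caller's default, and `dictGetOrNull` returns NULL. The registry `DICTIONARIES`, the dictionary name `user_id_dict`, and the attribute `int_id` are hypothetical. This is not a Doris or ClickHouse API.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory registry: dictionary name -> rows and per-attribute defaults.
# "rows" maps a key (here a varchar user id) to its attribute values.
DICTIONARIES: Dict[str, Dict[str, Any]] = {
    "user_id_dict": {
        "rows": {"u_alice": {"int_id": 0}, "u_bob": {"int_id": 1}},
        "defaults": {"int_id": -1},  # returned by dict_get for unknown keys
    },
}


def _lookup(dict_name: str, key: Any) -> Optional[Dict[str, Any]]:
    """Return the attribute row for key, or None if the key is absent."""
    return DICTIONARIES[dict_name]["rows"].get(key)


def dict_get(dict_name: str, attr: str, key: Any) -> Any:
    """dictGet-like: unknown keys fall back to the dictionary's configured default."""
    row = _lookup(dict_name, key)
    return row[attr] if row is not None else DICTIONARIES[dict_name]["defaults"][attr]


def dict_get_or_default(dict_name: str, attr: str, key: Any, default: Any) -> Any:
    """dictGetOrDefault-like: unknown keys return the caller-supplied default."""
    row = _lookup(dict_name, key)
    return row[attr] if row is not None else default


def dict_get_or_null(dict_name: str, attr: str, key: Any) -> Optional[Any]:
    """dictGetOrNull-like: unknown keys return None (SQL NULL)."""
    row = _lookup(dict_name, key)
    return row[attr] if row is not None else None


if __name__ == "__main__":
    print(dict_get("user_id_dict", "int_id", "u_bob"))                  # 1
    print(dict_get("user_id_dict", "int_id", "u_dave"))                 # -1
    print(dict_get_or_default("user_id_dict", "int_id", "u_dave", 99))  # 99
    print(dict_get_or_null("user_id_dict", "int_id", "u_dave"))         # None
```

   The point of the request is that the lookup runs per row inside the query, as a function call. No explicit JOIN against a dictionary table needs to be written or maintained by the Flink job.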
   
   The ClickHouse dictionary design seems to have more features, and I'm not sure it is exactly what I need.
   
   ![image](https://user-images.githubusercontent.com/16248686/212462415-cec3246b-5220-45d6-9364-326d3beb4862.png)
   
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.