You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/02 12:29:00 UTC

[GitHub] [incubator-doris] harryhan1989 opened a new issue, #9346: [Enhancement] 物化视图增强

harryhan1989 opened a new issue, #9346:
URL: https://github.com/apache/incubator-doris/issues/9346

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   物化视图限制太多了,功能太弱了。可否提供不带任何限制的物化视图,可以不要物化视图实时更新,定时增量刷新数据也行。现在的物化视图不支持 同一列的同时min max聚合,还不支持唯一健模型,也不支持计算列,简直是鸡肋的不行
   
   ### Solution
   
   物化视图限制太多了,功能太弱了。可否提供不带任何限制的物化视图,可以不要物化视图实时更新,定时增量刷新数据也行。现在的物化视图不支持 同一列的同时min max聚合,还不支持唯一健模型,也不支持计算列,简直是鸡肋的不行
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on issue #9346: [Enhancement] 物化视图增强

Posted by "BiteTheDDDDt (via GitHub)" <gi...@apache.org>.
BiteTheDDDDt commented on issue #9346:
URL: https://github.com/apache/doris/issues/9346#issuecomment-1559087773

   > 
   
   Hello, to_bitmap(num_col) can be changed to bitmap_hash(num_col) in 2.0.
   
   As for why the case you listed is restricted, it is because the aggregation type of a certain column on the materialized view/rollup needs to be consistent with the base table. The mv with group by actually creates a corresponding agg table for the base table.
   
   For example, consider the following situations:
   The base table type is dup, with a value column k1 on it, and a materialized view with a sum column k1 can be created (the base table has no aggregation type).
   The base table type is agg, with sum column k1 on it, and a materialized view with sum column k1 can be created (the aggregation type is the same).
   The base table type is uniq, and there is a value column k1 on it. At this time, the aggregation type equivalent to k1 is replace, so the materialized view of k1 with the sum aggregation type cannot be created. (Different aggregation types)
   
   For specific reasons, you can understand the principle of the agg table, and think about what problems will exist if this restriction is not imposed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] stalary commented on issue #9346: [Enhancement] 物化视图增强

Posted by GitBox <gi...@apache.org>.
stalary commented on issue #9346:
URL: https://github.com/apache/incubator-doris/issues/9346#issuecomment-1119407776

   I'll enhance it later


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] BiteTheDDDDt commented on issue #9346: [Enhancement] 物化视图增强

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on issue #9346:
URL: https://github.com/apache/incubator-doris/issues/9346#issuecomment-1120611965

   在之后的版本里,向量化引擎的物化视图可能会提供更多计算功能


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] liuanxin commented on issue #9346: [Enhancement] 物化视图增强

Posted by "liuanxin (via GitHub)" <gi...@apache.org>.
liuanxin commented on issue #9346:
URL: https://github.com/apache/doris/issues/9346#issuecomment-1558961369

   `count(*)` 会扫描每个聚合 key 导致性能会差一点认了, `count(distinct varchar_column)` 为什么也不支持(居然只能支持数字字段?), 
   
   数据会重复导入, `unique` 模型不支持函数, `aggregate` 模型对函数的支持也是一堆限制, 那 `duplicate` 模型要怎么基于重复数据做 `sum(int)` 呢?
   
   2.0 也依然没有解决这些很基础的 sql 问题
   
   ```sql
   drop table if exists temp;
   create table temp (
       id varchar(8),
       `date` date,
       user_id varchar(8),
       channel_id varchar(8) replace, /* 如果改成 unique 或 duplicate 模型则去掉这里的 replace */
       `amount` int replace
   )
   aggregate key (id, `date`, user_id) /* unique = aggregate + 上面的 replace  */
   partition by range(`date`) (
       partition p202305 values less than ("2023-06-01")
   )
   distributed by hash(id) buckets 1
   properties (
       /* "enable_unique_key_merge_on_write" = "true", -- 如果用 unique key 可以使用 */
       "replication_allocation" = "tag.location.default: 1"
   );
   
   insert into temp(id, `date`, user_id, channel_id, amount) values
   ('a', '2023-05-01', 'aa', '123a', '50'),
   ('a', '2023-05-01', 'aa', '123a', '50'),
   ('b', '2023-05-01', 'aa', '123b', '10'),
   ('c', '2023-05-02', 'cc', '123c', '20'),
   ('d', '2023-05-03', 'dd', '123d', '30');
   
   create materialized view mv1 as /* 如果是 aggregate 或 unique 模型无法建立 */
   select `date`, user_id, count(distinct channel_id) /* 只有 duplicate 模型 + bitmap_union(to_bitmap(col)) 可以, 然而 col 不能是 varchar 这样基础类型 */
   from temp
   group by `date`, user_id
   order by `date` desc, user_id;
   /* 报错信息也很奇怪: SQL error(1105): errCode = 2, detailMessage = The function count must match pattern:count(column) */
   
   
   create materialized view mv1 as /* 如果是 aggregate 或 unique 模型无法建立 */
   select `date`, user_id, sum(amount) /* 如果是 duplicate 模型无法去重 */
   from temp
   group by `date`, user_id
   order by `date` desc, user_id;
   ```
   
   限制实在是太多了, 我有理由相信, 支持的只有文档中给出来的示例


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] ShawshankLin commented on issue #9346: [Enhancement] 物化视图增强

Posted by GitBox <gi...@apache.org>.
ShawshankLin commented on issue #9346:
URL: https://github.com/apache/incubator-doris/issues/9346#issuecomment-1120422429

   > I'll enhance it later
   we also need materialized views of computable columns


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] ShawshankLin commented on issue #9346: [Enhancement] 物化视图增强

Posted by GitBox <gi...@apache.org>.
ShawshankLin commented on issue #9346:
URL: https://github.com/apache/incubator-doris/issues/9346#issuecomment-1120422486

   > I'll enhance it later
   
   we also need materialized views of computable columns


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on issue #9346: [Enhancement] 物化视图增强

Posted by "BiteTheDDDDt (via GitHub)" <gi...@apache.org>.
BiteTheDDDDt commented on issue #9346:
URL: https://github.com/apache/doris/issues/9346#issuecomment-1558807584

   > 
   
   master or the upcoming doris 2.0 will have new features in this regard.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] EmmyMiao87 commented on issue #9346: [Enhancement] 物化视图增强

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on issue #9346:
URL: https://github.com/apache/incubator-doris/issues/9346#issuecomment-1120637301

   > 也不支持计算列
   之所以支持的计算列不够丰富是考虑到:后续会更新物化视图计计算引擎,所以没必要在旧的引擎上去丰富计算列的功能。
   在 @BiteTheDDDDt  将物化视图切换到向量化引擎后,就可以支持丰富的计算列了。
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] liuanxin commented on issue #9346: [Enhancement] 物化视图增强

Posted by "liuanxin (via GitHub)" <gi...@apache.org>.
liuanxin commented on issue #9346:
URL: https://github.com/apache/doris/issues/9346#issuecomment-1558743076

   一年过去了, 还是像个玩具
   
   One year later, still like a toy


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on issue #9346: [Enhancement] 物化视图增强

Posted by "BiteTheDDDDt (via GitHub)" <gi...@apache.org>.
BiteTheDDDDt commented on issue #9346:
URL: https://github.com/apache/doris/issues/9346#issuecomment-1559088510

   > `count(*)` 会扫描每个聚合 key 导致性能会差一点认了, `count(distinct varchar_column)` 为什么也不支持(居然只能支持数字字段)
   > 
   > 数据如果会重复导入, 那就三种模型来说: `unique` 模型不支持函数; `aggregate` 模型没有提到像 `unique` 一样那么多限制, 但是在实际的过程中感觉跟 `unique` 是一样的, 各种不支持(如果只能改变顺序, 叫啥物化视图呢, 叫索引不更合适一点?); `duplicate` 模型要怎么基于重复数据做 sum 函数呢?
   > 
   > 2.0 也依然没有解决这些很基础的 sql 问题
   > 
   > ```sql
   > /* doris-version: 1.2.4.1 */
   > drop table if exists temp;
   > create table temp (
   >     id varchar(8),
   >     `date` date,
   >     user_id varchar(8),
   >     channel_id varchar(8) replace, /* 如果改成 unique 或 duplicate 模型则去掉这里的 replace */
   >     `amount` int replace
   > )
   > aggregate key (id, `date`, user_id) /* unique = aggregate + 上面的 replace  */
   > partition by range(`date`) (
   >     partition p202305 values less than ("2023-06-01")
   > )
   > distributed by hash(id) buckets 1
   > properties (
   >     /* "enable_unique_key_merge_on_write" = "true", -- 如果用 unique key 可以使用 */
   >     "replication_allocation" = "tag.location.default: 1"
   > );
   > 
   > insert into temp(id, `date`, user_id, channel_id, amount) values
   > ('a', '2023-05-01', 'aa', '123a', '50'),
   > ('a', '2023-05-01', 'aa', '123a', '50'),  /* 重复数据 */
   > ('b', '2023-05-01', 'aa', '123b', '10'),
   > ('c', '2023-05-02', 'cc', '123c', '20'),
   > ('d', '2023-05-03', 'dd', '123d', '30');
   > 
   > create materialized view mv1 as /* 如果是 aggregate 或 unique 模型无法建立 */
   > select `date`, user_id, count(distinct channel_id)
   > from temp /* 只有 duplicate 模型 + bitmap_union(to_bitmap(num_col)) 可以, 然而 num_col 不能是 varchar 类型 */
   > group by `date`, user_id
   > order by `date` desc, user_id;
   > /* SQL error(1105): errCode = 2, detailMessage = The function count must match pattern:count(column) */
   > 
   > 
   > create materialized view mv2 as /* 如果是 aggregate 或 unique 模型无法建立 */
   > select `date`, user_id, sum(amount) /* 如果是 duplicate 模型无法去重 */
   > from temp
   > group by `date`, user_id
   > order by `date` desc, user_id;
   > /* SQL error(1105): errCode = 2, detailMessage = The aggregation type of 
   > column[amount] must be same as the aggregate type of base column in aggregate table */
   > ```
   > 
   > 限制实在是太多了, 我有理由相信, 支持的只有文档中给出来的示例
   
   Hello, to_bitmap(num_col) can be changed to bitmap_hash(num_col) in 2.0.
   
   As for why the case you listed is restricted, it is because the aggregation type of a certain column on the materialized view/rollup needs to be consistent with the base table. The mv with group by actually creates a corresponding agg table for the base table.
   
   For example, consider the following situations:
   The base table type is dup, with a value column k1 on it, and a materialized view with a sum column k1 can be created (the base table has no aggregation type).
   The base table type is agg, with sum column k1 on it, and a materialized view with sum column k1 can be created (the aggregation type is the same).
   The base table type is uniq, and there is a value column k1 on it. At this time, the aggregation type equivalent to k1 is replace, so the materialized view of k1 with the sum aggregation type cannot be created. (Different aggregation types)
   
   For specific reasons, you can understand the principle of the agg table, and think about what problems will exist if this restriction is not imposed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org