Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/04/26 09:05:34 UTC

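[GitHub] [incubator-doris] liuxiahuiyi opened a new issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?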

liuxiahuiyi opened a new issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


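[GitHub] [incubator-doris] caiconghui commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?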

Posted by GitBox <gi...@apache.org>.
caiconghui commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827291835


   According to your description, the only way to remove duplicates is to set the column's aggregation type to REPLACE. Currently, Doris does not support REPLACE and SUM aggregation at the same time.
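A minimal plain-Python sketch of why the two behaviors conflict, assuming the usual aggregate-model semantics: rows that share the same key columns are merged using the value column's aggregation type, where SUM adds the values and REPLACE keeps the most recently loaded one. The `merge`, `sum_agg`, and `replace_agg` helpers and the `(uuid, userid, cost)` rows are hypothetical illustrations, not Doris code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical row layout: (uuid, userid, cost); key columns are (uuid, userid).
Row = Tuple[str, int, float]
Key = Tuple[str, int]


def merge(rows: List[Row], agg: Callable[[float, float], float]) -> Dict[Key, float]:
    """Merge rows sharing the same key, in load order, with one aggregate function."""
    table: Dict[Key, float] = {}
    for uuid, userid, cost in rows:
        key = (uuid, userid)
        table[key] = agg(table[key], cost) if key in table else cost
    return table


def sum_agg(old: float, new: float) -> float:
    return old + new  # SUM: every loaded copy is added, duplicates included


def replace_agg(old: float, new: float) -> float:
    return new  # REPLACE: the latest loaded value overwrites the previous one


rows = [("uuid1", 10001, 10.5), ("uuid2", 10001, 5.5), ("uuid1", 10001, 10.5)]
print(merge(rows, sum_agg))      # {('uuid1', 10001): 21.0, ('uuid2', 10001): 5.5}
print(merge(rows, replace_agg))  # {('uuid1', 10001): 10.5, ('uuid2', 10001): 5.5}
```

Because each value column is declared with a single aggregation type, it gets either the deduplicating behavior or the summing behavior, not both.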


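[GitHub] [incubator-doris] yangmingjie2018 commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?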

Posted by GitBox <gi...@apache.org>.
yangmingjie2018 commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827224064


   I don't quite understand your question. Could you give an example?


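[GitHub] [incubator-doris] liuxiahuiyi commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?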

Posted by GitBox <gi...@apache.org>.
liuxiahuiyi commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827372975


   > I don't quite understand your question. Could you give an example?
   
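   For example, take a table `CREATE TABLE purchase (uuid VARCHAR(20), userid INT, cost DOUBLE SUM) AGGREGATE KEY (uuid, userid)`, where uuid is meant to be a unique per-row ID that never repeats and cost is the amount spent by that userid. By creating a rollup or materialized view on this table,
    ```ALTER TABLE purchase ADD ROLLUP rollup_purchase(userid, cost)```, we can quickly query the total amount spent by each userid. However, if the data loaded into this table contains duplicates, i.e. a repeated uuid, the cost for that uuid is added twice and the result is wrong. I have researched this for a long time and have not found a way to use Doris's data models to avoid such duplicate data.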


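[GitHub] [incubator-doris] liuxiahuiyi commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?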

Posted by GitBox <gi...@apache.org>.
liuxiahuiyi commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827378008


   > According to your description, the only way to remove duplicates is to set the column's aggregation type to REPLACE. Currently, Doris does not support REPLACE and SUM aggregation at the same time.
   
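   Hi, why can't a materialized view support computations across multiple columns? If it did, I could add an extra count column recording how many times a row was loaded, and dividing the repeatedly accumulated value by this count would give the true value.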
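A plain-Python sketch of the arithmetic behind this proposal, assuming every duplicate is an exact copy of the original row. The base table keeps `cost` and a hypothetical `cnt` column (1 per loaded row), both aggregated with SUM, so each key's real cost is `cost / cnt`. The sketch also shows that the division has to happen per (uuid, userid) key before the per-userid SUM; dividing the already rolled-up totals gives a different, wrong number. None of this is an existing Doris feature.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows: (uuid, userid, cost); the uuid1 row was loaded twice.
rows: List[Tuple[str, int, float]] = [
    ("uuid1", 10001, 10.5),
    ("uuid2", 10001, 5.5),
    ("uuid1", 10001, 10.5),
]

# Base table keyed by (uuid, userid): [cost SUM, cnt SUM], with cnt = 1 per loaded row.
base: Dict[Tuple[str, int], list] = defaultdict(lambda: [0.0, 0])
for uuid, userid, cost in rows:
    base[(uuid, userid)][0] += cost
    base[(uuid, userid)][1] += 1

# Divide per key first, then SUM per userid: the "true value" the proposal describes.
true_total: Dict[int, float] = defaultdict(float)
for (uuid, userid), (cost_sum, cnt) in base.items():
    true_total[userid] += cost_sum / cnt

# Dividing only after rolling up to userid mixes keys and does not recover 16.
naive = sum(c for c, _ in base.values()) / sum(n for _, n in base.values())

print(dict(true_total))  # {10001: 16.0}
print(naive)             # 26.5 / 3, about 8.83
```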


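[GitHub] [incubator-doris] liuxiahuiyi commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?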

Posted by GitBox <gi...@apache.org>.
liuxiahuiyi commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-826662094


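   Following up on the issue: this VALUE column needs SUM because a ROLLUP or materialized view built on it later has to compute a SUM, so I cannot change the column's aggregation type to REPLACE.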


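[GitHub] [incubator-doris] liuxiahuiyi commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?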

Posted by GitBox <gi...@apache.org>.
liuxiahuiyi commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827389558


   > > > I don't quite understand your question. Could you give an example?
   > > 
   > > 
   > > For example, take three rows: uuid1,10001,10.5; uuid2,10001,5.5; uuid1,10001,10.5. The first and third rows are duplicates. The desired rollup result is a total spend of 10.5+5.5=16 for userid 10001, not 10.5+5.5+10.5=26.5.
   > 
   > This is business logic: Doris cannot identify duplicate rows. In fact, in our production environment two rows can be identical yet be different records.
   
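   It is indeed business logic, but I'd like to avoid these problems as far as possible through data-model design. That would be possible if a materialized view could change the aggregation type of the cost column, or if materialized views supported division between two columns. Is a future Doris version likely to support either of these two features?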


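[GitHub] [incubator-doris] msj100f commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?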

Posted by GitBox <gi...@apache.org>.
msj100f commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-851755776


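   I have the same requirement. I wanted to create a materialized view that aggregates over UNIQUE KEY columns and found that it is not supported. If I use AGGREGATE KEY instead, then whenever the program fails and data has to be re-consumed, rows are duplicated and the aggregated values are wrong.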


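[GitHub] [incubator-doris] liuxiahuiyi commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?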

Posted by GitBox <gi...@apache.org>.
liuxiahuiyi commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827375148


   > I don't quite understand your question. Could you give an example?
   
   For example, take three rows: uuid1,10001,10.5; uuid2,10001,5.5; uuid1,10001,10.5. The first and third rows are duplicates. The desired rollup result is a total spend of 10.5+5.5=16 for userid 10001, not 10.5+5.5+10.5=26.5.
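The arithmetic of this example as a small plain-Python sketch: summing every loaded row per userid (what a SUM base table plus a SUM rollup produces here) gives 26.5, while counting each uuid only once gives the expected 16. The `total_by_user` helper is hypothetical; it keeps the first copy of each uuid, which for exact duplicates matches what a REPLACE column would keep, and it assumes uuid really identifies a record.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # hypothetical layout: (uuid, userid, cost)


def total_by_user(rows: List[Row], dedup_by_uuid: bool) -> Dict[int, float]:
    """Sum cost per userid, optionally counting each uuid only once."""
    seen = set()
    totals: Dict[int, float] = defaultdict(float)
    for uuid, userid, cost in rows:
        if dedup_by_uuid:
            if uuid in seen:
                continue  # skip a repeated uuid instead of adding its cost again
            seen.add(uuid)
        totals[userid] += cost
    return dict(totals)


rows = [("uuid1", 10001, 10.5), ("uuid2", 10001, 5.5), ("uuid1", 10001, 10.5)]
print(total_by_user(rows, dedup_by_uuid=False))  # {10001: 26.5}
print(total_by_user(rows, dedup_by_uuid=True))   # {10001: 16.0}
```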


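[GitHub] [incubator-doris] caiconghui commented on issue #5708: Deduplicating detail data: after the KEY columns are specified, a VALUE column that needs SUM keeps growing whenever duplicate rows are loaded, but duplicate rows should not be added up. How can this be solved in the current version of Doris?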

Posted by GitBox <gi...@apache.org>.
caiconghui commented on issue #5708:
URL: https://github.com/apache/incubator-doris/issues/5708#issuecomment-827381186


   > > I don't quite understand your question. Could you give an example?
   > 
   > For example, take three rows: uuid1,10001,10.5; uuid2,10001,5.5; uuid1,10001,10.5. The first and third rows are duplicates. The desired rollup result is a total spend of 10.5+5.5=16 for userid 10001, not 10.5+5.5+10.5=26.5.
   
   This is business logic: Doris cannot identify duplicate rows. In fact, in our production environment two rows can be identical yet be different records.

