You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@doris.apache.org by 蔡聪辉 <ca...@163.com> on 2022/06/26 13:49:19 UTC

[Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.



Hi, all devs. Now Doris has supported random distribution for Duplicate Keys table and Aggregate Keys table without replace(replace_if_not_null) type column to solve data skew problem.

But now, It brings some problem when we do some speical queris on aggregate table with random distribution. 

For example, if we have a aggreate table like following:




Table agg_table




ColumnNameType      Aggregate Type

k1char(5)

v1                     int        max 




Origin Data Source, which include 5 rows :                                                             

a  1

a  2

b  3

b  4

b  5




In hash distribution, the Doris can ensure that diffrent rows with same aggregate keys always in the same tablet(replica), 




so that the result for select * from agg_table is same with select * from (select k1, max(v1) from agg_table);




but for the random distribution, aggregate keys table with diffrent rows with same aggregate keys may be in the difffrent tablet(replica),




which may result in different reuslt from hash distribution.




But what is the real root cause? This is because Doris leave out the aggregate function and group by statement if needed when do select column operation from aggregate table,

and do aggregate function with group by operation only in the Doris storage layer.




In the current design, count(1) and select just value column without aggregate funcion or group by stament for aggregate table now is diffrent from query reuslt on rollup which base on duiplicate table.




For example, if we have duplcate table 

Table dup_table

ColumnName     Type 

k1    char(5)

v1                   int




the the rollup is 

   Rollup rollup_table

ColumnNameType      Aggregate Type

k1char(5)

v1                     int        max 




if we exectue query like "select * from dup_table " or "select count(1) from dup_table", we won't hit the rollup even if query rollup will cost less, this because select * or select count(1)

not really match the v1 column with aggregate function like min.




So. to make query result on random distribution consistent with hash distribution for aggregate table, here are two main solutions.

1. forbid directly select * from aggregate table, forbid select count(1) directly from aggregate table, and give the right sql to remind user that they should use aggregate function on value column or write group by statement explicitly.

2. keep the origin design, and we rewrite sql when user execute some special queries on aggregate table. 


These two solutions have their own advantages and disadvantages. You are welcome to give your opinions, feel free to discuss.

Re: [Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by ling miao <li...@apache.org>.

I support solution2.
Although such queries with mismatched aggregation types are really
meaningless in real scenarios,
based on Doris's consistent design style,
we should allow users to submit such queries instead of banning them,
just like we now allow users to directly query ```select * from
agg_table```  on the AGG model.

Ling Miao

zhg yang <ya...@gmail.com> 于2022年6月27日周一 11:22写道：

> I support solution 2. We should provide the same query interface for
> different tables. The wrong query result is mainly caused by the query
> planning not taking into account the new data shuffle method. This problem
> is more like a bug than a feature, because I I think it's still a matter of
> fixing this issue, rather than restricting how users can use it
>
> Thanks
> Yang Zhengguo
>
>
> 蔡聪辉 <ca...@163.com> 于2022年6月26日周日 21:50写道：
>
> >
> >
> >
> > Hi, all devs. Now Doris has supported random distribution for Duplicate
> > Keys table and Aggregate Keys table without replace(replace_if_not_null)
> > type column to solve data skew problem.
> >
> > But now, It brings some problem when we do some speical queris on
> > aggregate table with random distribution.
> >
> > For example, if we have a aggreate table like following:
> >
> >
> >
> >
> > Table agg_table
> >
> >
> >
> >
> > ColumnNameType      Aggregate Type
> >
> > k1char(5)
> >
> > v1                     int        max
> >
> >
> >
> >
> > Origin Data Source, which include 5 rows :
> >
> >
> > a  1
> >
> > a  2
> >
> > b  3
> >
> > b  4
> >
> > b  5
> >
> >
> >
> >
> > In hash distribution, the Doris can ensure that diffrent rows with same
> > aggregate keys always in the same tablet(replica),
> >
> >
> >
> >
> > so that the result for select * from agg_table is same with select * from
> > (select k1, max(v1) from agg_table);
> >
> >
> >
> >
> > but for the random distribution, aggregate keys table with diffrent rows
> > with same aggregate keys may be in the difffrent tablet(replica),
> >
> >
> >
> >
> > which may result in different reuslt from hash distribution.
> >
> >
> >
> >
> > But what is the real root cause? This is because Doris leave out the
> > aggregate function and group by statement if needed when do select column
> > operation from aggregate table,
> >
> > and do aggregate function with group by operation only in the Doris
> > storage layer.
> >
> >
> >
> >
> > In the current design, count(1) and select just value column without
> > aggregate funcion or group by stament for aggregate table now is diffrent
> > from query reuslt on rollup which base on duiplicate table.
> >
> >
> >
> >
> > For example, if we have duplcate table
> >
> > Table dup_table
> >
> > ColumnName     Type
> >
> > k1    char(5)
> >
> > v1                   int
> >
> >
> >
> >
> > the the rollup is
> >
> >    Rollup rollup_table
> >
> > ColumnNameType      Aggregate Type
> >
> > k1char(5)
> >
> > v1                     int        max
> >
> >
> >
> >
> > if we exectue query like "select * from dup_table " or "select count(1)
> > from dup_table", we won't hit the rollup even if query rollup will cost
> > less, this because select * or select count(1)
> >
> > not really match the v1 column with aggregate function like min.
> >
> >
> >
> >
> > So. to make query result on random distribution consistent with hash
> > distribution for aggregate table, here are two main solutions.
> >
> > 1. forbid directly select * from aggregate table, forbid select count(1)
> > directly from aggregate table, and give the right sql to remind user that
> > they should use aggregate function on value column or write group by
> > statement explicitly.
> >
> > 2. keep the origin design, and we rewrite sql when user execute some
> > special queries on aggregate table.
> >
> >
> > These two solutions have their own advantages and disadvantages. You are
> > welcome to give your opinions, feel free to discuss.
> >
> >
>


-- 
Ling Miao | Apache Doris

Re: [Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by zhg yang <ya...@gmail.com>.

I support solution 2. We should provide the same query interface for
different tables. The wrong query result is mainly caused by the query
planning not taking into account the new data shuffle method. This problem
is more like a bug than a feature, because I I think it's still a matter of
fixing this issue, rather than restricting how users can use it

Thanks
Yang Zhengguo


蔡聪辉 <ca...@163.com> 于2022年6月26日周日 21:50写道：

>
>
>
> Hi, all devs. Now Doris has supported random distribution for Duplicate
> Keys table and Aggregate Keys table without replace(replace_if_not_null)
> type column to solve data skew problem.
>
> But now, It brings some problem when we do some speical queris on
> aggregate table with random distribution.
>
> For example, if we have a aggreate table like following:
>
>
>
>
> Table agg_table
>
>
>
>
> ColumnNameType      Aggregate Type
>
> k1char(5)
>
> v1                     int        max
>
>
>
>
> Origin Data Source, which include 5 rows :
>
>
> a  1
>
> a  2
>
> b  3
>
> b  4
>
> b  5
>
>
>
>
> In hash distribution, the Doris can ensure that diffrent rows with same
> aggregate keys always in the same tablet(replica),
>
>
>
>
> so that the result for select * from agg_table is same with select * from
> (select k1, max(v1) from agg_table);
>
>
>
>
> but for the random distribution, aggregate keys table with diffrent rows
> with same aggregate keys may be in the difffrent tablet(replica),
>
>
>
>
> which may result in different reuslt from hash distribution.
>
>
>
>
> But what is the real root cause? This is because Doris leave out the
> aggregate function and group by statement if needed when do select column
> operation from aggregate table,
>
> and do aggregate function with group by operation only in the Doris
> storage layer.
>
>
>
>
> In the current design, count(1) and select just value column without
> aggregate funcion or group by stament for aggregate table now is diffrent
> from query reuslt on rollup which base on duiplicate table.
>
>
>
>
> For example, if we have duplcate table
>
> Table dup_table
>
> ColumnName     Type
>
> k1    char(5)
>
> v1                   int
>
>
>
>
> the the rollup is
>
>    Rollup rollup_table
>
> ColumnNameType      Aggregate Type
>
> k1char(5)
>
> v1                     int        max
>
>
>
>
> if we exectue query like "select * from dup_table " or "select count(1)
> from dup_table", we won't hit the rollup even if query rollup will cost
> less, this because select * or select count(1)
>
> not really match the v1 column with aggregate function like min.
>
>
>
>
> So. to make query result on random distribution consistent with hash
> distribution for aggregate table, here are two main solutions.
>
> 1. forbid directly select * from aggregate table, forbid select count(1)
> directly from aggregate table, and give the right sql to remind user that
> they should use aggregate function on value column or write group by
> statement explicitly.
>
> 2. keep the origin design, and we rewrite sql when user execute some
> special queries on aggregate table.
>
>
> These two solutions have their own advantages and disadvantages. You are
> welcome to give your opinions, feel free to discuss.
>
>

Re: [Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by 张家峰 <zh...@gmail.com>.

nice work！
In the past few days, two or three users have raised the issue of data
skew. This solution is very good. I prefer the second of the two solutions,
so that users will not perceive it.

蔡聪辉 <ca...@163.com> 于2022年6月26日周日 21:49写道：

>
>
>
> Hi, all devs. Now Doris has supported random distribution for Duplicate
> Keys table and Aggregate Keys table without replace(replace_if_not_null)
> type column to solve data skew problem.
>
> But now, It brings some problem when we do some speical queris on
> aggregate table with random distribution.
>
> For example, if we have a aggreate table like following:
>
>
>
>
> Table agg_table
>
>
>
>
> ColumnNameType      Aggregate Type
>
> k1char(5)
>
> v1                     int        max
>
>
>
>
> Origin Data Source, which include 5 rows :
>
>
> a  1
>
> a  2
>
> b  3
>
> b  4
>
> b  5
>
>
>
>
> In hash distribution, the Doris can ensure that diffrent rows with same
> aggregate keys always in the same tablet(replica),
>
>
>
>
> so that the result for select * from agg_table is same with select * from
> (select k1, max(v1) from agg_table);
>
>
>
>
> but for the random distribution, aggregate keys table with diffrent rows
> with same aggregate keys may be in the difffrent tablet(replica),
>
>
>
>
> which may result in different reuslt from hash distribution.
>
>
>
>
> But what is the real root cause? This is because Doris leave out the
> aggregate function and group by statement if needed when do select column
> operation from aggregate table,
>
> and do aggregate function with group by operation only in the Doris
> storage layer.
>
>
>
>
> In the current design, count(1) and select just value column without
> aggregate funcion or group by stament for aggregate table now is diffrent
> from query reuslt on rollup which base on duiplicate table.
>
>
>
>
> For example, if we have duplcate table
>
> Table dup_table
>
> ColumnName     Type
>
> k1    char(5)
>
> v1                   int
>
>
>
>
> the the rollup is
>
>    Rollup rollup_table
>
> ColumnNameType      Aggregate Type
>
> k1char(5)
>
> v1                     int        max
>
>
>
>
> if we exectue query like "select * from dup_table " or "select count(1)
> from dup_table", we won't hit the rollup even if query rollup will cost
> less, this because select * or select count(1)
>
> not really match the v1 column with aggregate function like min.
>
>
>
>
> So. to make query result on random distribution consistent with hash
> distribution for aggregate table, here are two main solutions.
>
> 1. forbid directly select * from aggregate table, forbid select count(1)
> directly from aggregate table, and give the right sql to remind user that
> they should use aggregate function on value column or write group by
> statement explicitly.
>
> 2. keep the origin design, and we rewrite sql when user execute some
> special queries on aggregate table.
>
>
> These two solutions have their own advantages and disadvantages. You are
> welcome to give your opinions, feel free to discuss.
>
>

-- 
张家峰

Re: [Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by ling miao <em...@gmail.com>.

> I have another question that whether rewrite sql and add a shuffle may
cause performance drop in some scenes.

Yes, but that's the price random tables have to pay.
Random's agg table means that it can no longer perform complete
pre-aggregation operations.
So there will be a loss of code performance. (It's the same no matter which
solution we choose)

Ling Miao

王博 <wa...@gmail.com> 于2022年6月27日周一 12:37写道：

> For Agg table,  data distribution way should keep the consistent semantics,
> from this perspective, I think solution 2 is better,this reduces the cost
> of user understanding.
> I have another question that whether rewrite sql and add a shuffle may
> cause performance drop in some scenes.
>
> 蔡聪辉 <ca...@163.com> 于2022年6月26日周日 21:49写道：
>
> >
> >
> >
> > Hi, all devs. Now Doris has supported random distribution for Duplicate
> > Keys table and Aggregate Keys table without replace(replace_if_not_null)
> > type column to solve data skew problem.
> >
> > But now, It brings some problem when we do some speical queris on
> > aggregate table with random distribution.
> >
> > For example, if we have a aggreate table like following:
> >
> >
> >
> >
> > Table agg_table
> >
> >
> >
> >
> > ColumnNameType      Aggregate Type
> >
> > k1char(5)
> >
> > v1                     int        max
> >
> >
> >
> >
> > Origin Data Source, which include 5 rows :
> >
> >
> > a  1
> >
> > a  2
> >
> > b  3
> >
> > b  4
> >
> > b  5
> >
> >
> >
> >
> > In hash distribution, the Doris can ensure that diffrent rows with same
> > aggregate keys always in the same tablet(replica),
> >
> >
> >
> >
> > so that the result for select * from agg_table is same with select * from
> > (select k1, max(v1) from agg_table);
> >
> >
> >
> >
> > but for the random distribution, aggregate keys table with diffrent rows
> > with same aggregate keys may be in the difffrent tablet(replica),
> >
> >
> >
> >
> > which may result in different reuslt from hash distribution.
> >
> >
> >
> >
> > But what is the real root cause? This is because Doris leave out the
> > aggregate function and group by statement if needed when do select column
> > operation from aggregate table,
> >
> > and do aggregate function with group by operation only in the Doris
> > storage layer.
> >
> >
> >
> >
> > In the current design, count(1) and select just value column without
> > aggregate funcion or group by stament for aggregate table now is diffrent
> > from query reuslt on rollup which base on duiplicate table.
> >
> >
> >
> >
> > For example, if we have duplcate table
> >
> > Table dup_table
> >
> > ColumnName     Type
> >
> > k1    char(5)
> >
> > v1                   int
> >
> >
> >
> >
> > the the rollup is
> >
> >    Rollup rollup_table
> >
> > ColumnNameType      Aggregate Type
> >
> > k1char(5)
> >
> > v1                     int        max
> >
> >
> >
> >
> > if we exectue query like "select * from dup_table " or "select count(1)
> > from dup_table", we won't hit the rollup even if query rollup will cost
> > less, this because select * or select count(1)
> >
> > not really match the v1 column with aggregate function like min.
> >
> >
> >
> >
> > So. to make query result on random distribution consistent with hash
> > distribution for aggregate table, here are two main solutions.
> >
> > 1. forbid directly select * from aggregate table, forbid select count(1)
> > directly from aggregate table, and give the right sql to remind user that
> > they should use aggregate function on value column or write group by
> > statement explicitly.
> >
> > 2. keep the origin design, and we rewrite sql when user execute some
> > special queries on aggregate table.
> >
> >
> > These two solutions have their own advantages and disadvantages. You are
> > welcome to give your opinions, feel free to discuss.
> >
> >
>
> --
> 王博  Wang Bo
>

Re: [Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by 王博 <wa...@gmail.com>.

For Agg table,  data distribution way should keep the consistent semantics,
from this perspective, I think solution 2 is better,this reduces the cost
of user understanding.
I have another question that whether rewrite sql and add a shuffle may
cause performance drop in some scenes.

蔡聪辉 <ca...@163.com> 于2022年6月26日周日 21:49写道：

>
>
>
> Hi, all devs. Now Doris has supported random distribution for Duplicate
> Keys table and Aggregate Keys table without replace(replace_if_not_null)
> type column to solve data skew problem.
>
> But now, It brings some problem when we do some speical queris on
> aggregate table with random distribution.
>
> For example, if we have a aggreate table like following:
>
>
>
>
> Table agg_table
>
>
>
>
> ColumnNameType      Aggregate Type
>
> k1char(5)
>
> v1                     int        max
>
>
>
>
> Origin Data Source, which include 5 rows :
>
>
> a  1
>
> a  2
>
> b  3
>
> b  4
>
> b  5
>
>
>
>
> In hash distribution, the Doris can ensure that diffrent rows with same
> aggregate keys always in the same tablet(replica),
>
>
>
>
> so that the result for select * from agg_table is same with select * from
> (select k1, max(v1) from agg_table);
>
>
>
>
> but for the random distribution, aggregate keys table with diffrent rows
> with same aggregate keys may be in the difffrent tablet(replica),
>
>
>
>
> which may result in different reuslt from hash distribution.
>
>
>
>
> But what is the real root cause? This is because Doris leave out the
> aggregate function and group by statement if needed when do select column
> operation from aggregate table,
>
> and do aggregate function with group by operation only in the Doris
> storage layer.
>
>
>
>
> In the current design, count(1) and select just value column without
> aggregate funcion or group by stament for aggregate table now is diffrent
> from query reuslt on rollup which base on duiplicate table.
>
>
>
>
> For example, if we have duplcate table
>
> Table dup_table
>
> ColumnName     Type
>
> k1    char(5)
>
> v1                   int
>
>
>
>
> the the rollup is
>
>    Rollup rollup_table
>
> ColumnNameType      Aggregate Type
>
> k1char(5)
>
> v1                     int        max
>
>
>
>
> if we exectue query like "select * from dup_table " or "select count(1)
> from dup_table", we won't hit the rollup even if query rollup will cost
> less, this because select * or select count(1)
>
> not really match the v1 column with aggregate function like min.
>
>
>
>
> So. to make query result on random distribution consistent with hash
> distribution for aggregate table, here are two main solutions.
>
> 1. forbid directly select * from aggregate table, forbid select count(1)
> directly from aggregate table, and give the right sql to remind user that
> they should use aggregate function on value column or write group by
> statement explicitly.
>
> 2. keep the origin design, and we rewrite sql when user execute some
> special queries on aggregate table.
>
>
> These two solutions have their own advantages and disadvantages. You are
> welcome to give your opinions, feel free to discuss.
>
>

-- 
王博  Wang Bo

Re: [Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by Jianliang Qi <ji...@gmail.com>.

solution 2, +1

It is better to make it easier for users to use.

Jianliang Qi

On Tue, Jun 28, 2022 at 2:36 PM Zhou Minghong <mi...@163.com> wrote:

> I support solution 2.
>
>
> We add a new shuffle type, so we have to expand our agg code to support
> the new cases.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> At 2022-06-26 21:49:19, "蔡聪辉" <ca...@163.com> wrote:
> >
> >
> >
> >Hi, all devs. Now Doris has supported random distribution for Duplicate
> Keys table and Aggregate Keys table without replace(replace_if_not_null)
> type column to solve data skew problem.
> >
> >But now, It brings some problem when we do some speical queris on
> aggregate table with random distribution.
> >
> >For example, if we have a aggreate table like following:
> >
> >
> >
> >
> >Table agg_table
> >
> >
> >
> >
> >ColumnNameType      Aggregate Type
> >
> >k1char(5)
> >
> >v1                     int        max
> >
> >
> >
> >
> >Origin Data Source, which include 5 rows :
>
> >
> >a  1
> >
> >a  2
> >
> >b  3
> >
> >b  4
> >
> >b  5
> >
> >
> >
> >
> >In hash distribution, the Doris can ensure that diffrent rows with same
> aggregate keys always in the same tablet(replica),
> >
> >
> >
> >
> >so that the result for select * from agg_table is same with select * from
> (select k1, max(v1) from agg_table);
> >
> >
> >
> >
> >but for the random distribution, aggregate keys table with diffrent rows
> with same aggregate keys may be in the difffrent tablet(replica),
> >
> >
> >
> >
> >which may result in different reuslt from hash distribution.
> >
> >
> >
> >
> >But what is the real root cause? This is because Doris leave out the
> aggregate function and group by statement if needed when do select column
> operation from aggregate table,
> >
> >and do aggregate function with group by operation only in the Doris
> storage layer.
> >
> >
> >
> >
> >In the current design, count(1) and select just value column without
> aggregate funcion or group by stament for aggregate table now is diffrent
> from query reuslt on rollup which base on duiplicate table.
> >
> >
> >
> >
> >For example, if we have duplcate table
> >
> >Table dup_table
> >
> >ColumnName     Type
> >
> >k1    char(5)
> >
> >v1                   int
> >
> >
> >
> >
> >the the rollup is
> >
> >   Rollup rollup_table
> >
> >ColumnNameType      Aggregate Type
> >
> >k1char(5)
> >
> >v1                     int        max
> >
> >
> >
> >
> >if we exectue query like "select * from dup_table " or "select count(1)
> from dup_table", we won't hit the rollup even if query rollup will cost
> less, this because select * or select count(1)
> >
> >not really match the v1 column with aggregate function like min.
> >
> >
> >
> >
> >So. to make query result on random distribution consistent with hash
> distribution for aggregate table, here are two main solutions.
> >
> >1. forbid directly select * from aggregate table, forbid select count(1)
> directly from aggregate table, and give the right sql to remind user that
> they should use aggregate function on value column or write group by
> statement explicitly.
> >
> >2. keep the origin design, and we rewrite sql when user execute some
> special queries on aggregate table.
> >
> >
> >These two solutions have their own advantages and disadvantages. You are
> welcome to give your opinions, feel free to discuss.
> >
>

Re:[Discuss][Random Distribution Aggregate Table Query] Make consistent query result on Aggregate Keys table for both hash distribution and random distribution.

Posted by Zhou Minghong <mi...@163.com>.

I support solution 2.


We add a new shuffle type, so we have to expand our agg code to support the new cases.

















At 2022-06-26 21:49:19, "蔡聪辉" <ca...@163.com> wrote:
>
>
>
>Hi, all devs. Now Doris has supported random distribution for Duplicate Keys table and Aggregate Keys table without replace(replace_if_not_null) type column to solve data skew problem.
>
>But now, It brings some problem when we do some speical queris on aggregate table with random distribution. 
>
>For example, if we have a aggreate table like following:
>
>
>
>
>Table agg_table
>
>
>
>
>ColumnNameType      Aggregate Type
>
>k1char(5)
>
>v1                     int        max 
>
>
>
>
>Origin Data Source, which include 5 rows :                                                             
>
>a  1
>
>a  2
>
>b  3
>
>b  4
>
>b  5
>
>
>
>
>In hash distribution, the Doris can ensure that diffrent rows with same aggregate keys always in the same tablet(replica), 
>
>
>
>
>so that the result for select * from agg_table is same with select * from (select k1, max(v1) from agg_table);
>
>
>
>
>but for the random distribution, aggregate keys table with diffrent rows with same aggregate keys may be in the difffrent tablet(replica),
>
>
>
>
>which may result in different reuslt from hash distribution.
>
>
>
>
>But what is the real root cause? This is because Doris leave out the aggregate function and group by statement if needed when do select column operation from aggregate table,
>
>and do aggregate function with group by operation only in the Doris storage layer.
>
>
>
>
>In the current design, count(1) and select just value column without aggregate funcion or group by stament for aggregate table now is diffrent from query reuslt on rollup which base on duiplicate table.
>
>
>
>
>For example, if we have duplcate table 
>
>Table dup_table
>
>ColumnName     Type 
>
>k1    char(5)
>
>v1                   int
>
>
>
>
>the the rollup is 
>
>   Rollup rollup_table
>
>ColumnNameType      Aggregate Type
>
>k1char(5)
>
>v1                     int        max 
>
>
>
>
>if we exectue query like "select * from dup_table " or "select count(1) from dup_table", we won't hit the rollup even if query rollup will cost less, this because select * or select count(1)
>
>not really match the v1 column with aggregate function like min.
>
>
>
>
>So. to make query result on random distribution consistent with hash distribution for aggregate table, here are two main solutions.
>
>1. forbid directly select * from aggregate table, forbid select count(1) directly from aggregate table, and give the right sql to remind user that they should use aggregate function on value column or write group by statement explicitly.
>
>2. keep the origin design, and we rewrite sql when user execute some special queries on aggregate table. 
>
>
>These two solutions have their own advantages and disadvantages. You are welcome to give your opinions, feel free to discuss.
>