You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@doris.apache.org by ling miao <em...@gmail.com> on 2020/04/16 09:13:14 UTC

[PROPOSAL] Materialized view 2.0

Hi everyone,

*The status of materialized view 1.0*
In the present, we have supported the materialized views in Doris 0.12
version. The materialized view selector supports to select the most
efficient mv and rewrite the SQL to query against the selected mv instead
of the base table.
For query results contain a small number of rows where the original table
has a large amount of data, the performance can reach the 5X to 100X times
depends on the cardinality of the data.
The aggregate functions supported by the materialized view in 0.12 include:
sum, min, max.

However, the aggregate functions supported by the current materialized view
are not rich enough to fully cover the user's scene.
For example, in the `Order` scenario, user needs to analyze the number of
orders in different dimensions.
Another example is the count_distinct function is used for analyzing PV and
UV data in website traffic.

*The goal of materialized view 2.0*
In order to support more scenarios, the materialized view 2.0 will support
the following functions:

1. Materialized view supports aggregate functions: count, count_distinct
(bitmap and hll)
2. Support to create materialized views of the same column with different
aggregate functions. For example: select k1, sum (v1), min (v1) from table
group by k1


What features do you want the materialized view 2.0 to support?

Looking forward to your idea~

LingMiao

Re: [PROPOSAL] Materialized view 2.0

Posted by ling miao <em...@gmail.com>.
If we want to support the expr of key, the different load plan will be need
between base and rollup.
The key of rollup needs to be transformed by the to_mouth function before
the storage engine aggregate the value column.

From the perspective of demand, users only need to be able to store day,
month data.
From an implementation perspective, is it a good way to implement it with
the current materialized view?


Zhao Chun <zh...@apache.org> 于2020年4月16日周四 下午5:40写道:

> Hi Ling
>
> Very glad to see this proposal.
>
> We can discuss whether to support expression operations.
>
> For the case that expressions are applied on the KEY column. For example we
> can easily get a monthly table
> through a rollup which is defined by "group by to_month(day_column)".
>
> For the case that expression are applied on the value column. For example
> we can get a PV aggregate table
> through a rollup which is defined by "sum(case event_column when
> "page_view" then 1 else 0 end)".
>
> Thanks,
> Zhao Chun
>
>
> ling miao <em...@gmail.com> 于2020年4月16日周四 下午5:13写道:
>
> > Hi everyone,
> >
> > *The status of materialized view 1.0*
> > In the present, we have supported the materialized views in Doris 0.12
> > version. The materialized view selector supports to select the most
> > efficient mv and rewrite the SQL to query against the selected mv instead
> > of the base table.
> > For query results contain a small number of rows where the original table
> > has a large amount of data, the performance can reach the 5X to 100X
> times
> > depends on the cardinality of the data.
> > The aggregate functions supported by the materialized view in 0.12
> include:
> > sum, min, max.
> >
> > However, the aggregate functions supported by the current materialized
> view
> > are not rich enough to fully cover the user's scene.
> > For example, in the `Order` scenario, user needs to analyze the number of
> > orders in different dimensions.
> > Another example is the count_distinct function is used for analyzing PV
> and
> > UV data in website traffic.
> >
> > *The goal of materialized view 2.0*
> > In order to support more scenarios, the materialized view 2.0 will
> support
> > the following functions:
> >
> > 1. Materialized view supports aggregate functions: count, count_distinct
> > (bitmap and hll)
> > 2. Support to create materialized views of the same column with different
> > aggregate functions. For example: select k1, sum (v1), min (v1) from
> table
> > group by k1
> >
> >
> > What features do you want the materialized view 2.0 to support?
> >
> > Looking forward to your idea~
> >
> > LingMiao
> >
>

Re: Re: [PROPOSAL] Materialized view 2.0

Posted by ling miao <em...@gmail.com>.
If expressions are supported, the storage engine will changed when the
function column is built automatically.
Also the mv selector need a expression calculation tool to judge whether mv
could be selected for query.

After Doris support the count_distinct and count, I will consider this
feature.

陈明雨 <mo...@163.com> 于2020年4月16日周四 下午10:50写道:

> That could be good.
>
>
> However, if expressions are supported, the function of automatically
> selecting rollup may fail.
> Because the reason why Doris can automatically select the rollup is based
> on keeping the column values consistent.
> When the column value generates a new value through the expression, the
> column will no longer be consistent.
>
>
> Therefore, we may need a completely new design, such as the "relevant
> table" conception to create views,
> and maintain the data consistency of views through the function of atomic
> load operation.
>
>
>
>
> --
>
> 此致!Best Regards
> 陈明雨 Mingyu Chen
>
> Email:
> chenmingyu@apache.org
>
>
>
>
>
> At 2020-04-16 17:40:34, "Zhao Chun" <zh...@apache.org> wrote:
> >Hi Ling
> >
> >Very glad to see this proposal.
> >
> >We can discuss whether to support expression operations.
> >
> >For the case that expressions are applied on the KEY column. For example
> we
> >can easily get a monthly table
> >through a rollup which is defined by "group by to_month(day_column)".
> >
> >For the case that expression are applied on the value column. For example
> >we can get a PV aggregate table
> >through a rollup which is defined by "sum(case event_column when
> >"page_view" then 1 else 0 end)".
> >
> >Thanks,
> >Zhao Chun
> >
> >
> >ling miao <em...@gmail.com> 于2020年4月16日周四 下午5:13写道:
> >
> >> Hi everyone,
> >>
> >> *The status of materialized view 1.0*
> >> In the present, we have supported the materialized views in Doris 0.12
> >> version. The materialized view selector supports to select the most
> >> efficient mv and rewrite the SQL to query against the selected mv
> instead
> >> of the base table.
> >> For query results contain a small number of rows where the original
> table
> >> has a large amount of data, the performance can reach the 5X to 100X
> times
> >> depends on the cardinality of the data.
> >> The aggregate functions supported by the materialized view in 0.12
> include:
> >> sum, min, max.
> >>
> >> However, the aggregate functions supported by the current materialized
> view
> >> are not rich enough to fully cover the user's scene.
> >> For example, in the `Order` scenario, user needs to analyze the number
> of
> >> orders in different dimensions.
> >> Another example is the count_distinct function is used for analyzing PV
> and
> >> UV data in website traffic.
> >>
> >> *The goal of materialized view 2.0*
> >> In order to support more scenarios, the materialized view 2.0 will
> support
> >> the following functions:
> >>
> >> 1. Materialized view supports aggregate functions: count, count_distinct
> >> (bitmap and hll)
> >> 2. Support to create materialized views of the same column with
> different
> >> aggregate functions. For example: select k1, sum (v1), min (v1) from
> table
> >> group by k1
> >>
> >>
> >> What features do you want the materialized view 2.0 to support?
> >>
> >> Looking forward to your idea~
> >>
> >> LingMiao
> >>
>

Re:Re: [PROPOSAL] Materialized view 2.0

Posted by 陈明雨 <mo...@163.com>.
That could be good.


However, if expressions are supported, the function of automatically selecting rollup may fail.
Because the reason why Doris can automatically select the rollup is based on keeping the column values consistent.
When the column value generates a new value through the expression, the column will no longer be consistent.


Therefore, we may need a completely new design, such as the "relevant table" conception to create views,
and maintain the data consistency of views through the function of atomic load operation.




--

此致!Best Regards
陈明雨 Mingyu Chen

Email:
chenmingyu@apache.org





At 2020-04-16 17:40:34, "Zhao Chun" <zh...@apache.org> wrote:
>Hi Ling
>
>Very glad to see this proposal.
>
>We can discuss whether to support expression operations.
>
>For the case that expressions are applied on the KEY column. For example we
>can easily get a monthly table
>through a rollup which is defined by "group by to_month(day_column)".
>
>For the case that expression are applied on the value column. For example
>we can get a PV aggregate table
>through a rollup which is defined by "sum(case event_column when
>"page_view" then 1 else 0 end)".
>
>Thanks,
>Zhao Chun
>
>
>ling miao <em...@gmail.com> 于2020年4月16日周四 下午5:13写道:
>
>> Hi everyone,
>>
>> *The status of materialized view 1.0*
>> In the present, we have supported the materialized views in Doris 0.12
>> version. The materialized view selector supports to select the most
>> efficient mv and rewrite the SQL to query against the selected mv instead
>> of the base table.
>> For query results contain a small number of rows where the original table
>> has a large amount of data, the performance can reach the 5X to 100X times
>> depends on the cardinality of the data.
>> The aggregate functions supported by the materialized view in 0.12 include:
>> sum, min, max.
>>
>> However, the aggregate functions supported by the current materialized view
>> are not rich enough to fully cover the user's scene.
>> For example, in the `Order` scenario, user needs to analyze the number of
>> orders in different dimensions.
>> Another example is the count_distinct function is used for analyzing PV and
>> UV data in website traffic.
>>
>> *The goal of materialized view 2.0*
>> In order to support more scenarios, the materialized view 2.0 will support
>> the following functions:
>>
>> 1. Materialized view supports aggregate functions: count, count_distinct
>> (bitmap and hll)
>> 2. Support to create materialized views of the same column with different
>> aggregate functions. For example: select k1, sum (v1), min (v1) from table
>> group by k1
>>
>>
>> What features do you want the materialized view 2.0 to support?
>>
>> Looking forward to your idea~
>>
>> LingMiao
>>

Re: [PROPOSAL] Materialized view 2.0

Posted by Zhao Chun <zh...@apache.org>.
Hi Ling

Very glad to see this proposal.

We can discuss whether to support expression operations.

For the case that expressions are applied on the KEY column. For example we
can easily get a monthly table
through a rollup which is defined by "group by to_month(day_column)".

For the case that expression are applied on the value column. For example
we can get a PV aggregate table
through a rollup which is defined by "sum(case event_column when
"page_view" then 1 else 0 end)".

Thanks,
Zhao Chun


ling miao <em...@gmail.com> 于2020年4月16日周四 下午5:13写道:

> Hi everyone,
>
> *The status of materialized view 1.0*
> In the present, we have supported the materialized views in Doris 0.12
> version. The materialized view selector supports to select the most
> efficient mv and rewrite the SQL to query against the selected mv instead
> of the base table.
> For query results contain a small number of rows where the original table
> has a large amount of data, the performance can reach the 5X to 100X times
> depends on the cardinality of the data.
> The aggregate functions supported by the materialized view in 0.12 include:
> sum, min, max.
>
> However, the aggregate functions supported by the current materialized view
> are not rich enough to fully cover the user's scene.
> For example, in the `Order` scenario, user needs to analyze the number of
> orders in different dimensions.
> Another example is the count_distinct function is used for analyzing PV and
> UV data in website traffic.
>
> *The goal of materialized view 2.0*
> In order to support more scenarios, the materialized view 2.0 will support
> the following functions:
>
> 1. Materialized view supports aggregate functions: count, count_distinct
> (bitmap and hll)
> 2. Support to create materialized views of the same column with different
> aggregate functions. For example: select k1, sum (v1), min (v1) from table
> group by k1
>
>
> What features do you want the materialized view 2.0 to support?
>
> Looking forward to your idea~
>
> LingMiao
>