Posted to dev@hudi.apache.org by 1037817390 <me...@qq.com.INVALID> on 2022/08/11 04:02:23 UTC
Re: [DISCUSS]: Integrate column stats index with all query engines
+1 for this.
It would be better to provide some filter converters to facilitate integration with each engine,
e.g. a converter from a Presto domain to a Hudi domain.
I have already finished a first version of data skipping / partition pruning / filter pushdown for Presto:
https://github.com/xiarixiaoyao/presto/commit/800646608d4b88799de0addcddd97d03592954ce
Maybe we can work together.
孟涛
mengtao0326@qq.com
------------------ Original message ------------------
From: "dev" <vinoth@apache.org>;
Sent: Thursday, August 11, 2022, 12:11 PM
To: "dev" <dev@hudi.apache.org>;
Subject: Re: [DISCUSS]: Integrate column stats index with all query engines
+1 for this.
Suggested new reviewers on the RFC.
https://github.com/apache/hudi/pull/6345/files#r943073339
On Wed, Aug 10, 2022 at 9:56 PM Pratyaksh Sharma <pratyaksh13@gmail.com>
wrote:
> Hello community,
>
> With the introduction of the multi-modal index in Hudi, there is a lot of
> scope for improvement on the querying side. There are two major ways of
> reducing the data scanned at query time: partition pruning and file pruning.
> With the latest developments in the community, partition pruning is
> supported for commonly used query engines such as Spark, Presto and Hive,
> but file pruning using the column stats index is supported only for Spark
> and Flink.
>
> We intend to support data skipping for the rest of the engines as well,
> namely Hive, Presto and Trino. I have written a draft RFC here:
> https://github.com/apache/hudi/pull/6345.
>
> Please take a look and let me know what you think. Once we have some
> feedback from the community, we can decide on the next steps.
>
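The file pruning described in the proposal above can be illustrated with a minimal sketch. This is not Hudi's implementation or API: the names ColumnRange, RangeFilter, may_contain and prune_files, as well as the file names and values, are hypothetical and chosen only for illustration. The sketch assumes the index stores a per-file minimum and maximum for each column, and that an engine-specific filter has already been converted into a simple engine-neutral range.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ColumnRange:
    """Min/max of one column within one data file (hypothetical stats record)."""
    min_value: int
    max_value: int


@dataclass(frozen=True)
class RangeFilter:
    """Hypothetical engine-neutral filter: column value must lie in [low, high]."""
    column: str
    low: Optional[int] = None   # None means unbounded below
    high: Optional[int] = None  # None means unbounded above


def may_contain(stats: ColumnRange, f: RangeFilter) -> bool:
    """Return False only when the file's [min, max] cannot overlap the filter."""
    if f.low is not None and stats.max_value < f.low:
        return False
    if f.high is not None and stats.min_value > f.high:
        return False
    return True


def prune_files(index: Dict[str, Dict[str, ColumnRange]],
                filters: List[RangeFilter]) -> List[str]:
    """Keep files that might match every filter; files without stats are kept."""
    kept = []
    for file_name, column_stats in index.items():
        candidate = True
        for f in filters:
            stats = column_stats.get(f.column)
            # Missing stats give no evidence, so the file cannot be skipped.
            if stats is not None and not may_contain(stats, f):
                candidate = False
                break
        if candidate:
            kept.append(file_name)
    return kept


# Illustrative data only: per-file min/max for a "price" column.
index = {
    "file_a.parquet": {"price": ColumnRange(1, 50)},
    "file_b.parquet": {"price": ColumnRange(60, 120)},
    "file_c.parquet": {},  # no stats recorded for this file
}

# Equivalent of: WHERE price >= 100
selected = prune_files(index, [RangeFilter("price", low=100)])
print(selected)
```

Pruning here is conservative: a file is skipped only when its statistics prove that no row can match, so a file with missing statistics is always read. A per-engine filter converter, as suggested in the reply at the top of this thread, would be the step that produces something like RangeFilter from the engine's own predicate representation.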
Re: [DISCUSS]: Integrate column stats index with all query engines
Posted by Pratyaksh Sharma <pr...@gmail.com>.
Sure, we can work together once we get some feedback on the RFC, Meng!