Posted to dev@carbondata.apache.org by Kunal Kapoor <ku...@gmail.com> on 2021/02/25 04:51:03 UTC

Re: [Discussion]Presto Queries leveraging Secondary Index

+1 on using the index server to leverage SI. As discussed earlier, we
would need a segment UDF to enable selective segment reading instead of the
current implementation. The existing setSegmentsToRead API should be
removed later as well.

Please share the design after your POC.

On Mon, Jan 18, 2021 at 9:42 AM akashrn5 <ak...@gmail.com> wrote:

> Hi Venu,
>
> Thanks for the suggestion.
>
> 1. Option 1 is not a good idea; I think performance will be bad.
> 2. For option 2: we already have other indexes, Lucene and bloom, where
> distributed pruning happens. Lucene is also an index stored along with the
> table, but not as a separate table like SI, so we scan Lucene in a
> distributed job and then return the index for the filter expression.
> Similarly, we can call SI to scan and prune, but since that needs a Spark
> job, the index server is the only option.
> So we can use that for scanning, but I'm afraid it may impact other
> concurrent queries. I would suggest doing a POC with the index server
> first, which will show us any other bottlenecks with this approach; then
> we can decide and start the design.
>
> If you have already done a POC and have some results and a design ready,
> we can review that.
>
> Thanks
>
> Regards
> Akash
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>