Posted to dev@carbondata.apache.org by Aniket Adnaik <an...@gmail.com> on 2016/09/27 22:49:03 UTC

[Discussion]: option to disable multi-layered index scan and use full table scan

CarbonData could provide a way to disable the multi-layered index and
use a full table scan instead.
This may help in the following cases:
1. Small tables occupying only a few blocks are probably better off
using a full table scan.
2. Queries with a large number of projections and no filter may benefit from
a full table scan.
3. Testing different scenarios and comparing against HDFS file formats
that do not provide a multi-layered index would be easier.

Also, the CarbonData scan should internally be smart enough to detect these
cases based on the query, data size, etc.
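
As a rough illustration of the kind of internal check described above, here
is a minimal Python sketch. Everything in it is an assumption made for
illustration: the names QueryInfo and choose_scan, the block-count threshold,
and the rule itself are hypothetical and are not part of CarbonData. The
sketch also ignores the number of projected columns and looks only at whether
a filter is present.

```python
from dataclasses import dataclass

# Hypothetical threshold; the proposal does not name a specific value.
SMALL_TABLE_BLOCKS = 4


@dataclass
class QueryInfo:
    has_filter: bool   # True if the query carries a filter predicate
    block_count: int   # number of blocks the table occupies


def choose_scan(q: QueryInfo, small_table_blocks: int = SMALL_TABLE_BLOCKS) -> str:
    """Return "full_scan" or "index_scan" (illustrative heuristic only)."""
    # Case 2: with no filter there is nothing for the index to prune.
    if not q.has_filter:
        return "full_scan"
    # Case 1: a table of only a few blocks is cheap to read in full.
    if q.block_count <= small_table_blocks:
        return "full_scan"
    # Otherwise assume the index lookup pays for itself.
    return "index_scan"
```

A real implementation would more likely weigh estimated costs than apply a
fixed threshold; the fixed cut-off here only keeps the example short.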

Any comments?




--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-option-to-disable-multi-layered-index-scan-and-use-full-table-scan-tp1526.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

RE: [Discussion]: option to disable multi-layered index scan and use full table scan

Posted by Jihong Ma <Ji...@huawei.com>.

The use case for a direct full table scan is rare and not the dominant case. The built-in internal logic should be very cautious about making this decision, as a full scan is considered a worst-case path. I still don't see a need to expose it externally.

Regards.

Jenny

-----Original Message-----
From: Aniket Adnaik [mailto:aniket.adnaik@gmail.com] 
Sent: Wednesday, September 28, 2016 10:50 AM
To: dev@carbondata.incubator.apache.org
Subject: Re: [Discussion]: option to disable multi-layered index scan and use full table scan

In my view we can have both:
1. We should have internal smartness built in, as databases do based on
cost estimation. Carbon can take a simpler approach based on data size or
the number of files to scan.
2. Provide some kind of hint or switch to enable or disable the MDK, MinMax
and inverted indexes, individually and together. This would help us in
testing different scenarios and in performance tuning. These would be options
for users who know what they are doing, or they can be internal options if we
don't want to create more confusion.

Best Regards,
Aniket


Re: [Discussion]: option to disable multi-layered index scan and use full table scan

Posted by Aniket Adnaik <an...@gmail.com>.

In my view we can have both:
1. We should have internal smartness built in, as databases do based on
cost estimation. Carbon can take a simpler approach based on data size or
the number of files to scan.
2. Provide some kind of hint or switch to enable or disable the MDK, MinMax
and inverted indexes, individually and together. This would help us in
testing different scenarios and in performance tuning. These would be options
for users who know what they are doing, or they can be internal options if we
don't want to create more confusion.
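
One possible shape for such a switch, sketched in Python purely for
illustration: a set of flags that can be combined, so each index can be
turned on or off individually or together. The IndexType enum, the
parse_index_hint function, and the comma-separated hint syntax are all
hypothetical; nothing here reflects an existing CarbonData option.

```python
from enum import Flag


class IndexType(Flag):
    # Hypothetical flags for the three indexes named in the discussion.
    MDK = 1
    MINMAX = 2
    INVERTED = 4
    ALL = 7  # MDK | MINMAX | INVERTED


def parse_index_hint(hint: str) -> IndexType:
    """Parse a hypothetical hint such as "mdk,minmax" into a flag set.

    An empty hint enables no index, which corresponds to a full table scan.
    An unknown name raises KeyError.
    """
    enabled = IndexType(0)
    for name in hint.split(","):
        name = name.strip().upper()
        if name:
            enabled |= IndexType[name]
    return enabled


# Example: enable MDK and MinMax but leave the inverted index off.
enabled = parse_index_hint("mdk,minmax")
use_inverted = IndexType.INVERTED in enabled
```

Whether such a hint is exposed to users or kept as an internal testing
option is the open question in this thread; the sketch takes no position
on that.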

Best Regards,
Aniket

On Tue, Sep 27, 2016 at 11:58 PM, Raghunandan S <
carbondatacontributions@gmail.com> wrote:

> I agree with Jihong. Carbon needs to have smart logic to decide.

Re: [Discussion]: option to disable multi-layered index scan and use full table scan

Posted by Raghunandan S <ca...@gmail.com>.

I agree with Jihong. Carbon needs to have smart logic to decide.
On Wed, 28 Sep 2016 at 6:12 AM, Jihong Ma <Ji...@huawei.com> wrote:

> Ideally this should be an internal improvement, not necessarily exposed
> as a config option. Carbon should be able to smartly figure out whether
> leveraging the index is beneficial or whether to go straight to a file scan
> (just as Parquet does).
>
> Regards.
>
> Jihong

RE: [Discussion]: option to disable multi-layered index scan and use full table scan

Posted by Jihong Ma <Ji...@huawei.com>.

Ideally this should be an internal improvement, not necessarily exposed as a config option. Carbon should be able to smartly figure out whether leveraging the index is beneficial or whether to go straight to a file scan (just as Parquet does).

Regards.

Jihong

-----Original Message-----
From: Liang Big data [mailto:chenliang6136@gmail.com] 
Sent: Tuesday, September 27, 2016 4:43 PM
To: dev@carbondata.incubator.apache.org
Subject: Re: [Discussion]: option to disable multi-layered index scan and use full table scan

Hi

Good suggestion!
Add a configurable option to disable the index for no-filter and small-table
scenarios.
One comment: you are suggesting disabling only MDK; how about the other
indexes (the inverted index and the MINMAX index)?

+1 for this feature

Regards
Liang


Re: [Discussion]: option to disable multi-layered index scan and use full table scan

Posted by Liang Big data <ch...@gmail.com>.

Hi

Good suggestion!
Add a configurable option to disable the index for no-filter and small-table
scenarios.
One comment: you are suggesting disabling only MDK; how about the other
indexes (the inverted index and the MINMAX index)?

+1 for this feature

Regards
Liang

2016-09-28 6:49 GMT+08:00 Aniket Adnaik <an...@gmail.com>:

> CarbonData could provide a way to disable the multi-layered index and
> use a full table scan instead.
> This may help in the following cases:
> 1. Small tables occupying only a few blocks are probably better off
> using a full table scan.
> 2. Queries with a large number of projections and no filter may benefit from
> a full table scan.
> 3. Testing different scenarios and comparing against HDFS file
> formats that do not provide a multi-layered index would be easier.
>
> Also, the CarbonData scan should internally be smart enough to detect
> these cases based on the query, data size, etc.
>
> Any comments?