You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafodion.apache.org by Qifan Chen <qi...@esgyn.com> on 2015/07/28 15:04:10 UTC

Re: Index usage

Hi Layla,

Since all non-key columns are filled with values from 0 to 99 and there are
10,000 rows in total, on average, there are 100 occurences of a distinct
value per column. That will be the extra cost to select the index plan (100
random accesses in the worse non-overlapping case), which could be higher
than a full scan (seen in the plan chosen).

To really see the benefit of an index plan, you could either increase the #
of rows in the table, or reduce the number of occurrences of certain values
to be index searched.


You can do the following to verify that the index exists and is populated
correctly (from sqlci).

set parserflags 1;
select count(*) from table(index_table <index_name>);


To force an index plan, do the following (from sqlci). The nested join
operator bridges the index scan operator and the table scan operator.

control query shape nested_join(cut, cut);
explain options 'f' select * from testdata3 where s_key = 5;


You should see trafodion_index_scan as operator in such a plan as shown
below. Note that most of the time, the compiler will pick the right plan
for you.

>>create table tao1(a int, b int);


--- SQL operation complete.

>>create index itao1 on tao1(b);


--- SQL operation complete.

>>control query shape nested_join(cut,cut);


--- SQL operation complete.

>>prepare xx from select * from tao1 where b =1;


--- SQL command prepared.

>>explain options 'f' xx;


LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

---- ---- ---- --------------------  --------  --------------------
---------


4    .    5    root
1.00E+001

1    3    4    nested_join
1.00E+001

2    .    3    probe_cache
4.00E-001

.    .    2    trafodion_vsbb_scan             TAO1
4.00E-001

.    .    1    trafodion_index_scan            ITAO1
1.00E+001


--- SQL operation complete.

Regards, --Qifan

On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <
layla.martin@hp.com> wrote:

> I got one thing wrong: In the best comparison of a single test, the
> performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing)
> (I've tested with 300 scans)
>
> Layla Martin
> HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<mailto:
> layla.martin@hp.com>
>
> From: Martin, Layla (HP DualStudy)
> Sent: Dienstag, 28. Juli 2015 08:00
> To: dev@trafodion.incubator.apache.org
> Subject: Index usage
>
> Hi!
>
> I have a similar problem to the one Anoop described in February.
> https://bugs.launchpad.net/trafodion/+bug/1407807
> It seems, as if the index is not being used for a select statement.
>
> I wanted to test how fast a very simple select statement (SELECT * FROM
> tablename WHERE somecolumn = 5) is executed.
> Somecolumn is either primary key, an indexed column or some arbitrary
> column.
>
> My results have shown max. 10% performance increase when using an index,
> but 10x the speed when using the primary key, no matter how the other
> columns are filled (primary key is of course autoincremented, the other
> columns are either filled with random numbers (10,000 rows with integers
> between 0 and 99) or incremented in reverse order).
>
> It doesn't seem as if the index was used at all!
>
> As I wasn't sure if the index was correctly built, I moved the CREATE
> INDEX statement from after test data creation to before it, but I didn't
> see any difference.
>
> Is there a possibility to "switch on" index usage?
>
> I used the explain options 'f' statement, but it doesn't seem to use the
> index at all. (When trying the same for primary key, Trafodion at least
> uses "OPT o" ... what exactly do these codes mean? How would a index scan
> exactly look like?)
>
> Please find the results of the statements below (S_key is the indexed
> column, id is primary key).
>
> >>explain options 'f' select * from testdata3 where s_key = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root
> 1.64E+002
> .    .    1    trafodion_scan                  TESTDATA3
>  1.64E+002
>
> --- SQL operation complete.
> >>explain options 'f' select * from testdata where id = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root                  o
>  1.00E+000
> .    .    1    trafodion_scan                  TESTDATA
> 1.00E+000
>
> By, the way, I'm using Trafodion in a single node installation of Sandbox
> v1.1.
>
> Best regards
> Layla
>
> Layla Martin
> HP DualStudy
> Hewlett-Packard GmbH
> Telefon +49 7031 4504682
> layla.martin@hp.com<ma...@hp.com>
> Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<
> http://www.hp.com/de/dualstudy>
> [Description: Description:
> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png
> ]
> Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>
> [twitter] <http://twitter.com/hpdualstudy>  [youtube] <
> http://www.youtube.com/user/hpDualStudy>
> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard
> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
> Vorsitzender des Aufsichtsrats: Jörg Menno Harms
> Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
> WEEE-Reg.-Nr. DE 30409072
>
>
>


-- 
Regards, --Qifan

Re: Index usage

Posted by Qifan Chen <qi...@esgyn.com>.
Yes, that will be a very good exercise.

did you quit the sqlci session?  The shape is only useful within a session,
or when it is being turned off.

You also need to update stats after each iteration of table load, for the
column used in the predicate.

On Wed, Jul 29, 2015 at 8:50 AM, Martin, Layla (HP DualStudy) <
layla.martin@hp.com> wrote:

>  You meen Frequency in terms of how often a value occurs? That’s exactly
> what I want to do. I have a function which generates random numbers (in a
> specific range). But even with random numbers between 0 and 49,999 and a
> total of 50,000 rows, the index is used.
>
>
>
> I now assume that something is set to force index usage, even though it
> doesn’t make sense.
>
>
>
> *Layla Martin*
> HP DualStudy - Telefon +49 7031 4504682 - *layla.martin@hp.com*
> <la...@hp.com>
>
>
>
> *From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
> *Sent:* Mittwoch, 29. Juli 2015 15:46
> *To:* Martin, Layla (HP DualStudy)
> *Cc:* dev@trafodion.incubator.apache.org
> *Subject:* Re: Index usage
>
>
>
> You probably can change the frequency of a value in the column
> (f=1,2,5,10, ...) to find out the break-even point.
>
>
>
> The CQD provided will force a full scan (without the shape).
>
>
>
> On Wed, Jul 29, 2015 at 8:32 AM, Martin, Layla (HP DualStudy) <
> layla.martin@hp.com> wrote:
>
>  Hi Qifan,
>
>
>
> But this doesn’t just undo the control query shape from before, does it? I
> would like to see at which point the optimizer decides to use indexes
> rather than a full table scan.
>
>
>
> Best regards
>
> Layla
>
>
>
> *Layla Martin*
> HP DualStudy - Telefon +49 7031 4504682 - *layla.martin@hp.com*
> <la...@hp.com>
>
>
>
> *From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
> *Sent:* Mittwoch, 29. Juli 2015 14:58
>
>
> *To:* dev; Martin, Layla (HP DualStudy)
> *Subject:* Re: Index usage
>
>
>
> Yes, apply the following CQD to completely turn off index.
>
>
>
> CQD HIDE_INDEXES 'ALL';
>
>
>
>
>
> On Wed, Jul 29, 2015 at 6:05 AM, Martin, Layla (HP DualStudy) <
> layla.martin@hp.com> wrote:
>
>  Hi,
>
> Thanks for your help.
> It now uses the index! Can I switch the index usage of again? I think it
> still uses the index, even if it doesn’t make sense … (if I’m using 50,000
> different values for 50,000 rows …).
>
> I’ve tried “CONTROL QUERY SHARE OFF, but it didn’t do the thing … It still
> uses the index regardless of its usefulness.
> I don’t think that the reason for this behavior is execution plan caching,
> as I dropped and recreated the table in the meantime.
>
> Best regards and thanks for your help
> Layla
>
> Layla Martin
> HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<mailto:
> layla.martin@hp.com>
>
> From: Qifan Chen [mailto:qifan.chen@esgyn.com]
> Sent: Dienstag, 28. Juli 2015 15:04
> To: dev; Martin, Layla (HP DualStudy)
> Subject: Re: Index usage
>
>
> Hi Layla,
>
> Since all non-key columns are filled with values from 0 to 99 and there
> are 10,000 rows in total, on average, there are 100 occurences of a
> distinct value per column. That will be the extra cost to select the index
> plan (100 random accesses in the worse non-overlapping case), which could
> be higher than a full scan (seen in the plan chosen).
>
> To really see the benefit of an index plan, you could either increase the
> # of rows in the table, or reduce the number of occurrences of certain
> values to be index searched.
>
>
> You can do the following to verify that the index exists and is populated
> correctly (from sqlci).
>
> set parserflags 1;
> select count(*) from table(index_table <index_name>);
>
>
> To force an index plan, do the following (from sqlci). The nested join
> operator bridges the index scan operator and the table scan operator.
>
> control query shape nested_join(cut, cut);
> explain options 'f' select * from testdata3 where s_key = 5;
>
>
> You should see trafodion_index_scan as operator in such a plan as shown
> below. Note that most of the time, the compiler will pick the right plan
> for you.
>
>
> >>create table tao1(a int, b int);
>
>
>
> --- SQL operation complete.
>
> >>create index itao1 on tao1(b);
>
>
>
> --- SQL operation complete.
>
> >>control query shape nested_join(cut,cut);
>
>
>
> --- SQL operation complete.
>
> >>prepare xx from select * from tao1 where b =1;
>
>
>
> --- SQL command prepared.
>
> >>explain options 'f' xx;
>
>
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
>
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
>
>
> 4    .    5    root
> 1.00E+001
>
> 1    3    4    nested_join
>  1.00E+001
>
> 2    .    3    probe_cache
>  4.00E-001
>
> .    .    2    trafodion_vsbb_scan             TAO1
> 4.00E-001
>
> .    .    1    trafodion_index_scan            ITAO1
>  1.00E+001
>
>
>
> --- SQL operation complete.
>
> Regards, --Qifan
>
> On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <
> layla.martin@hp.com<ma...@hp.com>> wrote:
> I got one thing wrong: In the best comparison of a single test, the
> performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing)
> (I've tested with 300 scans)
>
> Layla Martin
>
> HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682
> <%2B49%207031%204504682>> - layla.martin@hp.com<mailto:layla.martin@hp.com
> ><ma...@hp.com>>
>
>
>
> From: Martin, Layla (HP DualStudy)
> Sent: Dienstag, 28. Juli 2015 08:00
> To: dev@trafodion.incubator.apache.org<mailto:
> dev@trafodion.incubator.apache.org>
> Subject: Index usage
>
> Hi!
>
> I have a similar problem to the one Anoop described in February.
> https://bugs.launchpad.net/trafodion/+bug/1407807
>
> It seems, as if the index is not being used for a select statement.
>
> I wanted to test how fast a very simple select statement (SELECT * FROM
> tablename WHERE somecolumn = 5) is executed.
> Somecolumn is either primary key, an indexed column or some arbitrary
> column.
>
> My results have shown max. 10% performance increase when using an index,
> but 10x the speed when using the primary key, no matter how the other
> columns are filled (primary key is of course autoincremented, the other
> columns are either filled with random numbers (10,000 rows with integers
> between 0 and 99) or incremented in reverse order).
>
> It doesn't seem as if the index was used at all!
>
> As I wasn't sure if the index was correctly built, I moved the CREATE
> INDEX statement from after test data creation to before it, but I didn't
> see any difference.
>
> Is there a possibility to "switch on" index usage?
>
> I used the explain options 'f' statement, but it doesn't seem to use the
> index at all. (When trying the same for primary key, Trafodion at least
> uses "OPT o" ... what exactly do these codes mean? How would a index scan
> exactly look like?)
>
> Please find the results of the statements below (S_key is the indexed
> column, id is primary key).
>
> >>explain options 'f' select * from testdata3 where s_key = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root
> 1.64E+002
> .    .    1    trafodion_scan                  TESTDATA3
>  1.64E+002
>
> --- SQL operation complete.
> >>explain options 'f' select * from testdata where id = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root                  o
>  1.00E+000
> .    .    1    trafodion_scan                  TESTDATA
> 1.00E+000
>
> By, the way, I'm using Trafodion in a single node installation of Sandbox
> v1.1.
>
> Best regards
> Layla
>
> Layla Martin
> HP DualStudy
> Hewlett-Packard GmbH
>
> Telefon +49 7031 4504682<tel:%2B49%207031%204504682
> <%2B49%207031%204504682>>
> layla.martin@hp.com<ma...@hp.com><mailto:layla.martin@hp.com
> <ma...@hp.com>>
> Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<
> http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy>
> [Description: Description:
> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png
> ]
> Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>
> [twitter] <http://twitter.com/hpdualstudy>  [youtube] <
> http://www.youtube.com/user/hpDualStudy>
>
> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard
> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
> Vorsitzender des Aufsichtsrats: Jörg Menno Harms
> Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
> WEEE-Reg.-Nr. DE 30409072
>
>
>
>
> --
> Regards, --Qifan
>
>
>
>
>
> --
>
> Regards, --Qifan
>
>
>
>
>
>
>
> --
>
> Regards, --Qifan
>
>
>



-- 
Regards, --Qifan

RE: Index usage

Posted by "Martin, Layla (HP DualStudy)" <la...@hp.com>.
You meen Frequency in terms of how often a value occurs? That’s exactly what I want to do. I have a function which generates random numbers (in a specific range). But even with random numbers between 0 and 49,999 and a total of 50,000 rows, the index is used.

I now assume that something is set to force index usage, even though it doesn’t make sense.

Layla Martin
HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<ma...@hp.com>

From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Mittwoch, 29. Juli 2015 15:46
To: Martin, Layla (HP DualStudy)
Cc: dev@trafodion.incubator.apache.org
Subject: Re: Index usage

You probably can change the frequency of a value in the column (f=1,2,5,10, ...) to find out the break-even point.

The CQD provided will force a full scan (without the shape).

On Wed, Jul 29, 2015 at 8:32 AM, Martin, Layla (HP DualStudy) <la...@hp.com>> wrote:
Hi Qifan,

But this doesn’t just undo the control query shape from before, does it? I would like to see at which point the optimizer decides to use indexes rather than a full table scan.

Best regards
Layla

Layla Martin
HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682> - layla.martin@hp.com<ma...@hp.com>

From: Qifan Chen [mailto:qifan.chen@esgyn.com<ma...@esgyn.com>]
Sent: Mittwoch, 29. Juli 2015 14:58

To: dev; Martin, Layla (HP DualStudy)
Subject: Re: Index usage

Yes, apply the following CQD to completely turn off index.


CQD HIDE_INDEXES 'ALL';



On Wed, Jul 29, 2015 at 6:05 AM, Martin, Layla (HP DualStudy) <la...@hp.com>> wrote:
Hi,

Thanks for your help.
It now uses the index! Can I switch the index usage of again? I think it still uses the index, even if it doesn’t make sense … (if I’m using 50,000 different values for 50,000 rows …).

I’ve tried “CONTROL QUERY SHARE OFF, but it didn’t do the thing … It still uses the index regardless of its usefulness.
I don’t think that the reason for this behavior is execution plan caching, as I dropped and recreated the table in the meantime.

Best regards and thanks for your help
Layla

Layla Martin
HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682> - layla.martin@hp.com<ma...@hp.com>>

From: Qifan Chen [mailto:qifan.chen@esgyn.com<ma...@esgyn.com>]
Sent: Dienstag, 28. Juli 2015 15:04
To: dev; Martin, Layla (HP DualStudy)
Subject: Re: Index usage

Hi Layla,

Since all non-key columns are filled with values from 0 to 99 and there are 10,000 rows in total, on average, there are 100 occurences of a distinct value per column. That will be the extra cost to select the index plan (100 random accesses in the worse non-overlapping case), which could be higher than a full scan (seen in the plan chosen).

To really see the benefit of an index plan, you could either increase the # of rows in the table, or reduce the number of occurrences of certain values to be index searched.


You can do the following to verify that the index exists and is populated correctly (from sqlci).

set parserflags 1;
select count(*) from table(index_table <index_name>);


To force an index plan, do the following (from sqlci). The nested join operator bridges the index scan operator and the table scan operator.

control query shape nested_join(cut, cut);
explain options 'f' select * from testdata3 where s_key = 5;


You should see trafodion_index_scan as operator in such a plan as shown below. Note that most of the time, the compiler will pick the right plan for you.


>>create table tao1(a int, b int);



--- SQL operation complete.

>>create index itao1 on tao1(b);



--- SQL operation complete.

>>control query shape nested_join(cut,cut);



--- SQL operation complete.

>>prepare xx from select * from tao1 where b =1;



--- SQL command prepared.

>>explain options 'f' xx;



LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

---- ---- ---- --------------------  --------  --------------------  ---------



4    .    5    root                                                  1.00E+001

1    3    4    nested_join                                           1.00E+001

2    .    3    probe_cache                                           4.00E-001

.    .    2    trafodion_vsbb_scan             TAO1                  4.00E-001

.    .    1    trafodion_index_scan            ITAO1                 1.00E+001



--- SQL operation complete.

Regards, --Qifan
On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <la...@hp.com>>> wrote:
I got one thing wrong: In the best comparison of a single test, the performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing) (I've tested with 300 scans)

Layla Martin
HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682><tel:%2B49%207031%204504682> - layla.martin@hp.com<ma...@hp.com>>>


From: Martin, Layla (HP DualStudy)
Sent: Dienstag, 28. Juli 2015 08:00
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>>
Subject: Index usage

Hi!

I have a similar problem to the one Anoop described in February. https://bugs.launchpad.net/trafodion/+bug/1407807
It seems, as if the index is not being used for a select statement.

I wanted to test how fast a very simple select statement (SELECT * FROM tablename WHERE somecolumn = 5) is executed.
Somecolumn is either primary key, an indexed column or some arbitrary column.

My results have shown max. 10% performance increase when using an index, but 10x the speed when using the primary key, no matter how the other columns are filled (primary key is of course autoincremented, the other columns are either filled with random numbers (10,000 rows with integers between 0 and 99) or incremented in reverse order).

It doesn't seem as if the index was used at all!

As I wasn't sure if the index was correctly built, I moved the CREATE INDEX statement from after test data creation to before it, but I didn't see any difference.

Is there a possibility to "switch on" index usage?

I used the explain options 'f' statement, but it doesn't seem to use the index at all. (When trying the same for primary key, Trafodion at least uses "OPT o" ... what exactly do these codes mean? How would a index scan exactly look like?)

Please find the results of the statements below (S_key is the indexed column, id is primary key).

>>explain options 'f' select * from testdata3 where s_key = 5;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                                                  1.64E+002
.    .    1    trafodion_scan                  TESTDATA3             1.64E+002

--- SQL operation complete.
>>explain options 'f' select * from testdata where id = 5;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                  o                               1.00E+000
.    .    1    trafodion_scan                  TESTDATA              1.00E+000

By, the way, I'm using Trafodion in a single node installation of Sandbox v1.1.

Best regards
Layla

Layla Martin
HP DualStudy
Hewlett-Packard GmbH
Telefon +49 7031 4504682<tel:%2B49%207031%204504682><tel:%2B49%207031%204504682>
layla.martin@hp.com<ma...@hp.com>>>
Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy>
[Description: Description: http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png]
Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>  [twitter] <http://twitter.com/hpdualstudy>  [youtube] <http://www.youtube.com/user/hpDualStudy>
Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
Vorsitzender des Aufsichtsrats: Jörg Menno Harms
Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
WEEE-Reg.-Nr. DE 30409072




--
Regards, --Qifan



--
Regards, --Qifan




--
Regards, --Qifan


Re: Index usage

Posted by Qifan Chen <qi...@esgyn.com>.
You probably can change the frequency of a value in the column (f=1,2,5,10,
...) to find out the break-even point.

The CQD provided will force a full scan (without the shape).

On Wed, Jul 29, 2015 at 8:32 AM, Martin, Layla (HP DualStudy) <
layla.martin@hp.com> wrote:

>  Hi Qifan,
>
>
>
> But this doesn’t just undo the control query shape from before, does it? I
> would like to see at which point the optimizer decides to use indexes
> rather than a full table scan.
>
>
>
> Best regards
>
> Layla
>
>
>
> *Layla Martin*
> HP DualStudy - Telefon +49 7031 4504682 - *layla.martin@hp.com*
> <la...@hp.com>
>
>
>
> *From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
> *Sent:* Mittwoch, 29. Juli 2015 14:58
>
> *To:* dev; Martin, Layla (HP DualStudy)
> *Subject:* Re: Index usage
>
>
>
> Yes, apply the following CQD to completely turn off index.
>
>
>
> CQD HIDE_INDEXES 'ALL';
>
>
>
>
>
> On Wed, Jul 29, 2015 at 6:05 AM, Martin, Layla (HP DualStudy) <
> layla.martin@hp.com> wrote:
>
> Hi,
>
> Thanks for your help.
> It now uses the index! Can I switch the index usage of again? I think it
> still uses the index, even if it doesn’t make sense … (if I’m using 50,000
> different values for 50,000 rows …).
>
> I’ve tried “CONTROL QUERY SHARE OFF, but it didn’t do the thing … It still
> uses the index regardless of its usefulness.
> I don’t think that the reason for this behavior is execution plan caching,
> as I dropped and recreated the table in the meantime.
>
> Best regards and thanks for your help
> Layla
>
> Layla Martin
> HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<mailto:
> layla.martin@hp.com>
>
> From: Qifan Chen [mailto:qifan.chen@esgyn.com]
> Sent: Dienstag, 28. Juli 2015 15:04
> To: dev; Martin, Layla (HP DualStudy)
> Subject: Re: Index usage
>
>
> Hi Layla,
>
> Since all non-key columns are filled with values from 0 to 99 and there
> are 10,000 rows in total, on average, there are 100 occurences of a
> distinct value per column. That will be the extra cost to select the index
> plan (100 random accesses in the worse non-overlapping case), which could
> be higher than a full scan (seen in the plan chosen).
>
> To really see the benefit of an index plan, you could either increase the
> # of rows in the table, or reduce the number of occurrences of certain
> values to be index searched.
>
>
> You can do the following to verify that the index exists and is populated
> correctly (from sqlci).
>
> set parserflags 1;
> select count(*) from table(index_table <index_name>);
>
>
> To force an index plan, do the following (from sqlci). The nested join
> operator bridges the index scan operator and the table scan operator.
>
> control query shape nested_join(cut, cut);
> explain options 'f' select * from testdata3 where s_key = 5;
>
>
> You should see trafodion_index_scan as operator in such a plan as shown
> below. Note that most of the time, the compiler will pick the right plan
> for you.
>
>
> >>create table tao1(a int, b int);
>
>
>
> --- SQL operation complete.
>
> >>create index itao1 on tao1(b);
>
>
>
> --- SQL operation complete.
>
> >>control query shape nested_join(cut,cut);
>
>
>
> --- SQL operation complete.
>
> >>prepare xx from select * from tao1 where b =1;
>
>
>
> --- SQL command prepared.
>
> >>explain options 'f' xx;
>
>
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
>
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
>
>
> 4    .    5    root
> 1.00E+001
>
> 1    3    4    nested_join
>  1.00E+001
>
> 2    .    3    probe_cache
>  4.00E-001
>
> .    .    2    trafodion_vsbb_scan             TAO1
> 4.00E-001
>
> .    .    1    trafodion_index_scan            ITAO1
>  1.00E+001
>
>
>
> --- SQL operation complete.
>
> Regards, --Qifan
>
> On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <
> layla.martin@hp.com<ma...@hp.com>> wrote:
> I got one thing wrong: In the best comparison of a single test, the
> performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing)
> (I've tested with 300 scans)
>
> Layla Martin
> HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682
> <%2B49%207031%204504682>> - layla.martin@hp.com<mailto:layla.martin@hp.com
> ><ma...@hp.com>>
>
>
> From: Martin, Layla (HP DualStudy)
> Sent: Dienstag, 28. Juli 2015 08:00
> To: dev@trafodion.incubator.apache.org<mailto:
> dev@trafodion.incubator.apache.org>
> Subject: Index usage
>
> Hi!
>
> I have a similar problem to the one Anoop described in February.
> https://bugs.launchpad.net/trafodion/+bug/1407807
>
>  It seems, as if the index is not being used for a select statement.
>
> I wanted to test how fast a very simple select statement (SELECT * FROM
> tablename WHERE somecolumn = 5) is executed.
> Somecolumn is either primary key, an indexed column or some arbitrary
> column.
>
> My results have shown max. 10% performance increase when using an index,
> but 10x the speed when using the primary key, no matter how the other
> columns are filled (primary key is of course autoincremented, the other
> columns are either filled with random numbers (10,000 rows with integers
> between 0 and 99) or incremented in reverse order).
>
> It doesn't seem as if the index was used at all!
>
> As I wasn't sure if the index was correctly built, I moved the CREATE
> INDEX statement from after test data creation to before it, but I didn't
> see any difference.
>
> Is there a possibility to "switch on" index usage?
>
> I used the explain options 'f' statement, but it doesn't seem to use the
> index at all. (When trying the same for primary key, Trafodion at least
> uses "OPT o" ... what exactly do these codes mean? How would a index scan
> exactly look like?)
>
> Please find the results of the statements below (S_key is the indexed
> column, id is primary key).
>
> >>explain options 'f' select * from testdata3 where s_key = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root
> 1.64E+002
> .    .    1    trafodion_scan                  TESTDATA3
>  1.64E+002
>
> --- SQL operation complete.
> >>explain options 'f' select * from testdata where id = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root                  o
>  1.00E+000
> .    .    1    trafodion_scan                  TESTDATA
> 1.00E+000
>
> By, the way, I'm using Trafodion in a single node installation of Sandbox
> v1.1.
>
> Best regards
> Layla
>
> Layla Martin
> HP DualStudy
> Hewlett-Packard GmbH
>
> Telefon +49 7031 4504682<tel:%2B49%207031%204504682
> <%2B49%207031%204504682>>
> layla.martin@hp.com<ma...@hp.com><mailto:layla.martin@hp.com
> <ma...@hp.com>>
> Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<
> http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy>
> [Description: Description:
> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png
> ]
> Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>
> [twitter] <http://twitter.com/hpdualstudy>  [youtube] <
> http://www.youtube.com/user/hpDualStudy>
>
> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard
> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
> Vorsitzender des Aufsichtsrats: Jörg Menno Harms
> Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
> WEEE-Reg.-Nr. DE 30409072
>
>
>
>
> --
> Regards, --Qifan
>
>
>
>
>
> --
>
> Regards, --Qifan
>
>
>



-- 
Regards, --Qifan

RE: Index usage

Posted by "Martin, Layla (HP DualStudy)" <la...@hp.com>.
Hi Qifan,

But this doesn’t just undo the control query shape from before, does it? I would like to see at which point the optimizer decides to use indexes rather than a full table scan.

Best regards
Layla

Layla Martin
HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<ma...@hp.com>

From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Mittwoch, 29. Juli 2015 14:58
To: dev; Martin, Layla (HP DualStudy)
Subject: Re: Index usage

Yes, apply the following CQD to completely turn off index.


CQD HIDE_INDEXES 'ALL';



On Wed, Jul 29, 2015 at 6:05 AM, Martin, Layla (HP DualStudy) <la...@hp.com>> wrote:
Hi,

Thanks for your help.
It now uses the index! Can I switch the index usage of again? I think it still uses the index, even if it doesn’t make sense … (if I’m using 50,000 different values for 50,000 rows …).

I’ve tried “CONTROL QUERY SHARE OFF, but it didn’t do the thing … It still uses the index regardless of its usefulness.
I don’t think that the reason for this behavior is execution plan caching, as I dropped and recreated the table in the meantime.

Best regards and thanks for your help
Layla

Layla Martin
HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682> - layla.martin@hp.com<ma...@hp.com>>

From: Qifan Chen [mailto:qifan.chen@esgyn.com<ma...@esgyn.com>]
Sent: Dienstag, 28. Juli 2015 15:04
To: dev; Martin, Layla (HP DualStudy)
Subject: Re: Index usage

Hi Layla,

Since all non-key columns are filled with values from 0 to 99 and there are 10,000 rows in total, on average, there are 100 occurences of a distinct value per column. That will be the extra cost to select the index plan (100 random accesses in the worse non-overlapping case), which could be higher than a full scan (seen in the plan chosen).

To really see the benefit of an index plan, you could either increase the # of rows in the table, or reduce the number of occurrences of certain values to be index searched.


You can do the following to verify that the index exists and is populated correctly (from sqlci).

set parserflags 1;
select count(*) from table(index_table <index_name>);


To force an index plan, do the following (from sqlci). The nested join operator bridges the index scan operator and the table scan operator.

control query shape nested_join(cut, cut);
explain options 'f' select * from testdata3 where s_key = 5;


You should see trafodion_index_scan as operator in such a plan as shown below. Note that most of the time, the compiler will pick the right plan for you.


>>create table tao1(a int, b int);



--- SQL operation complete.

>>create index itao1 on tao1(b);



--- SQL operation complete.

>>control query shape nested_join(cut,cut);



--- SQL operation complete.

>>prepare xx from select * from tao1 where b =1;



--- SQL command prepared.

>>explain options 'f' xx;



LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

---- ---- ---- --------------------  --------  --------------------  ---------



4    .    5    root                                                  1.00E+001

1    3    4    nested_join                                           1.00E+001

2    .    3    probe_cache                                           4.00E-001

.    .    2    trafodion_vsbb_scan             TAO1                  4.00E-001

.    .    1    trafodion_index_scan            ITAO1                 1.00E+001



--- SQL operation complete.

Regards, --Qifan
On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <la...@hp.com>>> wrote:
I got one thing wrong: In the best comparison of a single test, the performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing) (I've tested with 300 scans)

Layla Martin
HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682><tel:%2B49%207031%204504682> - layla.martin@hp.com<ma...@hp.com>>>

From: Martin, Layla (HP DualStudy)
Sent: Dienstag, 28. Juli 2015 08:00
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>>
Subject: Index usage

Hi!

I have a similar problem to the one Anoop described in February. https://bugs.launchpad.net/trafodion/+bug/1407807
It seems, as if the index is not being used for a select statement.

I wanted to test how fast a very simple select statement (SELECT * FROM tablename WHERE somecolumn = 5) is executed.
Somecolumn is either primary key, an indexed column or some arbitrary column.

My results have shown max. 10% performance increase when using an index, but 10x the speed when using the primary key, no matter how the other columns are filled (primary key is of course autoincremented, the other columns are either filled with random numbers (10,000 rows with integers between 0 and 99) or incremented in reverse order).

It doesn't seem as if the index was used at all!

As I wasn't sure if the index was correctly built, I moved the CREATE INDEX statement from after test data creation to before it, but I didn't see any difference.

Is there a possibility to "switch on" index usage?

I used the explain options 'f' statement, but it doesn't seem to use the index at all. (When trying the same for primary key, Trafodion at least uses "OPT o" ... what exactly do these codes mean? How would a index scan exactly look like?)

Please find the results of the statements below (S_key is the indexed column, id is primary key).

>>explain options 'f' select * from testdata3 where s_key = 5;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                                                  1.64E+002
.    .    1    trafodion_scan                  TESTDATA3             1.64E+002

--- SQL operation complete.
>>explain options 'f' select * from testdata where id = 5;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                  o                               1.00E+000
.    .    1    trafodion_scan                  TESTDATA              1.00E+000

By, the way, I'm using Trafodion in a single node installation of Sandbox v1.1.

Best regards
Layla

Layla Martin
HP DualStudy
Hewlett-Packard GmbH
Telefon +49 7031 4504682<tel:%2B49%207031%204504682><tel:%2B49%207031%204504682>
layla.martin@hp.com<ma...@hp.com>>>
Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy>
[Description: Description: http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png]
Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>  [twitter] <http://twitter.com/hpdualstudy>  [youtube] <http://www.youtube.com/user/hpDualStudy>
Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
Vorsitzender des Aufsichtsrats: Jörg Menno Harms
Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
WEEE-Reg.-Nr. DE 30409072




--
Regards, --Qifan



--
Regards, --Qifan


Re: Index usage

Posted by Qifan Chen <qi...@esgyn.com>.
Yes, apply the following CQD to completely turn off index.

CQD HIDE_INDEXES 'ALL';



On Wed, Jul 29, 2015 at 6:05 AM, Martin, Layla (HP DualStudy) <
layla.martin@hp.com> wrote:

> Hi,
>
> Thanks for your help.
> It now uses the index! Can I switch the index usage of again? I think it
> still uses the index, even if it doesn’t make sense … (if I’m using 50,000
> different values for 50,000 rows …).
>
> I’ve tried “CONTROL QUERY SHARE OFF, but it didn’t do the thing … It still
> uses the index regardless of its usefulness.
> I don’t think that the reason for this behavior is execution plan caching,
> as I dropped and recreated the table in the meantime.
>
> Best regards and thanks for your help
> Layla
>
> Layla Martin
> HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<mailto:
> layla.martin@hp.com>
>
> From: Qifan Chen [mailto:qifan.chen@esgyn.com]
> Sent: Dienstag, 28. Juli 2015 15:04
> To: dev; Martin, Layla (HP DualStudy)
> Subject: Re: Index usage
>
> Hi Layla,
>
> Since all non-key columns are filled with values from 0 to 99 and there
> are 10,000 rows in total, on average, there are 100 occurences of a
> distinct value per column. That will be the extra cost to select the index
> plan (100 random accesses in the worse non-overlapping case), which could
> be higher than a full scan (seen in the plan chosen).
>
> To really see the benefit of an index plan, you could either increase the
> # of rows in the table, or reduce the number of occurrences of certain
> values to be index searched.
>
>
> You can do the following to verify that the index exists and is populated
> correctly (from sqlci).
>
> set parserflags 1;
> select count(*) from table(index_table <index_name>);
>
>
> To force an index plan, do the following (from sqlci). The nested join
> operator bridges the index scan operator and the table scan operator.
>
> control query shape nested_join(cut, cut);
> explain options 'f' select * from testdata3 where s_key = 5;
>
>
> You should see trafodion_index_scan as operator in such a plan as shown
> below. Note that most of the time, the compiler will pick the right plan
> for you.
>
>
> >>create table tao1(a int, b int);
>
>
>
> --- SQL operation complete.
>
> >>create index itao1 on tao1(b);
>
>
>
> --- SQL operation complete.
>
> >>control query shape nested_join(cut,cut);
>
>
>
> --- SQL operation complete.
>
> >>prepare xx from select * from tao1 where b =1;
>
>
>
> --- SQL command prepared.
>
> >>explain options 'f' xx;
>
>
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
>
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
>
>
> 4    .    5    root
> 1.00E+001
>
> 1    3    4    nested_join
>  1.00E+001
>
> 2    .    3    probe_cache
>  4.00E-001
>
> .    .    2    trafodion_vsbb_scan             TAO1
> 4.00E-001
>
> .    .    1    trafodion_index_scan            ITAO1
>  1.00E+001
>
>
>
> --- SQL operation complete.
>
> Regards, --Qifan
>
> On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <
> layla.martin@hp.com<ma...@hp.com>> wrote:
> I got one thing wrong: In the best comparison of a single test, the
> performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing)
> (I've tested with 300 scans)
>
> Layla Martin
> HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682> -
> layla.martin@hp.com<ma...@hp.com><mailto:layla.martin@hp.com
> <ma...@hp.com>>
>
> From: Martin, Layla (HP DualStudy)
> Sent: Dienstag, 28. Juli 2015 08:00
> To: dev@trafodion.incubator.apache.org<mailto:
> dev@trafodion.incubator.apache.org>
> Subject: Index usage
>
> Hi!
>
> I have a similar problem to the one Anoop described in February.
> https://bugs.launchpad.net/trafodion/+bug/1407807
> It seems, as if the index is not being used for a select statement.
>
> I wanted to test how fast a very simple select statement (SELECT * FROM
> tablename WHERE somecolumn = 5) is executed.
> Somecolumn is either primary key, an indexed column or some arbitrary
> column.
>
> My results have shown max. 10% performance increase when using an index,
> but 10x the speed when using the primary key, no matter how the other
> columns are filled (primary key is of course autoincremented, the other
> columns are either filled with random numbers (10,000 rows with integers
> between 0 and 99) or incremented in reverse order).
>
> It doesn't seem as if the index was used at all!
>
> As I wasn't sure if the index was correctly built, I moved the CREATE
> INDEX statement from after test data creation to before it, but I didn't
> see any difference.
>
> Is there a possibility to "switch on" index usage?
>
> I used the explain options 'f' statement, but it doesn't seem to use the
> index at all. (When trying the same for primary key, Trafodion at least
> uses "OPT o" ... what exactly do these codes mean? How would a index scan
> exactly look like?)
>
> Please find the results of the statements below (S_key is the indexed
> column, id is primary key).
>
> >>explain options 'f' select * from testdata3 where s_key = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root
> 1.64E+002
> .    .    1    trafodion_scan                  TESTDATA3
>  1.64E+002
>
> --- SQL operation complete.
> >>explain options 'f' select * from testdata where id = 5;
>
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------
> ---------
>
> 1    .    2    root                  o
>  1.00E+000
> .    .    1    trafodion_scan                  TESTDATA
> 1.00E+000
>
> By, the way, I'm using Trafodion in a single node installation of Sandbox
> v1.1.
>
> Best regards
> Layla
>
> Layla Martin
> HP DualStudy
> Hewlett-Packard GmbH
> Telefon +49 7031 4504682<tel:%2B49%207031%204504682>
> layla.martin@hp.com<ma...@hp.com><mailto:layla.martin@hp.com
> <ma...@hp.com>>
> Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<
> http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy>
> [Description: Description:
> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png
> ]
> Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>
> [twitter] <http://twitter.com/hpdualstudy>  [youtube] <
> http://www.youtube.com/user/hpDualStudy>
> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard
> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
> Vorsitzender des Aufsichtsrats: Jörg Menno Harms
> Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
> WEEE-Reg.-Nr. DE 30409072
>
>
>
>
> --
> Regards, --Qifan
>
>


-- 
Regards, --Qifan

RE: Index usage

Posted by "Martin, Layla (HP DualStudy)" <la...@hp.com>.
Hi,

Thanks for your help.
It now uses the index! Can I switch the index usage of again? I think it still uses the index, even if it doesn’t make sense … (if I’m using 50,000 different values for 50,000 rows …).

I’ve tried “CONTROL QUERY SHARE OFF, but it didn’t do the thing … It still uses the index regardless of its usefulness.
I don’t think that the reason for this behavior is execution plan caching, as I dropped and recreated the table in the meantime.

Best regards and thanks for your help
Layla

Layla Martin
HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<ma...@hp.com>

From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Dienstag, 28. Juli 2015 15:04
To: dev; Martin, Layla (HP DualStudy)
Subject: Re: Index usage

Hi Layla,

Since all non-key columns are filled with values from 0 to 99 and there are 10,000 rows in total, on average, there are 100 occurences of a distinct value per column. That will be the extra cost to select the index plan (100 random accesses in the worse non-overlapping case), which could be higher than a full scan (seen in the plan chosen).

To really see the benefit of an index plan, you could either increase the # of rows in the table, or reduce the number of occurrences of certain values to be index searched.


You can do the following to verify that the index exists and is populated correctly (from sqlci).

set parserflags 1;
select count(*) from table(index_table <index_name>);


To force an index plan, do the following (from sqlci). The nested join operator bridges the index scan operator and the table scan operator.

control query shape nested_join(cut, cut);
explain options 'f' select * from testdata3 where s_key = 5;


You should see trafodion_index_scan as operator in such a plan as shown below. Note that most of the time, the compiler will pick the right plan for you.


>>create table tao1(a int, b int);



--- SQL operation complete.

>>create index itao1 on tao1(b);



--- SQL operation complete.

>>control query shape nested_join(cut,cut);



--- SQL operation complete.

>>prepare xx from select * from tao1 where b =1;



--- SQL command prepared.

>>explain options 'f' xx;



LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

---- ---- ---- --------------------  --------  --------------------  ---------



4    .    5    root                                                  1.00E+001

1    3    4    nested_join                                           1.00E+001

2    .    3    probe_cache                                           4.00E-001

.    .    2    trafodion_vsbb_scan             TAO1                  4.00E-001

.    .    1    trafodion_index_scan            ITAO1                 1.00E+001



--- SQL operation complete.

Regards, --Qifan

On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <la...@hp.com>> wrote:
I got one thing wrong: In the best comparison of a single test, the performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing) (I've tested with 300 scans)

Layla Martin
HP DualStudy - Telefon +49 7031 4504682<tel:%2B49%207031%204504682> - layla.martin@hp.com<ma...@hp.com>>

From: Martin, Layla (HP DualStudy)
Sent: Dienstag, 28. Juli 2015 08:00
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: Index usage

Hi!

I have a similar problem to the one Anoop described in February. https://bugs.launchpad.net/trafodion/+bug/1407807
It seems, as if the index is not being used for a select statement.

I wanted to test how fast a very simple select statement (SELECT * FROM tablename WHERE somecolumn = 5) is executed.
Somecolumn is either primary key, an indexed column or some arbitrary column.

My results have shown max. 10% performance increase when using an index, but 10x the speed when using the primary key, no matter how the other columns are filled (primary key is of course autoincremented, the other columns are either filled with random numbers (10,000 rows with integers between 0 and 99) or incremented in reverse order).

It doesn't seem as if the index was used at all!

As I wasn't sure if the index was correctly built, I moved the CREATE INDEX statement from after test data creation to before it, but I didn't see any difference.

Is there a possibility to "switch on" index usage?

I used the explain options 'f' statement, but it doesn't seem to use the index at all. (When trying the same for primary key, Trafodion at least uses "OPT o" ... what exactly do these codes mean? How would a index scan exactly look like?)

Please find the results of the statements below (S_key is the indexed column, id is primary key).

>>explain options 'f' select * from testdata3 where s_key = 5;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                                                  1.64E+002
.    .    1    trafodion_scan                  TESTDATA3             1.64E+002

--- SQL operation complete.
>>explain options 'f' select * from testdata where id = 5;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                  o                               1.00E+000
.    .    1    trafodion_scan                  TESTDATA              1.00E+000

By, the way, I'm using Trafodion in a single node installation of Sandbox v1.1.

Best regards
Layla

Layla Martin
HP DualStudy
Hewlett-Packard GmbH
Telefon +49 7031 4504682<tel:%2B49%207031%204504682>
layla.martin@hp.com<ma...@hp.com>>
Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<http://www.hp.com/de/dualstudy><http://www.hp.com/de/dualstudy>
[Description: Description: http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png]
Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>  [twitter] <http://twitter.com/hpdualstudy>  [youtube] <http://www.youtube.com/user/hpDualStudy>
Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
Vorsitzender des Aufsichtsrats: Jörg Menno Harms
Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
WEEE-Reg.-Nr. DE 30409072




--
Regards, --Qifan


RE: Index usage

Posted by gselva <se...@esgyn.com>.
I came across the following situation while making changes in the Trafodion engine.  I am not sure if Trafodion engine should pick index for the second query. Is there a rationale for not choosing the index plan.

>>set schema selva ;

 

--- SQL operation complete.

>>explain options 'f' select c from t12 ;

 

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

---- ---- ---- --------------------  --------  --------------------  ---------

 

1    .    2    root                                                  1.00E+002

.    .    1    trafodion_index_scan            T12I2                 1.00E+002

 

--- SQL operation complete.

>>explain options 'f' select c from t12 where c = ? ;

 

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

---- ---- ---- --------------------  --------  --------------------  ---------

 

1    .    2    root                                                  5.00E+001

.    .    1    trafodion_scan                  T12                   5.00E+001

 

--- SQL operation complete.

>>showddl t12 ;

 

CREATE TABLE TRAFODION.SELVA.T12

  (

    A                                INT NO DEFAULT NOT NULL NOT DROPPABLE

  , B                                INT NO DEFAULT NOT NULL NOT DROPPABLE

  , C                                INT DEFAULT NULL

  , PRIMARY KEY (A ASC)

  )

  SALT USING 2 PARTITIONS

;

 

CREATE INDEX T12I1 ON TRAFODION.SELVA.T12

  (

    B ASC

  )

;

 

CREATE INDEX T12I2 ON TRAFODION.SELVA.T12

  (

    C ASC

  )

SALT LIKE TABLE

;

 

-- GRANT SELECT, INSERT, DELETE, UPDATE, REFERENCES ON TRAFODION.SELVA.T12 TO DB__ROOT WITH GRANT OPTION;

 

--- SQL operation complete.

>> 

 

I am seeing similar plan when the columns c and a are in the select list.

 

Selva

 

-----Original Message-----
From: Rohit Jain [mailto:rohit.jain@esgyn.com] 
Sent: Tuesday, July 28, 2015 6:25 AM
To: dev@trafodion.incubator.apache.org; layla.martin@hp.com
Subject: Re: Index usage

 

If you create indexes with the intention of getting index only access for your queries — i.e. Include all columns in the index that would be relevant for the query, then you would certainly see benefit from a query that leverages an index column not otherwise available in the primary key set of columns, or not leverageable by MDAM.  But if you have index access that then has to get relevant column information from the base table that you created the index on, then you essentially now have random scattered access on your base table after retrieving the index rows.  The breakeven of a fast full table scan with large I/O and pre-fetch, will very quickly yield the index not very useful.  That is, the breakeven point is surprisingly very very low.  

 

We should probably calibrate that for Trafodion, but of course, it varies with the primary key size (as you might know Trafodion stores this as part of the secondary index in order to reference the base table row that is indexed), and the number and size of columns that are part of the secondary index.  

 

That is why in Trafodion we try to advice customers to leverage the Primary Key and MDAM as much as possible before considering the creation of an index, since the usefulness of an index is relatively limited AND then there is a rather large maintenance / latency on update, and space cost that you are incurring for the index.  Of course, there are situations where an index is unavoidable and it is too expensive to make it an index only scan.  So thanks for testing the breakeven point for such situations :-)

 

 

 

 

On 7/28/15, 9:04 PM, "Qifan Chen" < <ma...@esgyn.com> qifan.chen@esgyn.com> wrote:

 

>Hi Layla,

> 

>Since all non-key columns are filled with values from 0 to 99 and there 

>are

>10,000 rows in total, on average, there are 100 occurences of a 

>distinct value per column. That will be the extra cost to select the 

>index plan (100 random accesses in the worse non-overlapping case), 

>which could be higher than a full scan (seen in the plan chosen).

> 

>To really see the benefit of an index plan, you could either increase 

>the # of rows in the table, or reduce the number of occurrences of 

>certain values to be index searched.

> 

> 

>You can do the following to verify that the index exists and is 

>populated correctly (from sqlci).

> 

>set parserflags 1;

>select count(*) from table(index_table <index_name>);

> 

> 

>To force an index plan, do the following (from sqlci). The nested join 

>operator bridges the index scan operator and the table scan operator.

> 

>control query shape nested_join(cut, cut); explain options 'f' select * 

>from testdata3 where s_key = 5;

> 

> 

>You should see trafodion_index_scan as operator in such a plan as shown 

>below. Note that most of the time, the compiler will pick the right 

>plan for you.

> 

>>>create table tao1(a int, b int);

> 

> 

>--- SQL operation complete.

> 

>>>create index itao1 on tao1(b);

> 

> 

>--- SQL operation complete.

> 

>>>control query shape nested_join(cut,cut);

> 

> 

>--- SQL operation complete.

> 

>>>prepare xx from select * from tao1 where b =1;

> 

> 

>--- SQL command prepared.

> 

>>>explain options 'f' xx;

> 

> 

>LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

> 

>---- ---- ---- --------------------  --------  --------------------

>---------

> 

> 

>4    .    5    root

>1.00E+001

> 

>1    3    4    nested_join

>1.00E+001

> 

>2    .    3    probe_cache

>4.00E-001

> 

>.    .    2    trafodion_vsbb_scan             TAO1

>4.00E-001

> 

>.    .    1    trafodion_index_scan            ITAO1

>1.00E+001

> 

> 

>--- SQL operation complete.

> 

>Regards, --Qifan

> 

>On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) < 

> <ma...@hp.com> layla.martin@hp.com> wrote:

> 

>> I got one thing wrong: In the best comparison of a single test, the 

>> performance benefit is ~10%. On average, it is ~0.2% (so ,actually 

>> nothing) (I've tested with 300 scans)

>> 

>> Layla Martin

>> HP DualStudy - Telefon +49 7031 4504682 -  <mailto:layla.martin@hp.com%3cmailto> layla.martin@hp.com<mailto:

>>  <ma...@hp.com> layla.martin@hp.com>

>> 

>> From: Martin, Layla (HP DualStudy)

>> Sent: Dienstag, 28. Juli 2015 08:00

>> To:  <ma...@trafodion.incubator.apache.org> dev@trafodion.incubator.apache.org

>> Subject: Index usage

>> 

>> Hi!

>> 

>> I have a similar problem to the one Anoop described in February.

>>  <https://bugs.launchpad.net/trafodion/+bug/1407807> https://bugs.launchpad.net/trafodion/+bug/1407807

>> It seems, as if the index is not being used for a select statement.

>> 

>> I wanted to test how fast a very simple select statement (SELECT * 

>> FROM tablename WHERE somecolumn = 5) is executed.

>> Somecolumn is either primary key, an indexed column or some arbitrary 

>> column.

>> 

>> My results have shown max. 10% performance increase when using an 

>> index, but 10x the speed when using the primary key, no matter how 

>> the other columns are filled (primary key is of course 

>> autoincremented, the other columns are either filled with random 

>> numbers (10,000 rows with integers between 0 and 99) or incremented in reverse order).

>> 

>> It doesn't seem as if the index was used at all!

>> 

>> As I wasn't sure if the index was correctly built, I moved the CREATE 

>> INDEX statement from after test data creation to before it, but I 

>> didn't see any difference.

>> 

>> Is there a possibility to "switch on" index usage?

>> 

>> I used the explain options 'f' statement, but it doesn't seem to use 

>> the index at all. (When trying the same for primary key, Trafodion at 

>> least uses "OPT o" ... what exactly do these codes mean? How would a 

>> index scan exactly look like?)

>> 

>> Please find the results of the statements below (S_key is the indexed 

>> column, id is primary key).

>> 

>> >>explain options 'f' select * from testdata3 where s_key = 5;

>> 

>> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

>> ---- ---- ---- --------------------  --------  --------------------

>> ---------

>> 

>> 1    .    2    root

>> 1.64E+002

>> .    .    1    trafodion_scan                  TESTDATA3

>>  1.64E+002

>> 

>> --- SQL operation complete.

>> >>explain options 'f' select * from testdata where id = 5;

>> 

>> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD

>> ---- ---- ---- --------------------  --------  --------------------

>> ---------

>> 

>> 1    .    2    root                  o

>>  1.00E+000

>> .    .    1    trafodion_scan                  TESTDATA

>> 1.00E+000

>> 

>> By, the way, I'm using Trafodion in a single node installation of 

>> Sandbox v1.1.

>> 

>> Best regards

>> Layla

>> 

>> Layla Martin

>> HP DualStudy

>> Hewlett-Packard GmbH

>> Telefon +49 7031 4504682

>>  <ma...@hp.com> layla.martin@hp.com<ma...@hp.com>

>> Herrenberger Strasse 140| 71034 Böblingen|  <http://www.hp.com/de/dualstudy%3c> www.hp.com/de/dualstudy< 

>>  <http://www.hp.com/de/dualstudy> http://www.hp.com/de/dualstudy>

>> [Description: Description:

>>  <http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLog> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLog

>> oSmall.png

>> ]

>> Follow us on: [facebook] < <https://www.facebook.com/DualStudy.hp> https://www.facebook.com/DualStudy.hp>

>> [twitter] < <http://twitter.com/hpdualstudy> http://twitter.com/hpdualstudy>  [youtube] < 

>>  <http://www.youtube.com/user/hpDualStudy> http://www.youtube.com/user/hpDualStudy>

>> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard 

>> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst 

>> Reichart Vorsitzender des Aufsichtsrats: Jörg Menno Harms Sitz der 

>> Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081 

>> WEEE-Reg.-Nr. DE 30409072

>> 

>> 

>> 

> 

> 

>--

>Regards, --Qifan

 


Re: Index usage

Posted by Rohit Jain <ro...@esgyn.com>.
If you create indexes with the intention of getting index only access for your queries — i.e. Include all columns in the index that would be relevant for the query, then you would certainly see benefit from a query that leverages an index column not otherwise available in the primary key set of columns, or not leverageable by MDAM.  But if you have index access that then has to get relevant column information from the base table that you created the index on, then you essentially now have random scattered access on your base table after retrieving the index rows.  The breakeven of a fast full table scan with large I/O and pre-fetch, will very quickly yield the index not very useful.  That is, the breakeven point is surprisingly very very low.  

We should probably calibrate that for Trafodion, but of course, it varies with the primary key size (as you might know Trafodion stores this as part of the secondary index in order to reference the base table row that is indexed), and the number and size of columns that are part of the secondary index.  

That is why in Trafodion we try to advice customers to leverage the Primary Key and MDAM as much as possible before considering the creation of an index, since the usefulness of an index is relatively limited AND then there is a rather large maintenance / latency on update, and space cost that you are incurring for the index.  Of course, there are situations where an index is unavoidable and it is too expensive to make it an index only scan.  So thanks for testing the breakeven point for such situations :-)




On 7/28/15, 9:04 PM, "Qifan Chen" <qi...@esgyn.com> wrote:

>Hi Layla,
>
>Since all non-key columns are filled with values from 0 to 99 and there are
>10,000 rows in total, on average, there are 100 occurences of a distinct
>value per column. That will be the extra cost to select the index plan (100
>random accesses in the worse non-overlapping case), which could be higher
>than a full scan (seen in the plan chosen).
>
>To really see the benefit of an index plan, you could either increase the #
>of rows in the table, or reduce the number of occurrences of certain values
>to be index searched.
>
>
>You can do the following to verify that the index exists and is populated
>correctly (from sqlci).
>
>set parserflags 1;
>select count(*) from table(index_table <index_name>);
>
>
>To force an index plan, do the following (from sqlci). The nested join
>operator bridges the index scan operator and the table scan operator.
>
>control query shape nested_join(cut, cut);
>explain options 'f' select * from testdata3 where s_key = 5;
>
>
>You should see trafodion_index_scan as operator in such a plan as shown
>below. Note that most of the time, the compiler will pick the right plan
>for you.
>
>>>create table tao1(a int, b int);
>
>
>--- SQL operation complete.
>
>>>create index itao1 on tao1(b);
>
>
>--- SQL operation complete.
>
>>>control query shape nested_join(cut,cut);
>
>
>--- SQL operation complete.
>
>>>prepare xx from select * from tao1 where b =1;
>
>
>--- SQL command prepared.
>
>>>explain options 'f' xx;
>
>
>LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
>
>---- ---- ---- --------------------  --------  --------------------
>---------
>
>
>4    .    5    root
>1.00E+001
>
>1    3    4    nested_join
>1.00E+001
>
>2    .    3    probe_cache
>4.00E-001
>
>.    .    2    trafodion_vsbb_scan             TAO1
>4.00E-001
>
>.    .    1    trafodion_index_scan            ITAO1
>1.00E+001
>
>
>--- SQL operation complete.
>
>Regards, --Qifan
>
>On Mon, Jul 27, 2015 at 11:27 PM, Martin, Layla (HP DualStudy) <
>layla.martin@hp.com> wrote:
>
>> I got one thing wrong: In the best comparison of a single test, the
>> performance benefit is ~10%. On average, it is ~0.2% (so ,actually nothing)
>> (I've tested with 300 scans)
>>
>> Layla Martin
>> HP DualStudy - Telefon +49 7031 4504682 - layla.martin@hp.com<mailto:
>> layla.martin@hp.com>
>>
>> From: Martin, Layla (HP DualStudy)
>> Sent: Dienstag, 28. Juli 2015 08:00
>> To: dev@trafodion.incubator.apache.org
>> Subject: Index usage
>>
>> Hi!
>>
>> I have a similar problem to the one Anoop described in February.
>> https://bugs.launchpad.net/trafodion/+bug/1407807
>> It seems, as if the index is not being used for a select statement.
>>
>> I wanted to test how fast a very simple select statement (SELECT * FROM
>> tablename WHERE somecolumn = 5) is executed.
>> Somecolumn is either primary key, an indexed column or some arbitrary
>> column.
>>
>> My results have shown max. 10% performance increase when using an index,
>> but 10x the speed when using the primary key, no matter how the other
>> columns are filled (primary key is of course autoincremented, the other
>> columns are either filled with random numbers (10,000 rows with integers
>> between 0 and 99) or incremented in reverse order).
>>
>> It doesn't seem as if the index was used at all!
>>
>> As I wasn't sure if the index was correctly built, I moved the CREATE
>> INDEX statement from after test data creation to before it, but I didn't
>> see any difference.
>>
>> Is there a possibility to "switch on" index usage?
>>
>> I used the explain options 'f' statement, but it doesn't seem to use the
>> index at all. (When trying the same for primary key, Trafodion at least
>> uses "OPT o" ... what exactly do these codes mean? How would a index scan
>> exactly look like?)
>>
>> Please find the results of the statements below (S_key is the indexed
>> column, id is primary key).
>>
>> >>explain options 'f' select * from testdata3 where s_key = 5;
>>
>> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
>> ---- ---- ---- --------------------  --------  --------------------
>> ---------
>>
>> 1    .    2    root
>> 1.64E+002
>> .    .    1    trafodion_scan                  TESTDATA3
>>  1.64E+002
>>
>> --- SQL operation complete.
>> >>explain options 'f' select * from testdata where id = 5;
>>
>> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
>> ---- ---- ---- --------------------  --------  --------------------
>> ---------
>>
>> 1    .    2    root                  o
>>  1.00E+000
>> .    .    1    trafodion_scan                  TESTDATA
>> 1.00E+000
>>
>> By, the way, I'm using Trafodion in a single node installation of Sandbox
>> v1.1.
>>
>> Best regards
>> Layla
>>
>> Layla Martin
>> HP DualStudy
>> Hewlett-Packard GmbH
>> Telefon +49 7031 4504682
>> layla.martin@hp.com<ma...@hp.com>
>> Herrenberger Strasse 140| 71034 Böblingen| www.hp.com/de/dualstudy<
>> http://www.hp.com/de/dualstudy>
>> [Description: Description:
>> http://intranet.hp.com/country/germany/live/PublishingImages/NeuesLogoSmall.png
>> ]
>> Follow us on: [facebook] <https://www.facebook.com/DualStudy.hp>
>> [twitter] <http://twitter.com/hpdualstudy>  [youtube] <
>> http://www.youtube.com/user/hpDualStudy>
>> Geschäftsführer: Heiko Meyer (Vorsitzender), Thomas Bässler, Volkhard
>> Bregulla, Michael Eberhardt, Jochen Erlach, Angelika Gifford, Ernst Reichart
>> Vorsitzender des Aufsichtsrats: Jörg Menno Harms
>> Sitz der Gesellschaft: Böblingen, Amtsgericht Stuttgart HRB 244081
>> WEEE-Reg.-Nr. DE 30409072
>>
>>
>>
>
>
>-- 
>Regards, --Qifan