Posted to dev@trafodion.apache.org by Eric Owhadi <er...@esgyn.com> on 2015/08/11 23:24:30 UTC

question on regression test development with update statistics

I am developing a regression test for the small scanner feature.

In order to test the logic, I created two tables with roughly 4000-byte rows;
one uses the default 64K HBase block size, the other 128K.



By doing an explain on select * from t1311k where uniq > 2 and uniq < 30, I am
expecting that the 29 rows scanned will fall either below or above the HBase
block size, depending on whether the 64K or the 128K block table is hit.



However, in order for the compiler/optimizer to accurately estimate the rows
accessed, I run UPDATE STATISTICS on both tables.
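
Roughly, the scripted setup looks like the sketch below. This is only a sketch
under assumptions: the HBASE_OPTIONS (BLOCKSIZE = '...') clause as the way to
set the block size, and the table/column names t64k, t128k, uniq and filler,
are illustrative and not taken from the actual test.

#!/bin/sh
# Sketch of the setup (assumes sqlci is on the PATH and that HBASE_OPTIONS
# accepts a BLOCKSIZE option; both are assumptions, not verified here).
sqlci <<'EOF'
create table t64k  (uniq int not null primary key, filler char(3996))
  hbase_options (blocksize = '65536');
create table t128k (uniq int not null primary key, filler char(3996))
  hbase_options (blocksize = '131072');

-- ... load enough rows into both tables here ...

-- refresh the histograms so the optimizer sees real row counts
update statistics for table t64k  on every column;
update statistics for table t128k on every column;

-- compare the scan chosen for each block size
explain options 'f' select * from t64k  where uniq > 2 and uniq < 30;
explain options 'f' select * from t128k where uniq > 2 and uniq < 30;
EOF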



Scripting all this as a test, however, does not work: the UPDATE STATISTICS
appears to have no effect.

If I close sqlci and restart it, then the correct behavior happens.



So my intuition is that the NATable objects associated with my tables are
cached per session, and that UPDATE STATISTICS is not invalidating that cache.
Am I right?

If so, is there a programmatic way to force the NATable cache to refresh?



Eric

RE: question on regression test development with update statistics

Posted by Roberta Marton <ro...@esgyn.com>.
You can turn off query, metadata, and histogram caching:

cqd QUERY_CACHE '0';         -- query cache
cqd METADATA_CACHE_SIZE '0'; -- metadata cache
cqd CACHE_HISTOGRAMS 'OFF';  -- histogram cache
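
One way these might be used in the scripted test (a sketch only, not verified;
the table name t64k is illustrative) is to issue the CQDs in the same sqlci
session before the EXPLAIN, so the compiler bypasses its caches instead of
needing a fresh session:

sqlci <<'EOF'
cqd QUERY_CACHE '0';
cqd METADATA_CACHE_SIZE '0';
cqd CACHE_HISTOGRAMS 'OFF';

update statistics for table t64k on every column;
explain options 'f' select * from t64k where uniq > 2 and uniq < 30;
EOF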

   Roberta


RE: question on regression test development with update statistics

Posted by Eric Owhadi <er...@esgyn.com>.
Hi Selva,
Thanks for the help, but I came up with a solution: run the DML steps of the
test script from a new shell so that a new session is forced. The cached
histograms are therefore not used, and the result of UPDATE STATISTICS is
taken into account.
This is a nice trick to know for Level 1 test development.
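
Concretely, the split might look like the following sketch (table names and
statements are illustrative, carried over from the earlier examples):

#!/bin/sh
# Session 1: DDL, load, and UPDATE STATISTICS.
sqlci <<'EOF'
-- ... create and load the tables here ...
update statistics for table t64k  on every column;
update statistics for table t128k on every column;
EOF

# Session 2: a brand-new sqlci process, so its compiler starts with empty
# caches and reads the freshly updated histograms.
sqlci <<'EOF'
explain options 'f' select * from t64k  where uniq > 2 and uniq < 30;
explain options 'f' select * from t128k where uniq > 2 and uniq < 30;
EOF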
Eric


RE: question on regression test development with update statistics

Posted by Selva Govindarajan <se...@esgyn.com>.
If the table names are different, then there should be a separate entry in the
NATable cache for each of them. If these attributes are part of NATable, then
there is no need to invalidate the cache entry. If the table names are the same
(you dropped the table and re-created it with the same name but a different
block size), the Trafodion engine automatically removes the NATable entry
corresponding to that table from the cache when the table is dropped.
Additionally, it uses the RMS infrastructure, and hence the Trafodion engine
removes the cache entry from all processes if it exists. So I would think it
may not be due to the NATable cache, but you may need to confirm it by
debugging.

Selva
