You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:16:02 UTC
[jira] [Created] (TRAFODION-436) LP Bug: 1340432 - Oversampling has no effect

Alice Chen created TRAFODION-436:
------------------------------------

             Summary: LP Bug: 1340432 - Oversampling has no effect
                 Key: TRAFODION-436
                 URL: https://issues.apache.org/jira/browse/TRAFODION-436
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
            Reporter: Weishiun Tsai
            Assignee: Barry Fritchman
            Priority: Critical


The following statements specify ‘random 150 percent’ as the sampling option for the select count(*) statements.  The Trafodion SQL Reference Manual at http://docs.trafodion.org/Trafodion_SQL_Reference_Manual_0.8.0.pdf has the following description about oversampling:

“If you specify a sample size greater than 100 PERCENT, Trafodion SQL returns all the rows in the result table plus duplicate rows. The duplicate rows are picked from the result table according to the specified sampling method. This technique is called oversampling.”

The test passed in previously builds (v0629_0930 build or earlier).  It means that the row counts from the oversampling were more or less in line with the description.  But the result from the v0707_0230 build started to show that the sampling does not go beyond 100% of the table row count.  Oversampling does not seem to have any effect anymore.

This is seen on the v0707_0230 build installed on a workstation.

------------------------------------------------

Here is the entire script to reproduce this problem:

drop schema mytest1 cascade;
set schema mytest1;

create table samptb054 (
empid   numeric (4) unsigned not null not droppable,
dnum    numeric (4) unsigned not null not droppable,
salary numeric (8,2) unsigned,
age     integer,
sex     char (6),
primary key (empid) not droppable
);

insert into samptb054 values
(1234, 3333, 20000.00, 16, 'MALE'),
(5678, 3333, 50000.00, 35, 'FEMALE'),
(9019, 3333, 40000.00, 30, 'MALE'),
(3455, 4444, 45000.00, 31, 'FEMALE'),
(6789, 4444, 55000.00, 36, 'MALE'),
(0123, 3333, 25000.00, 20, 'MALE'),
(4567, 4444, 30000.00, 23, 'FEMALE'),
(8901, 5555, 40000.00, 56, 'MALE'),
(2345, 5555, 36000.00, 29, 'FEMALE'),
(6799, 5555, 60000.00, 60, 'MALE'),
(9123, 3333, 20050.00, 18, 'MALE'),
(4900, 3333, 45000.00, 34, 'FEMALE'),
(6234, 3333, 20000.00, 60, 'MALE'),
(6678, 3333, 50000.00, 59, 'FEMALE'),
(6012, 3333, 40000.00, 49, 'MALE'),
(6455, 4444, 45000.00, 39, 'FEMALE'),
(7789, 4444, 55000.00, 38, 'MALE'),
(6123, 3333, 25000.00, 37, 'MALE'),
(6567, 4444, 30000.00, 24, 'FEMALE'),
(6901, 5555, 40000.00, 22, 'MALE'),
(6345, 5555, 36000.00, 21, 'FEMALE'),
(6798, 5555, 60000.00, 19, 'MALE'),
(6133, 3333, 20050.00, 80, 'MALE'),
(6900, 3333, 45000.00, 61, 'FEMALE'),
(4909, 3333, 45000.00, 74, 'FEMALE'),
(9234, 3333, 20000.00, 62, 'MALE'),
(9678, 3333, 50000.00, 99, 'FEMALE'),
(9012, 3333, 40000.00,  1, 'MALE'),
(9455, 4444, 45000.00,  2, 'FEMALE'),
(9789, 4444, 55000.00,  3, 'MALE');

select count (*) from samptb054 sample random 150 percent;

delete from samptb054 where empid = 9789;

select count (*) from samptb054 sample random 150 percent;

------------------------------------------------

Here is the execution output:

>>drop schema mytest1 cascade;

--- SQL operation complete.
>>set schema mytest1;

--- SQL operation complete.
>>
>>create table samptb054 (
+>empid   numeric (4) unsigned not null not droppable,
+>dnum    numeric (4) unsigned not null not droppable,
+>salary numeric (8,2) unsigned,
+>age     integer,
+>sex     char (6),
+>primary key (empid) not droppable
+>);

--- SQL operation complete.
>>
>>insert into samptb054 values
+>(1234, 3333, 20000.00, 16, 'MALE'),
+>(5678, 3333, 50000.00, 35, 'FEMALE'),
+>(9019, 3333, 40000.00, 30, 'MALE'),
+>(3455, 4444, 45000.00, 31, 'FEMALE'),
+>(6789, 4444, 55000.00, 36, 'MALE'),
+>(0123, 3333, 25000.00, 20, 'MALE'),
+>(4567, 4444, 30000.00, 23, 'FEMALE'),
+>(8901, 5555, 40000.00, 56, 'MALE'),
+>(2345, 5555, 36000.00, 29, 'FEMALE'),
+>(6799, 5555, 60000.00, 60, 'MALE'),
+>(9123, 3333, 20050.00, 18, 'MALE'),
+>(4900, 3333, 45000.00, 34, 'FEMALE'),
+>(6234, 3333, 20000.00, 60, 'MALE'),
+>(6678, 3333, 50000.00, 59, 'FEMALE'),
+>(6012, 3333, 40000.00, 49, 'MALE'),
+>(6455, 4444, 45000.00, 39, 'FEMALE'),
+>(7789, 4444, 55000.00, 38, 'MALE'),
+>(6123, 3333, 25000.00, 37, 'MALE'),
+>(6567, 4444, 30000.00, 24, 'FEMALE'),
+>(6901, 5555, 40000.00, 22, 'MALE'),
+>(6345, 5555, 36000.00, 21, 'FEMALE'),
+>(6798, 5555, 60000.00, 19, 'MALE'),
+>(6133, 3333, 20050.00, 80, 'MALE'),
+>(6900, 3333, 45000.00, 61, 'FEMALE'),
+>(4909, 3333, 45000.00, 74, 'FEMALE'),
+>(9234, 3333, 20000.00, 62, 'MALE'),
+>(9678, 3333, 50000.00, 99, 'FEMALE'),
+>(9012, 3333, 40000.00,  1, 'MALE'),
+>(9455, 4444, 45000.00,  2, 'FEMALE'),
+>(9789, 4444, 55000.00,  3, 'MALE');

--- 30 row(s) inserted.
>>
>>select count (*) from samptb054 sample random 150 percent;

(EXPR)
--------------------

                  30

--- 1 row(s) selected.
>>
>>delete from samptb054 where empid = 9789;

--- 1 row(s) deleted.
>>
>>select count (*) from samptb054 sample random 150 percent;

(EXPR)
--------------------

                  29

--- 1 row(s) selected.
>>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)