You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Rajesh Kartha <ka...@gmail.com> on 2006/09/21 02:35:43 UTC

Question on EXCEPT ALL and INTERSECT ALL

Hello,

Based on the DERBY docs at:

http://db.apache.org/derby/docs/dev/ref/rrefsqlj21571.html

adding an ALL to UNION/EXCEPT/INTERSECT gets duplicate rows and the 
default is DISTINCT.

I noticed the rows returned were different between

ij>(select i from t1 union all select j from t2) except ( select k from t3);
1
-----------
6
8
9

3 rows selected
ij> --
--  I expected 6,6,8,9 only
--
(select i from t1 union all select j from t2) except all (select k from t3);
1
-----------
1
1
2
3
6
6
8
9

8 rows selected

If the EXCEPT /EXCEPT ALL are used on tables, they seem to work fine:
ij> select * from t1 except select k from t3;
1
-----------
6
8

2 rows selected
ij> select * from t1 except all select k from t3;
1
-----------
6
6
8
--------------------------------------------------------------------------
--------------------------------------------------------------------------
And for INTERSECT, there is no difference in the rows returned with and 
without ALL.
on simple tables  and  when used with UNIONS as well.

ij> select * from t1 intersect select k from t3;
1
-----------
1
2
3

3 rows selected
ij> select * from t1 intersect all select k from t3;
1
-----------
1
2
3

(See below for the full statement execution).

I am not sure on the above, can someone please share their thoughts/comments
on the expected behaviour.

Also I think it would be good to update the docs with
some examples of using EXCEPT and INTERSECT.

-Rajesh

====================================================
ij version 10.2
ij> connect 'jdbc:derby:testdb;create=true';
ij> CREATE TABLE t1 (i int);
0 rows inserted/updated/deleted
ij> CREATE TABLE t2 (j int);
0 rows inserted/updated/deleted
ij> CREATE TABLE t3 (k int);
0 rows inserted/updated/deleted
ij> INSERT INTO t1 VALUES 3,2,1,6,8,6;
6 rows inserted/updated/deleted
ij> INSERT INTO t2 VALUES 1,2,3,4,9,1;
6 rows inserted/updated/deleted
ij> INSERT INTO t3 VALUES 5,2,3,4,1;
5 rows inserted/updated/deleted
ij> select i from t1 union all select j from t2;
1
-----------
3
2
1
6
8
6
1
2
3
4
9
1

12 rows selected
ij> select k from t3;
K
-----------
5
2
3
4
1

5 rows selected
ij> --
--Except and Except All
--
(select i from t1 union all select j from t2) except ( select k from t3);
1
-----------
6
8
9

3 rows selected
ij> --
-- Are these Wrong results, I expected 6,6,8,9 only
--
(select i from t1 union all select j from t2) except all (select k from t3);
1
-----------
1
1
2
3
6
6
8
9

8 rows selected
ij> --
--Intersect
--
(select i from t1 union all select j from t2) intersect ( select k from t3);
1
-----------
1
2
3
4

4 rows selected
ij> --
-- Are these Wrong results, I expected 1,1,1,2,2,3,3,4
--
(select i from t1 union all select j from t2) intersect all ( select k 
from t3);

1
-----------
1
2
3
4

4 rows selected

Re: Question on EXCEPT ALL and INTERSECT ALL

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Rajesh Kartha <ka...@gmail.com> writes:

> Hello,
> Based on the DERBY docs at:
>
> http://db.apache.org/derby/docs/dev/ref/rrefsqlj21571.html
>
> adding an ALL to UNION/EXCEPT/INTERSECT gets duplicate rows and the
> default is DISTINCT.
>
> I noticed the rows returned were different between

[snip]

> (See below for the full statement execution).
>
> I am not sure on the above, can someone please share their thoughts/comments
> on the expected behaviour.

I couldn't see anything wrong in the results returned from your
example queries. Duplicate elimination on the result is not the only
difference between DISTINCT and ALL. With DISTINCT,
UNION/EXCEPT/INTERSECT are set operations. With ALL, they are bag
operations. The docs also say this about the ALL keyword:

,-------------------
| Depending on which operation is specified, if the number of copies
| of a row in the left table is L and the number of copies of that row
| in the right table is R, then the number of duplicates of that
| particular row that the output table contains (assuming the ALL
| keyword is specified) is:
|
|    * UNION: ( L + R ).
|    * EXCEPT: the maximum of ( L - R ) and 0 (zero).
|    * INTERSECT: the minimum of L and R.

If you have two tables: A(1,1,2,3,3,4,5,5) and B(1,2,3,3,4), you should
get these results:

  A EXCEPT B: (5)
  A EXCEPT ALL B: (1,5,5)

Note that A EXCEPT B is not simply A EXCEPT ALL B with the duplicates
removed since there are more 1's in A than in B.

> Also I think it would be good to update the docs with
> some examples of using EXCEPT and INTERSECT.

I think that's a good idea. Clearing up the wording would also be
good.

-- 
Knut Anders

Re: Question on EXCEPT ALL and INTERSECT ALL

Posted by Yip Ng <yi...@gmail.com>.
On 9/20/06, Rajesh Kartha <ka...@gmail.com> wrote:<snip>

> ij> --
> --  I expected 6,6,8,9 only
> --
> (select i from t1 union all select j from t2) except all (select k from
> t3);
> 1
> -----------
> 1
> 1
> 2
> 3
> 6
> 6
> 8
> 9
>
> 8 rows selected
> </snip>


 Hi Rajesh:

  The query result above returned from the system is correct.  Let's take
the query apart so we can visualize better:

Q1:  (select i from t1 union all select j from t2)
Q2:  select k from t3

Q1 will retain all rows from both table t1 and t2, so the result is:
1,2,3,4,9,1,3,2,1,6,8,6
Q2 result is:  5,2,3,4,1

Q1 EXCEPT ALL Q2, will have the result:  1, 1, 2, 3, 6, 6, 8, 9 because Q1
have three rows containing the value of 1, Q2 has only one, so the result
should retain (3-1 or 2) rows containing the value of 1.  Q1 have two rows
containing the value of 2, Q2 has only one, so
the result should retain (2-1 or 1) row containing the value of 2 and so
forth.  Rows of Q1 that does not appear in Q2 will be retained in the final
result as well.

Q1 EXCEPT Q2, will have the result:  6,8,9 because duplicates are eliminated
from both
Q1 and Q2 before applying the EXCEPT operator:

Q1-nodups:  1,2,3,4,6,8,9
Q2-nodups:  1,2,3,4,5

So the result here is 6,8,9, which is also correct in your output.  Now,
let's try the INTERSECT testcase you listed, the output is also expected and
correct:
<snip>
ij> --
-- Are these Wrong results, I expected 1,1,1,2,2,3,3,4
--
(select i from t1 union all select j from t2) intersect all ( select k from
t3);

1
-----------
1
2
3
4

4 rows selected
</snip>

Again, using the results from above:

Q1:  1,2,3,4,9,1,3,2,1,6,8,6
Q2 : 5,2,3,4,1

Q1 INTERSECT ALL Q2 means "return all rows from Q1 that are also found in
Q2".  The result will be 1,2,3,4 because there are three rows containing the
value of 1 in Q1 but Q2 only has one, so the result will only have one row
containing the value of 1.  There are two rows containing the value of 2 in
Q1 but Q2 only has one, so the result will only  have one row containing the
value of 2 and so forth.  (If Q2 happens to have 2 rows containing the value
of 2, the final result will have 2 rows containing the value of 2). Rows
that only appear in one of the two queries will not make it into the final
result.

For Q1 INTERSECT Q2, Q1 and Q2 will first have their duplicates removed
before
applying the INTERSECT operator:

Q1-nodups:  1,2,3,4,6,8,9
Q2-nodups:  1,2,3,4,5

So, the result is again 1,2,3,4.  Hope this helps.

Yip