You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "David Butterworth (JIRA)" <ji...@apache.org> on 2008/04/08 04:15:25 UTC

[jira] Created: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

'IN' clause ignores valid results, incorrect qualifier handling suspected
-------------------------------------------------------------------------

                 Key: DERBY-3603
                 URL: https://issues.apache.org/jira/browse/DERBY-3603
             Project: Derby
          Issue Type: Bug
          Components: SQL
    Affects Versions: 10.3.2.1, 10.4.1.1
            Reporter: David Butterworth


Derbys' 'IN' clause is returning different results depending on which side of a joined table
I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.

This behaviour was also confirmed by Bryan Pendleton in this thread:
http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e

Using the test database attatched the following 2 queries produce the issue:

ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
    spike.bookings booking
    WHERE booking.child_id = 2 AND
    admin_unit.admin_unit_id IN (1,21) AND
    booking.booking_date_time_out >= 20080331000000 AND
    booking.booking_date_time_in <= 20080406235900 AND
    account.account_id = booking.account_id AND
    admin_unit.admin_unit_id = account.admin_unit_id;
1          
-----------
2          

1 row selected
ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
    spike.bookings booking
    WHERE booking.child_id = 2 AND
    account.admin_unit_id IN (1,21) AND
    booking.booking_date_time_out >= 20080331000000 AND
    booking.booking_date_time_in <= 20080406235900 AND
    account.account_id = booking.account_id AND
    admin_unit.admin_unit_id = account.admin_unit_id;
1          
-----------
3          

1 row selected
ij> 

The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.

Bryan performed some initial testing and stated the following:

--------------------- SNIP ------------------------

Interestingly, although the actual results do NOT contain any values
for admin_unit_id = 21, if I change the query to:

    admin_unit.admin_unit_id IN (1)
or
    account.admin_unit_id IN (1)

then the problem disappears -- I get 3 rows for both queries.

I also ran query plans for both the queries (in the IN (1,21) case)
and have pasted the (simplified) query plans at the end of this message.

I notice that in the case where the query gives 2 rows, which is
when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
index scan output in the query plan contains:

           qualifiers:
Column[0][0] Id: 0
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

However, in the case where the query gives 3 rows, which is
when we specify account.admin_unit_id in (1,21), the admin_unit_id
index scan output in the query plan contains:

           qualifiers:
None

I think it is the presence/absence of this qualifier on the query
scan which is causing the different results in the query, as in
the first case we see:

           Number of rows qualified=2
           Number of rows visited=3

but in the second case we see:

           Number of rows qualified=3
           Number of rows visited=3

I definitely don't have any explanation for why you are getting
this odd behavior; it certainly seems like a bug to me.

-------------END SNIP -----------------------


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3603:
----------------------------------

    Derby Info: [Regression]  (was: [Regression, Existing Application Impact])

Unchecking Existing Application Impact, because that means the solution will impact existing applications.


> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587030#action_12587030 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

Just for completeness, I repeated Kathey's experiment on the trunk;
I backed out the DERBY-3061 change and the query then returns 3 rows.
So that's more evidence that DERBY-3061's change is relevant, but I'm
afraid I don't have any theories to contribute about why that is.

I do notice that in DERBY-3061, the query in question has two conditions
on the same column, where one is clearly "stronger" than the other:

   where mytable.id < 100
   and mytable.id in ( 2, 15, 19, 20, 21, 48, 49 );

However, in this issue, the query in question has two conditions on the
same column, where both conditions should (apparently) be of the same
"strength", since both are (according to the DERBY-3061 changes) to be
treated as equality:

   WHERE   admin_unit.admin_unit_id IN (1,21) AND
      admin_unit.admin_unit_id = account.admin_unit_id; 

This similarity in query structure appears to explain why the DERBY-3061
changes are relevant.



> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton updated DERBY-3603:
-----------------------------------

    Attachment: subqueryFlattening.diff
                d3603_with_test.diff

Attached is 'patch_with_test.diff', which expands on Army's
original test by adding a simple reproduction script to
InListMultiProbeTest.java.

With this patch in place, *almost* all the tests pass.

However, I'm seeing a diff in subqueryFlattening.sql in
the derbyall/derbylang suite. I've attached the diff that I get.


> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "David Butterworth (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Butterworth updated DERBY-3603:
-------------------------------------

    Attachment: derbydb.tar.bz2

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587239#action_12587239 ] 

A B commented on DERBY-3603:
----------------------------

Thanks for the investigation, Bryan and Thomas!  I was hoping to look more into this last night but was hit by a freak migraine that knocked me out early.  It's great to see that some work is still happening, though.

bryan> The other thing I was thinking about is that if I change the query to just IN (1) rather
bryan> than IN (1,21), then the problem does not occur. 

Just a quick note that when the IN list only has a single value, it is transformed into a normal equality predicate.  See InListOperatorNode.preprocess(...).  So in that case IN-list "multi-probing" processing, which appears to be where the underlying bug resides, does not come into play.  That could be why things work correctly in that case...

thomas> It is a possibility that in this case we neglect that there is a equivalent rightOperand
thomas> (it's not simply a true as might be expected)

Without having done any tracing myself, I agree with both of you: this seems worth pursuing...

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589599#action_12589599 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

Committed the change to the 10.4 branch as revision 648722.

I'll investigate merging this change back to the 10.3 branch as well.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587193#action_12587193 ] 

thomanie edited comment on DERBY-3603 at 4/9/08 7:05 AM:
---------------------------------------------------------------

Ran the queries through the debugger to see if there were any obvious errors.

The only obvious difference I see is that the 2-row result query (S2), enters the procedure changed by DERBY-3061 a total of 4 times during optimization, while the 3-row result query (S3) only enters a single time. The callstacks are identical for the S2 and S3 queries in all 5 invocations. This could mean part of the IN list is optimized away in S2. If one optimizer path in S2 does not take all rows into consideration it will probably be a lot cheaper, and will be used, resulting in an incomplete result.

      was (Author: thomanie):
    Ran the queries through the debugger to see if there were any obvious errors.

The only obvious difference I see is that the 2-row result query (S2), enters the procedure changed by DERBY-3603 a total of 4 times during optimization, while the 3-row result query (S3) only enters a single time. The callstacks are identical for the S2 and S3 queries in all 5 invocations. This could mean part of the IN list is optimized away in S2. If one optimizer path in S2 does not take all rows into consideration it will probably be a lot cheaper, and will be used, resulting in an incomplete result.
  
> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588661#action_12588661 ] 

A B commented on DERBY-3603:
----------------------------

Thank you very much for picking this one up, Bryan.  I greatly appreciate it.

On the subqueryFlattening.diff:

To see why the diff occurs, I stepped through the following exercise.  I did not actually trace through the code, but I walked through what I *think* should be happening and it seems to have explained the diff...

At the time of the statements in question the data in the DOCS and COLLS tables is as follows:

DOCS: ID (VARCHAR) =>
  ('124'), ('24'), ('25'), ('27'), ('36'), ('567')

COLLS: ID (VARCHAR), COLLID (SMALLINT) =>
  ('123', 2), ('124', -5), ('24', 1), ('26', -2), ('36', 1), ('37', 8)

Now assume the query in question is:

  SELECT count(ID) FROM DOCS
    WHERE ID <> ANY (SELECT ID FROM COLLS WHERE COLLID IN (-2,1))

The ANY will be flattened giving us something to the effect of:

  SELECT count(ID) FROM DOCS, COLLS
    WHERE DOCS.ID <> COLLS.ID AND COLLS.COLLID IN (-2, 1)

The IN list for the queries that we're talking about is (-2, 1). So per the explanation I posted on Friday, we'd end up with something like the following withOUT the change.

Note: Lines preceded by "+" indicate that we read a row from the inner table, COLLS.  Lines preced by "*' indicate that what we do for the line changes with the proposed patch for this issue.  Lines preceded by an "x" indicate that we "filter" a row from COLLS, meaning we read it but then we discard it.

  - Get DOCS row ('124').
  - Get first IN list value, -2.
  + Get row from COLLS where COLLID is -2, i.e. ('26', -2).
  - See if ID column of ('26', -2) is NOT EQUAL to '124'.  They
    are not equal so count the '124' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('24').
  * Get next IN list value, 1.
  + Get row from COLLS where COLLID is 1, i.e. ('24', 1).
  x See if ID column of ('24', 1) is NOT EQUAL to '24'.  They
    are EQUAL so we skip the row ('24', 1) and continuing our
    scan of COLLS.
  + Get next row from COLLS where COLLID is 1, i.e. ('36', 1).
  - See if ID column of ('36', 1) is NOT EQUAL to '24'.  They
    are not equal so count the '24' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('25').
  - Implicitly RESET the probe state, then get the "next" IN
    list value, which will be -2.
  + Get row from COLLS where COLLID is -2, i.e. ('26', -2).
  - See if ID column of ('26', -2) is NOT EQUAL to '25'.  They
    are not equal so count the '25' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('27').
  * Get next IN list value, 1.
  + Get row from COLLS where COLLID is 1, i.e. ('24', 1).
  - See if ID column of ('24', 1) is NOT EQUAL to '27'.  They
    are not equal so count the '27' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('36').
  - Implicitly RESET the probe state, then get the "next" IN
    list value, which will be -2.
  + Get row from COLLS where COLLID is -2, i.e. ('26', -2).
  - See if ID column of ('26', -2) is NOT EQUAL to '36'.  They
    are not equal so count the '36' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('567').
  * Get next IN list value, 1.
  + Get row from COLLS where COLLID is 1, i.e. ('24', 1).
  - See if ID column of ('24', 1) is NOT EQUAL to '567'.  They
    are not equal so count the '567' row from DOCS and quit the
    scan of COLLS.
  - No more rows from DOCS, so done.

We can see from this that we get a row from COLLS a total of 7 times. In the process we "filtered" one row out, i.e. the row ('24', 1), because that row has the same ID has a row in DOCS.  Or put in terms of the scan statistics, we have 7 "rows seen" and one (1) row filtered.  And the query returns a count total of 6 rows from DOCS.

But now if we do the same exercise with the patch applied, we see the following:

  - Get DOCS row ('124').
  - Get first IN list value, -2.
  + Get row from COLLS where COLLID is -2, i.e. ('26', -2).
  - See if ID column of ('26', -2) is NOT EQUAL to '124'.  They
    are not equal so count the '124' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('24').
  * RESET probe state, then get "next" IN list value, which
    is -2 (again).
  + Get row from COLLS where COLLID is -2, i.e. ('26', 1).
  - See if ID column of ('26', -2) is NOT EQUAL to '24'.  They
    are not equal so count the '24' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('25').
  * RESET probe state, then get "next" IN list value, which
    is -2 (again).
  + Get row from COLLS where COLLID is -2, i.e. ('26', 1).
  - See if ID column of ('26', -2) is NOT EQUAL to '25'.  They
    are not equal so count the '25' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('27').
  * RESET probe state, then get "next" IN list value, which
    is -2 (again).
  + Get row from COLLS where COLLID is -2, i.e. ('26', 1).
  - See if ID column of ('26', -2) is NOT EQUAL to '27'.  They
    are not equal so count the '27' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('36').
  * RESET probe state, then get "next" IN list value, which
    is -2 (again).
  + Get row from COLLS where COLLID is -2, i.e. ('26', 1).
  - See if ID column of ('26', -2) is NOT EQUAL to '36'.  They
    are not equal so count the '36' row from DOCS and quit the
    scan of COLLS.
  - Get DOCS row ('567').
  * RESET probe state, then get "next" IN list value, which
    is -2 (again).
  + Get row from COLLS where COLLID is -2, i.e. ('26', 1).
  - See if ID column of ('26', -2) is NOT EQUAL to '567'.  They
    are not equal so count the '567' row from DOCS and quit the
    scan of COLLS.
  - No more rows from DOCS, so done.

Now the total number of rows we "see" from COLLS is 6 instead of 7.  This is because we never fetched the row ('24', 1) from the table.  Instead, we only had to fetch the one row, ('26', -2), over and over.  Then, since we never fetched row ('24', 1) from the table we didn't have to filter it out.  So we end up with a total of 6 "rows seen" and 0 rows filtered. Meanwhile the query still returns a count total of 6 rows from DOCS.

I didn't run through this exercise for the other cases, but I imagine (hope!) things would work out in a similar fashion...Does that seem like a reasonable explanation to you?  If so, then I think it's okay to update the master file for subqueryFlattening.sql for this issue.

Oh, and thanks for writing the standalone repro, as well!

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton reassigned DERBY-3603:
--------------------------------------

    Assignee: Bryan Pendleton

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Dyre Tjeldvoll (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dyre Tjeldvoll resolved DERBY-3603.
-----------------------------------

    Resolution: Fixed

Resolving with fixversion 10.4.0.0 (since there is no 10.4.1.3 yet) just to get it on the list of fixed bugs.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>             Fix For: 10.4.0.0
>
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Dyre Tjeldvoll (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dyre Tjeldvoll updated DERBY-3603:
----------------------------------

    Fix Version/s: 10.4.0.0

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>             Fix For: 10.4.0.0
>
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588591#action_12588591 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

The 2 queries in subqueryFlattening.sql which get the different runtime statistics are:

SELECT count(ID) FROM DOCS WHERE ID <> ANY (SELECT ID FROM COLLS WHERE COLLID IN (-2,1) );
and
SELECT count(ID) FROM DOCS WHERE ID > ANY (SELECT ID FROM COLLS WHERE COLLID IN (-2,1) );

The queries get the correct final results, and the overall shape of the query
plans is the same, but the precise details of which rows are processed
at what stage of the query plan changes slightly.


> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587138#action_12587138 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

DERBY-3061 adds a check for "otherPred.isInListProbePredicate()". This again checks andNode.leftOperand.isInListProbePredicate().
It is a possibility that in this case we neglect that there is a equivalent rightOperand (it's not simply a true as might be expected) - but I haven't verified this yet.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586816#action_12586816 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

This is a regression introduced between 10.3.1.4 and 10.3.2.1.

I ran the queries on 10.3.1.4 with the provided database snapshot, and they both return 3 rows on 10.3.1.4, but 2 rows for 10.3.2.1.

--- [10.3.1.4] snip snip ---
ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
    spike.bookings booking
    WHERE booking.child_id = 2 AND
    admin_unit.admin_unit_id IN (1,21) AND
    booking.booking_date_time_out >= 20080331000000 AND
    booking.booking_date_time_in <= 20080406235900 AND
    account.account_id = booking.account_id AND
    admin_unit.admin_unit_id = account.admin_unit_id;
1
-----------
3

1 row selected
ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
    spike.bookings booking
    WHERE booking.child_id = 2 AND
    account.admin_unit_id IN (1,21) AND
    booking.booking_date_time_out >= 20080331000000 AND
    booking.booking_date_time_in <= 20080406235900 AND
    account.account_id = booking.account_id AND
    admin_unit.admin_unit_id = account.admin_unit_id;
1
-----------
3

1 row selected
--- snip snip ---

Hopefully someone with more historical insight may know what issue could have caused this sideeffect?

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590194#action_12590194 ] 

A B commented on DERBY-3603:
----------------------------

And thanks to you, Bryan, for taking this one to completion!

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>             Fix For: 10.3.2.2, 10.4.0.0
>
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589397#action_12589397 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

Committed the change to the trunk as revision 648492.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3603:
--------------------------------------

    Fix Version/s:     (was: 10.4.0.0)
                   10.4.1.3

Changed fix version from 10.4.0.0 to 10.4.1.3.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>             Fix For: 10.3.2.2, 10.4.1.3
>
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588404#action_12588404 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

Thanks Thomas and Army for all the analysis and explanation; it is *very* helpful!

If I am understanding it all correctly, then I believe this script reproduces the problem.
Without Army's patch the first 2 queries return (a) count=1, and (b) 1 row, but *with*
Army's patch all three queries return a result of 2 rows (count = 2), which I believe
is the correct result.

I'll see if I can turn Army's patch and this repro script into a complete patch and run
the tests, etc.

drop table a;
drop table c;

create table a (a_id integer, c_id integer);
create table c (c_id integer not null, primary key(c_id),
                d_id integer, t_o bigint, t_i bigint);

insert into a (a_id, c_id) values (1, 1);
insert into a (a_id, c_id) values (2, 2);
insert into a (a_id, c_id) values (3, 1);
insert into c (c_id, d_id, t_o, t_i) values (1, 1, 1, 1);
insert into c (c_id, d_id, t_o, t_i) values (2, 2, 1, 1);
insert into c (c_id, d_id, t_o, t_i) values (21, 1, 1, 1);

select count(*) from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
select a.a_id from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
select a.a_id,c.d_id from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;

Results without the patch:
ij> select count(*) from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
1
-----------
1

1 row selected
ij> select a.a_id from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
A_ID
-----------
1

1 row selected
ij> select a.a_id,c.d_id from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
A_ID       |D_ID
-----------------------
1          |1
3          |1

Results with the patch:

ij> select count(*) from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
1
-----------
2

1 row selected
ij> select a.a_id from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
A_ID
-----------
1
3

2 rows selected
ij> select a.a_id,c.d_id from a, c where a.a_id <> 2 and c.c_id in (1, 21)
       and a.c_id = c.c_id;
A_ID       |D_ID
-----------------------
1          |1
3          |1



> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587196#action_12587196 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

> the join on either the left or right side, causing it to be incorrectly evaluated if it ends up on the wrong side

That's a very intriguing theory, seems quite plausible, and worth following up. Interesting!

The other thing I was thinking about is that if I change the query to just IN (1) rather
than IN (1,21), then the problem does not occur. So I was wondering if the problem
might be that Predicate.java is now comparing the two predicates as *exactly* equal,
rather than just "equal enough for sorting during optimization", and hence was seeing
the IN (1,21) predicate, at some point, as being redundant with the join criterion and
was optimizing it entirely away.

Normally, erroneously optimizing away an entire predicate would cause us to get
more results, not fewer, but in this case the IN (1,21) predicate is effectively an "OR"
clause and so if we lost it, or maybe lost one "branch" of it, the result might have
fewer rows than we expected.

However, I tried breaking the "bad" query (the one that returns 2 rows) into 2 queries, 
one with IN (1) and one with IN (21), and got the following results:
 - query stating IN (1): count = 3
 - query stating IN (21): count = 0

So rows with admin_unit_id = 21 don't appear to be relevant to the actual results of
this query. I think that probably means this theory is not worth pursuing, but I
figured I'd post it anyway in case it was useful to others.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586916#action_12586916 ] 

Kathey Marsden commented on DERBY-3603:
---------------------------------------

I verified on the 10.3 branch that this was indeed related to DERBY-3061 (svn 574730).  Prior to that change, both queries return "3".


> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "A B (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

A B updated DERBY-3603:
-----------------------

    Attachment: d3603_v1.patch

The fact that "oneRowRightSide" is true for the NestedLoop join does indeed appear to be important here. Good find, Thomas!  With that in mind I did some tracing of the execution path.  My findings are below.

The query I used for tracing is as follows:

  select account.admin_unit_id, account.account_id,
    booking.booking_id, cast (admin_unit.admin_unit_name as varchar(30))
  from spike.accounts account, spike.bookings booking,
  spike.admin_units admin_unit
  where account.account_id = booking.account_id
    and admin_unit.admin_unit_id = account.admin_unit_id
    and admin_unit.admin_unit_id in (1, 21)
    and booking.child_id = 2;

When I run this query the execution plan shows a join order of:

  (ACCOUNTS x BOOKINGS) x ADMIN_UNITS

That is, we join ACCOUNTS and BOOKINGS to get a row R, then for each row R from that join, we join it (R) with a row from ADMIN_UNITS.  Due to the presence of the IN list the optimizer chooses to do multi-probing on ADMIN_UNITS, which means the probe predicate that represents "admin_unit.admin_unit_id IN (1, 21)" becomes the start and stop key.  Since we only allow a single start/stop key for any column, the join predicate between ACCOUNTS and ADMIN_UNITS is relegated to being a store qualifier.  This is notable for two reasons: 1) this explains the additional qualifier seen by Bryan in the query plan, and 2) as seen below, this qualifier is what causes the "oneRowRightSide" variable to get set to TRUE for ADMIN_UNITS.

Also note: before the change for DERBY-3061 the optimizer always sorted the join predicate as "stronger" than the probe predicate, and thus always chose the join predicate as the start/stop key.  That in turn meant that the probe predicate, which can only be used for probing if it is the start/stop key, was "reverted" back to a normal IN list qualifier.  So the problem described below did not show up.  But with DERBY-3061 the probe predicate and the join predicate have the same "strength" and thus the optimizer is free to use either one as the start/stop key. In this example it chooses the probe predicate.

All of that said, let's take a step back and note that the join between ACCOUNTS and BOOKINGS returns three rows:

select account.admin_unit_id, account.account_id, booking.booking_id
 from spike.accounts account, spike.bookings booking
 where account.account_id = booking.account_id
 and booking.child_id = 2;

ADMIN_UNIT&|ACCOUNT_ID |BOOKING_ID
-----------------------------------
1          |10         |1
1          |10         |3
1          |10         |4

So now, for *each* of the above three rows we want to fetch the matching row from ADMIN_UNITS.  So far so good.

The "oneRowRightSide" variable that Thomas mentioned comes into play because the optimizer thinks that for every outer row, i.e. for each of the three rows shown above, we will have at MOST one matching row from ADMIN_UNITS.  At first I thought this was incorrect--but upon further inspection it turns out be true.  The optimizer deduced this because the predicate refers to a primary key column, which has to be unique. So when we try to fetch a row such that the ADMIN_UNIT_ID column of ADMIN_UNITS equals some value, we will get at most one row back.  And since we have the join predicate as a store qualifier (mentioned above), we know that we _will_ in fact be trying "to get a row such that the ADMIN_UNIT_ID column of ADMIN_UNITS equals some value", where "some value" will be the ADMIN_UNIT_ID column from the intermediate join table shown above. (Note: the intermediate join table doesn't actually exist, it's just shown as such for discussion).

So at this point things are still working as expected.  We'll take the first row from the intermediate join, i.e. "(1, 10, 1)", and we'll do two things.  First we'll get the "next" probe value in the list of values.  Since the probe list is "(1, 21)", the next value is "1".  Second, we will see if there is a row in ADMIN_UNITS such that: 

  a) the ADMIN_UNIT_ID column of the row equals the current probe
     value, which is "1", AND

  b) the ADMIN_UNIT_ID column of the row equals the ADMIN_UNIT_ID of
     the outer row, which is "(1, 10, 1)".

Since both of these conditions are true, we'll return the first row as expected.  So our result at this point is:

ADMIN_UNIT&|ACCOUNT_ID |BOOKING_ID
-----------------------------------
1          |10         |1

Now we attempt to fetch the "next" row from ADMIN_UNITS that matches the outer row "(1, 10, 1)".  But as mentioned earlier, the NestedLoop result set that would do so has "oneRowRightSide" set to TRUE.  This means that the NestedLoop _knows_ it will not find any more rows in ADMIN_UNITS that satsify conditions "a" and "b" shown above.  So instead of going to the table, >>> it just QUITS the scan <<<.  See the following in NestedLoopJoinResultSet:

        if (oneRowRightSide && returnedRowMatchingRightSide)
        {
            rightRow = null;
            returnedRowMatchingRightSide = false;
        }

So we quit the inner scan of ADMIN_UNITS, which leads us to get the next "outer" row from the intermediate join rows shown earlier.  I.e.  we'll now set our outer row to be "(1, 10, 3)".  Then we'll execute:

        if (leftRow == null)
        {
            closeRight();
        }
        else
        {
            rowsSeenLeft++;
            openRight();
        }

"leftRow" here is the same as "outer" row, so it is "(1, 10, 3)" at this point.  Since it's not null we'll re-open the scan on ADMIN_UNITS, per the "else" branch.  Then, similar to what we did with the first outer row, we'll get the "next" probe value in the list of values for ADMIN_UNITS.

But this is where the problem occurs.  When we "quit" the scan in the above IF statement we left the MultiProbe result set in the middle of a scan.  So when we make the call to re-open the scan, we need to RESET the probing state.  But the code in MultiProbeTableScanResultSet does not currently do that.  More on that below.

For the current codeline we do *not* reset the probe scan, and thus the next probe value we see is "21", from the list (1, 21).  With that we will then try to get a row from ADMIN_UNIT_ID such that:

  a) the ADMIN_UNIT_ID column of the row equals the current probe
     value, which is >>> "21" <<<, AND

  b) the ADMIN_UNIT_ID column of the row equals the ADMIN_UNIT_ID of
     the outer row, which is "(1, 10, 3)".

But ADMIN_UNITS does not contain any rows which satisfy condition "a", so we will not return a row. Having no row from ADMIN_UNITS, we go on to fetch the next "outer" row from the intermediate join table, meaning we end up with "(1, 10, 4)".  So we'll use that outer row and do what we did with the others: i.e. First we'll get the next probe value in the list of values.  The logic in MultiProbeTableScan result set sees that we have exhaused the probe list at this point, so *NOW* it RESETS the probing state.  After that it gets the "next" probe value, which is the first value in the list (again), and thus we end up with a probe value of "1".  Second, we will see if there is a row in ADMIN_UNITS such that:

  a) the ADMIN_UNIT_ID column of the row equals the current probe
     value, which is "1", AND

  b) the ADMIN_UNIT_ID column of the row equals the ADMIN_UNIT_ID of
     the outer row, which is "(1, 10, 4)".

Since both of these conditions are true, we'll return the row as expected.  So our result at this point is:

ADMIN_UNIT&|ACCOUNT_ID |BOOKING_ID
-----------------------------------
1          |10         |1
1          |10         |4

And that's how the query ends--missing a row.

So what's going on with MultiProbeTableScanResultSet?  There are some comments in the "reopenCore()" method of that class which explain a) why we do NOT reset the probe state in some cases, and b) why we DO reset the probe state in others.  For reference the comment is:

     /* There are two scenarios for which we reopen this kind of scan:
      *
      *   A - The first is for join processing.  In this case we have
      * a(nother) row from some outer table and we want to reopen this
      * scan to look for rows matching the new outer row.
      *
      *   B - The second is for multi-probing.  Here we want to reopen
      * the scan on this table to look for rows matching the next value
      * in the probe list.
      *
      * If we are reopening the scan for scenario A (join processing)
      * then we need to reset our position within the probe list.
      * If we are reopening the scan for scenario B then we do *not*
      * want to reset our position within the probe list because that
      * position tells us where to find the next probe value.
      *
      * The way we tell the difference between the two scenarios is
      * by looking at our current position in the probe list (i.e. the
      * value of probeValIndex): if our current position is beyond the
      * length of the probe list then we know that we are reopening the
      * scan for scenario A.  Or put another away, we should never get
      * here for scenario B if probeValIndex is greater than or equal
      * to the length of the probe list.  The reason is that the call
      * to reopenCore() for scenario B will only ever happen when
      * moreInListVals() returns true--and in that case we know that
      * probeValIndex will be less than the length of the probeValues.
      */

This comment states one thing and assumes another. First it states (correctly) that if our current position is beyond the length of the probe list then we must be seeing scenario A.  It then goes on to assume (incorrectly) that if the current position is LESS than the length of the probe list we must be seeing scenario B.  It turns out that this is *NOT* true.  The reason it is not true is because of the "oneRowRightSide" processing described above: when that flag is set to true we quit the scan early, meaning that the current probe position will in fact be less than the length of the probe list.  Then we later come back with a(nother) row from the outer table (i.e. the intermediate join table shown above) and "we want to reopen this scan to look for rows matching the new outer row".  That means we need to reset the probe state: but due to the incorrect assumption, we don't reset it.  The result is incorrect results as described above.

I'm attaching a first attempt at a fix, d3603_v1.patch.  With this patch applied the repro case correctly returns three rows.  The patch does not include a test case and I don't know yet if it passes the regression tests.  I'm running them now but they won't complete for several more hours.  So far the lang/subqueryFlattening test has failed with a plan diff, which may require some investigation...

I'm still quite limited on time so if it is agreed that this is the correct fix (or at least, the right direction), it'd be great if someone could add the test and take it to completion--esp. if it could happen before the next 10.4 release candidate.  If no one picks it up I'll try to get to it sometime next week, but probably not in time for the second RC...

Regardless of whether or not this writeup proves correct, many thanks again to Bryan and Thomas for their investigation and comments.  As no one officially assigned the issue to him/herself I spent some time investigating, as written above.  I hope I didn't step on anyone's toes in the process.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: d3603_v1.patch, derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587926#action_12587926 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

oneRowRightSide is set during code generation for NestedLoopJoinResultSet.

This happens in JoinNode.generateCore(), which ends up with the following callstack:
  org.apache.derby.impl.sql.compile.FromBaseTable.isOneRowResultSet(FromBaseTable.java:4405)
  org.apache.derby.impl.sql.compile.FromBaseTable.isOneRowResultSet(FromBaseTable.java:3825)
  org.apache.derby.impl.sql.compile.IndexToBaseRowNode.isOneRowResultSet(IndexToBaseRowNode.java:316)
  org.apache.derby.impl.sql.compile.SingleChildResultSetNode.isOneRowResultSet(SingleChildResultSetNode.java:531)
  org.apache.derby.impl.sql.compile.JoinNode.oneRowRightSide(JoinNode.java:1691)
  org.apache.derby.impl.sql.compile.JoinNode.getJoinArguments(JoinNode.java:1639)
  org.apache.derby.impl.sql.compile.JoinNode.generateCore(JoinNode.java:1555)
  ...

At this point both the restrictionList and predicateList only have a single entry, forcing us to return true.
Seems we missed a predicate somewhere along the path...

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton updated DERBY-3603:
-----------------------------------

    Attachment: updateSubqueryMaster.diff

Thanks Army for the clear and detailed description of the subquery processing.

To confirm it, I introduced some crude print statements into the MultiProbe
code and:
 - without the patch I can see it ping-ponging back and forth between -2 and 1
 - with the patch I can see that it simply uses the -2 row over and over,
   as you suggest.

Based on that evidence, and on a close read of Army's description,
I'm comfortable updating the subqueryFlattening.out file.

Attached is updateSubquerymaster.diff, which I believe is ready for review.

I intend to commit this patch to the trunk tonight or tomorrow morning,
and I'm also intending to merge it to 10.4, assuming no additional
concerns are raised via review or testing.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3603:
----------------------------------

    Attachment: derbydb.jar

Attaching jarred up db for easier access.


> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "A B (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586850#action_12586850 ] 

A B commented on DERBY-3603:
----------------------------

> This is a regression introduced between 10.3.1.4 and 10.3.2.1.

I wonder if this was caused by DERBY-3061, which was an IN list change that went into 10.3.2.1 (and 10.4) but was not in 10.3.1.4?  I don't have a lot of time on my hands right now, but perhaps I can investigate later this evening (PST).  That said, if anyone else has more cycles to investigate, that'd be great....

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588492#action_12588492 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

No toes feeling stepped on Army - again an excellent writeup!

I agree with Bryan that the subqueryFlattening.diff seems OK - it does produce the correct results but the filtering seems to happen at slightly different levels. I'll give the patch a spin too.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587894#action_12587894 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

Looking at the queryplans I see something quite interesting.

Both queries S2 and S3 end up with the same query plan. The top looking like this:

                              Scalar
                                    |
                                 PRN
                                    |
                     NestedLoop ExistsJoin
                              /                     \
NestedLoopExistsJoin        IndexScanResultSet
   /         \                                          |
...           ...                                      ...

The interesting stuff happens in the lower, right-hand IndexScanResultSet (let's call it ISR1)

In the correct 3-row query S3, ISR1 sees 3 rows and qualifies 3 rows.
In the incorrect 2-row query S2, ISR1 seens 3 rows, but only qualifies 2 of them.

So for S2, this becomes
                              Scalar
                                 ("2")
                                    |
                                 PRN
                               (2 rows)
                                    |
                     NestedLoop ExistsJoin
                    (3 rows L, 2 rows R)
                              /                     \
NestedLoopExistsJoin        IndexScanResultSet
(3 rows L, 3 rows R)             (sees 3 rows, qualifies 2 rows)
   /         \                                          |
...           ...                                      ...

The query plan for both S2 and S3 state that ISR1 is an
"Index Scan ResultSet for ADMIN_UNITS using constraint ADMIN_UNIT_ID_PK at read committed isolation level using share row locking chosen by the optimizer".

Next task would be to find out why ISR1 rejects/discards the third row in S2, and not in S3.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588469#action_12588469 ] 

Bryan Pendleton commented on DERBY-3603:
----------------------------------------

At a quick glance, it looks like the subqueryFlattening diff might be OK,
and perhaps I just need to update the master output.

But it would be nice to have more eyes on this.


> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587919#action_12587919 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

Seems my comment formatting was messed up - doesn't really matter :)

In the debugger I can see that the problem is actually located in the uppermost NestedLoopExistsJoin of the queryplan (which really is a NestedLoopJoinResultSet), having its oneRowRightSide incorrectly set to true. We are looking for the set (1,21), so the right hand side should not have a single result row in this particular case. Changing this to false in the debugger cause the query to return the correct 3 rows for S2.

Next question should then be - why was oneRowRightSide incorrectly set to true?



> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton updated DERBY-3603:
-----------------------------------

    Fix Version/s: 10.3.2.2

Committed to the 10.3 branch as revision 649226.

Updated fix version to reflect that.

Thanks again Army and Thomas for the help and reviews!

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>            Assignee: Bryan Pendleton
>             Fix For: 10.3.2.2, 10.4.0.0
>
>         Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar, derbydb.tar.bz2, subqueryFlattening.diff, updateSubqueryMaster.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587193#action_12587193 ] 

Thomas Nielsen commented on DERBY-3603:
---------------------------------------

Ran the queries through the debugger to see if there were any obvious errors.

The only obvious difference I see is that the 2-row result query (S2), enters the procedure changed by DERBY-3603 a total of 4 times during optimization, while the 3-row result query (S3) only enters a single time. The callstacks are identical for the S2 and S3 queries in all 5 invocations. This could mean part of the IN list is optimized away in S2. If one optimizer path in S2 does not take all rows into consideration it will probably be a lot cheaper, and will be used, resulting in an incomplete result.

> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (DERBY-3603) 'IN' clause ignores valid results, incorrect qualifier handling suspected

Posted by "Thomas Nielsen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587138#action_12587138 ] 

thomanie edited comment on DERBY-3603 at 4/9/08 4:05 AM:
---------------------------------------------------------------

DERBY-3061 adds a check for "otherPred.isInListProbePredicate()". This again checks andNode.leftOperand.isInListProbePredicate().
It is a possibility that in this case we neglect that there is a equivalent rightOperand (it's not simply a true as might be expected) - but I haven't verified this yet.

The idea is that certain ordering of the elements effectively place the join on either the left or right side, causing it to be incorrectly evaluated if it ends up on the wrong side - which ever side that may be.

      was (Author: thomanie):
    DERBY-3061 adds a check for "otherPred.isInListProbePredicate()". This again checks andNode.leftOperand.isInListProbePredicate().
It is a possibility that in this case we neglect that there is a equivalent rightOperand (it's not simply a true as might be expected) - but I haven't verified this yet.
  
> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
>                 Key: DERBY-3603
>                 URL: https://issues.apache.org/jira/browse/DERBY-3603
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.2.1, 10.4.1.1
>            Reporter: David Butterworth
>         Attachments: derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e
> Using the test database attatched the following 2 queries produce the issue:
> ij>  select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     admin_unit.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 2          
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
>     spike.bookings booking
>     WHERE booking.child_id = 2 AND
>     account.admin_unit_id IN (1,21) AND
>     booking.booking_date_time_out >= 20080331000000 AND
>     booking.booking_date_time_in <= 20080406235900 AND
>     account.account_id = booking.account_id AND
>     admin_unit.admin_unit_id = account.admin_unit_id;
> 1          
> -----------
> 3          
> 1 row selected
> ij> 
> The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
>     admin_unit.admin_unit_id IN (1)
> or
>     account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
>            qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
>            Number of rows qualified=2
>            Number of rows visited=3
> but in the second case we see:
>            Number of rows qualified=3
>            Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.