You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "Rick Hillegas (JIRA)" <de...@db.apache.org> on 2006/08/16 00:59:14 UTC

[jira] Commented: (DERBY-1633) Regression: The fields of views are not being calculated properly since 10.1.2.4

    [ http://issues.apache.org/jira/browse/DERBY-1633?page=comments#action_12428256 ] 
            
Rick Hillegas commented on DERBY-1633:
--------------------------------------

Thanks for the extensive analysis, Army! This has helped me to understand some of the issues. At this point I'm afraid that I don't have a very firm grasp of the high level problem and I hope that you will teach me more. It is clear that you have studied this tricky piece of code. Please bear with me.

I lost the thread of your explanation when I got to OBS#2 in DERBY-1633_v2.html. I did not understand this statement: "When a predicate is pushed to a UNION that appears in a Select list, the predicate will be "remapped" so that the column references point to their source result columns." I did not grasp how the problem was caused by the UNION. Why does a UNION require this mapping?

Instead, it seemed to me that the remapping was required by what was underneath the UNION. In this case, it is a Permuting Node (that's a term I just made up), that is, an operator which changes column order. In this case, the Permuting Node is a view, although I suppose it could also be a subquery.

It seems to me that positional remapping is required for other predicates that we try to push down, even if no UNION is involved. For instance, I think we need positional remapping in the following cases. And this brings me to the heart of my confusion: Why does predicate pushdown work in the following cases--or does it? If it does work, how is it that UNIONs break the logic? Here are some cases that come to mind:

select *
from 
  t1,
  ( select b as x, a as y from t2) v2
where t1.a = v2.y

select *
from ( select b as x, a as y from t2) v2
where v2.y = 1



> Regression: The fields of views are not being calculated properly since 10.1.2.4
> --------------------------------------------------------------------------------
>
>                 Key: DERBY-1633
>                 URL: http://issues.apache.org/jira/browse/DERBY-1633
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.0, 10.1.3.1
>         Environment: 2.8 GHZ dual PIV on Windows XP SP2, 2 GB memory
>            Reporter: Prasenjit Sarkar
>         Assigned To: A B
>             Fix For: 10.2.0.0
>
>         Attachments: d1633_repro.sql, d1633_v1_reviewOnly.patch, d1633_v2.patch, DERBY-1633_v1.html, DERBY-1633_v2.html
>
>
> Database can be assumed to be same as in Derby - 1205 Jira issue
> SELECT PORT1.PORT_ID FROM T_RES_PORT PORT1, T_VIEW_ENTITY2PORT ENTITY2PORT WHERE ENTITY2PORT.PORT_ID = PORT1.PORT_ID
> This works fine in 10.1.2.1 but fails thereafter complaining that Comparison between INTEGER and CHAR is not supported
> for some reason, it thinks one of the PORT_ID columns is a character, when in reality both are integers.
> 		SELECT DISTINCT 
> 		    ZONE.ZONE_ID ZONE_ID, 
> 			 PORT2ZONE.ZONE_MEMBER_ID  
> 		FROM  
> 			 T_RES_ZONE ZONE left outer join T_VIEW_PORT2ZONE PORT2ZONE on  
> 			 ZONE.ZONE_ID = PORT2ZONE.ZONE_ID   ,  T_RES_FABRIC FABRIC 
> In this query, it is complaining that one of the columns is a VARCHAR and cannot be compared to INTEGER, when clearly this is not the case...
> Same issue

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] Commented: (DERBY-1633) Regression: The fields of views are not being calculated properly since 10.1.2.4

Posted by Army <qo...@gmail.com>.

Thank you for volunteering to review the DERBY-1633 changes, Rick!  I welcome 
and appreciate your comments.

Rick Hillegas (JIRA) wrote:
> 
> I lost the thread of your explanation when I got to OBS#2 in DERBY-1633_v2.html. 
> I did not understand this statement: "When a predicate is pushed to a UNION that 
> appears in a Select list, the predicate will be "remapped" so that the column 
> references point to their source result columns." I did not grasp how the problem
> was caused by the UNION. Why does a UNION require this mapping?

Great question.  I think the answer to this is (intended to be) contained in the 
line that follows the one you quoted.  If we look at the OBS#2, we see the 
following:

<begin quote>

OBS#2: When a predicate is pushed to a UNION that appears in a Select list, the 
predicate will be "remapped" so that the column references point to their source 
result columns.  This remapping occurs as part of the "pushOptPredicate()" 
method in ProjectRestrictNode, which will be called because all items in a 
Select's FROM list are required to have ProjectRestrictNodes above them (we add 
them at bind time).

<end quote>

So you were definitely on the right track: it's not the fact that we have a 
UNION node that necessitates the remapping.  Rather, the remapping is required 
because the UNION node appears in the FROM list of a SELECT node, which means 
that there is a ProjectRestrictNode (PRN) above it.  When the predicate is 
pushed to the ProjectRestrictNode, it will then be remapped as part of the 
pushing process.

> Instead, it seemed to me that the remapping was required by what
> was underneath the UNION.

Technically, as per comments above, the remapping is required by what is 
immediately *above* the UNION node--namely, the ProjectRestrictNode.

> In this case, it is a Permuting Node (that's a term I just made up),
> that is, an operator which changes column order.

Again, this is along the right lines.  I think if you replace "Permuting Node" 
with "ProjectRestrictNode", this sentence is then correct.  The predicate is 
being pushed to the ProjectRestrictNode that sits above the UnionNode, and the 
action of pushing the predicate to the PRN causes it (the predicate) to be remapped.

> It seems to me that positional remapping is required for other predicates 
> that we try to push down, even if no UNION is involved.

Yes, correct again.  All items in a FROM list have a ProjectRestrictNode above 
them, so if we push a predicate to one of those PRNs, then the predicate is 
going to be remapped.  This behavior is not specific to UnionNodes, it is 
specific to ProjectRestrictNodes.  For the example in the DERBY-1633 document, 
it just so happens that the PRN's child is a UnionNode, but the same predicate 
remapping will happen if the PRN's child is any other kind of ResultSet node, as 
well.

That said, the PRN will then give its child a chance to push the predicate 
further down the tree.  For most types of children this subsequent push will not 
happen, so the predicate will remain at the level of the PRN and that's where it 
will be enforced.  For UnionNodes, though, it *is* possible to the push the 
predicate further down--that's what the work for DERBY-805 does.  In order to 
complete that push, though, the predicates have to be "scoped" to the 
UnionNode's left and right children.  This scoping operation is what is at issue 
for the regression described in DERBY-1633.

> For instance, I think we need positional remapping in the following
> cases. And this brings me to the heart of my confusion: Why does predicate
> pushdown work in the following cases--or does it? If it does work, how is
> it that UNIONs break the logic? Here are some cases that come to mind:

Again, great questions.  To begin with, I should mention that for the two 
queries you show, Derby will actually flatten the subqueries into the top result 
set, effectively changing them into the following queries:

> select *
> from 
>   t1,
>   ( select b as x, a as y from t2) v2
> where t1.a = v2.y

effectively becomes

select *
from
   t1,
   t2
where t1.a = t2.a

and

> select *
> from ( select b as x, a as y from t2) v2
> where v2.y = 1

effectively becomes

select *
from
   t2
where t2.a = 1

I assume, though, that you didn't intend for subquery flattening to occur when 
you asked your question, so let's pretend it doesn't happen.  Or better, let's 
assume there's a "DISTINCT" keyword in the subqueries, which means that Derby 
will not flatten them.  Then the answer to your questions are as follows:

> Why does predicate pushdown work in the following cases--or does it?

Yes, predicate pushdown will work in the example queries.  As mentioned above, 
the predicates are pushed to the ProjectRestrictNodes that sit above the FROM 
tables.  So in the first example, the predicate will be pushed to a PRN that 
sits above a SelectNode that represents "v2".  As part of that pushing, the 
predicate will be remapped.  The PRN will then give its child--the SelectNode--a 
chance to push the predicate further down the tree, but the SelectNode will not 
do so (the code path to show this is a bit tricky; if you want me to explain it 
I can, but for now I'm just skipping it for brevity).  Thus the predicate will 
remain at its pushed position in the PRN, and that's where it will be enforced.

The same is true for the second example, with one minor difference: since the 
predicate is one-sided, it will be pushed to the PRN above v2 as part of 
pre-processing; i.e. before optimization begins.  But other than that minor 
detail, the behavior will be the same as for the first example: namely, the 
predicate will be remapped when it is pushed to the PRN and then that's where it 
will be enforced.

> If it does work, how is it that UNIONs break the logic?

The case of UNIONs is unique for two reasons.  First, as mentioned above, a 
UnionNode is one of few types of nodes that will take a predicate and attempt to 
push it further down the tree.  This means that the predicate is no longer going 
to be enforced by the PRN above the UnionNode; instead, the UnionNode's 
children--and/or the UnionNode itself--become(s) responsible for enforcing the 
predicate.

The second difference is that, as part of the process for pushing the predicate 
further down the tree--i.e. to the UnionNode's children--we have to do the extra 
work of scoping the predicate to each child.  It is this scoping 
operation--which currently *only* occurs for UnionNodes, per DERBY-805--that 
makes the case of UNIONs unique.

All of that said, "OBS#2" in the write-up for DERBY-1633 is basically saying 
that "since the predicate was remapped before we got here," (i.e. as part of the 
logic to push the predicate to the PRN above the UnionNode) "the column 
references are already pointing to their source result sets."  This fact then 
influences the behavior of the scoping operation, and that behavior is what is 
at issue for DERBY-1633.

Hopefully that answers your question--and my apologies if that was too much 
information.  If there are still things that aren't clear from my explanation, 
please feel free to ask more.  I'll try to answer as best as I can...

Thanks for taking the time to review the document.  I know it's a lot of detail, 
so I definitely appreciate the time of those who are willing to read it over and 
provide feedback...

Army