You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Rick Hillegas (JIRA)" <de...@db.apache.org> on 2006/08/16 00:59:14 UTC
[jira] Commented: (DERBY-1633) Regression: The fields of views are
not being calculated properly since 10.1.2.4
[ http://issues.apache.org/jira/browse/DERBY-1633?page=comments#action_12428256 ]
Rick Hillegas commented on DERBY-1633:
--------------------------------------
Thanks for the extensive analysis, Army! This has helped me to understand some of the issues. At this point I'm afraid that I don't have a very firm grasp of the high level problem and I hope that you will teach me more. It is clear that you have studied this tricky piece of code. Please bear with me.
I lost the thread of your explanation when I got to OBS#2 in DERBY-1633_v2.html. I did not understand this statement: "When a predicate is pushed to a UNION that appears in a Select list, the predicate will be "remapped" so that the column references point to their source result columns." I did not grasp how the problem was caused by the UNION. Why does a UNION require this mapping?
Instead, it seemed to me that the remapping was required by what was underneath the UNION. In this case, it is a Permuting Node (that's a term I just made up), that is, an operator which changes column order. In this case, the Permuting Node is a view, although I suppose it could also be a subquery.
It seems to me that positional remapping is required for other predicates that we try to push down, even if no UNION is involved. For instance, I think we need positional remapping in the following cases. And this brings me to the heart of my confusion: Why does predicate pushdown work in the following cases--or does it? If it does work, how is it that UNIONs break the logic? Here are some cases that come to mind:
select *
from
t1,
( select b as x, a as y from t2) v2
where t1.a = v2.y
select *
from ( select b as x, a as y from t2) v2
where v2.y = 1
> Regression: The fields of views are not being calculated properly since 10.1.2.4
> --------------------------------------------------------------------------------
>
> Key: DERBY-1633
> URL: http://issues.apache.org/jira/browse/DERBY-1633
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.1.3.0, 10.1.3.1
> Environment: 2.8 GHZ dual PIV on Windows XP SP2, 2 GB memory
> Reporter: Prasenjit Sarkar
> Assigned To: A B
> Fix For: 10.2.0.0
>
> Attachments: d1633_repro.sql, d1633_v1_reviewOnly.patch, d1633_v2.patch, DERBY-1633_v1.html, DERBY-1633_v2.html
>
>
> Database can be assumed to be same as in Derby - 1205 Jira issue
> SELECT PORT1.PORT_ID FROM T_RES_PORT PORT1, T_VIEW_ENTITY2PORT ENTITY2PORT WHERE ENTITY2PORT.PORT_ID = PORT1.PORT_ID
> This works fine in 10.1.2.1 but fails thereafter complaining that Comparison between INTEGER and CHAR is not supported
> for some reason, it thinks one of the PORT_ID columns is a character, when in reality both are integers.
> SELECT DISTINCT
> ZONE.ZONE_ID ZONE_ID,
> PORT2ZONE.ZONE_MEMBER_ID
> FROM
> T_RES_ZONE ZONE left outer join T_VIEW_PORT2ZONE PORT2ZONE on
> ZONE.ZONE_ID = PORT2ZONE.ZONE_ID , T_RES_FABRIC FABRIC
> In this query, it is complaining that one of the columns is a VARCHAR and cannot be compared to INTEGER, when clearly this is not the case...
> Same issue
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Commented: (DERBY-1633) Regression: The fields of views
are not being calculated properly since 10.1.2.4
Posted by Army <qo...@gmail.com>.
Thank you for volunteering to review the DERBY-1633 changes, Rick! I welcome
and appreciate your comments.
Rick Hillegas (JIRA) wrote:
>
> I lost the thread of your explanation when I got to OBS#2 in DERBY-1633_v2.html.
> I did not understand this statement: "When a predicate is pushed to a UNION that
> appears in a Select list, the predicate will be "remapped" so that the column
> references point to their source result columns." I did not grasp how the problem
> was caused by the UNION. Why does a UNION require this mapping?
Great question. I think the answer to this is (intended to be) contained in the
line that follows the one you quoted. If we look at the OBS#2, we see the
following:
<begin quote>
OBS#2: When a predicate is pushed to a UNION that appears in a Select list, the
predicate will be "remapped" so that the column references point to their source
result columns. This remapping occurs as part of the "pushOptPredicate()"
method in ProjectRestrictNode, which will be called because all items in a
Select's FROM list are required to have ProjectRestrictNodes above them (we add
them at bind time).
<end quote>
So you were definitely on the right track: it's not the fact that we have a
UNION node that necessitates the remapping. Rather, the remapping is required
because the UNION node appears in the FROM list of a SELECT node, which means
that there is a ProjectRestrictNode (PRN) above it. When the predicate is
pushed to the ProjectRestrictNode, it will then be remapped as part of the
pushing process.
> Instead, it seemed to me that the remapping was required by what
> was underneath the UNION.
Technically, as per comments above, the remapping is required by what is
immediately *above* the UNION node--namely, the ProjectRestrictNode.
> In this case, it is a Permuting Node (that's a term I just made up),
> that is, an operator which changes column order.
Again, this is along the right lines. I think if you replace "Permuting Node"
with "ProjectRestrictNode", this sentence is then correct. The predicate is
being pushed to the ProjectRestrictNode that sits above the UnionNode, and the
action of pushing the predicate to the PRN causes it (the predicate) to be remapped.
> It seems to me that positional remapping is required for other predicates
> that we try to push down, even if no UNION is involved.
Yes, correct again. All items in a FROM list have a ProjectRestrictNode above
them, so if we push a predicate to one of those PRNs, then the predicate is
going to be remapped. This behavior is not specific to UnionNodes, it is
specific to ProjectRestrictNodes. For the example in the DERBY-1633 document,
it just so happens that the PRN's child is a UnionNode, but the same predicate
remapping will happen if the PRN's child is any other kind of ResultSet node, as
well.
That said, the PRN will then give its child a chance to push the predicate
further down the tree. For most types of children this subsequent push will not
happen, so the predicate will remain at the level of the PRN and that's where it
will be enforced. For UnionNodes, though, it *is* possible to the push the
predicate further down--that's what the work for DERBY-805 does. In order to
complete that push, though, the predicates have to be "scoped" to the
UnionNode's left and right children. This scoping operation is what is at issue
for the regression described in DERBY-1633.
> For instance, I think we need positional remapping in the following
> cases. And this brings me to the heart of my confusion: Why does predicate
> pushdown work in the following cases--or does it? If it does work, how is
> it that UNIONs break the logic? Here are some cases that come to mind:
Again, great questions. To begin with, I should mention that for the two
queries you show, Derby will actually flatten the subqueries into the top result
set, effectively changing them into the following queries:
> select *
> from
> t1,
> ( select b as x, a as y from t2) v2
> where t1.a = v2.y
effectively becomes
select *
from
t1,
t2
where t1.a = t2.a
and
> select *
> from ( select b as x, a as y from t2) v2
> where v2.y = 1
effectively becomes
select *
from
t2
where t2.a = 1
I assume, though, that you didn't intend for subquery flattening to occur when
you asked your question, so let's pretend it doesn't happen. Or better, let's
assume there's a "DISTINCT" keyword in the subqueries, which means that Derby
will not flatten them. Then the answer to your questions are as follows:
> Why does predicate pushdown work in the following cases--or does it?
Yes, predicate pushdown will work in the example queries. As mentioned above,
the predicates are pushed to the ProjectRestrictNodes that sit above the FROM
tables. So in the first example, the predicate will be pushed to a PRN that
sits above a SelectNode that represents "v2". As part of that pushing, the
predicate will be remapped. The PRN will then give its child--the SelectNode--a
chance to push the predicate further down the tree, but the SelectNode will not
do so (the code path to show this is a bit tricky; if you want me to explain it
I can, but for now I'm just skipping it for brevity). Thus the predicate will
remain at its pushed position in the PRN, and that's where it will be enforced.
The same is true for the second example, with one minor difference: since the
predicate is one-sided, it will be pushed to the PRN above v2 as part of
pre-processing; i.e. before optimization begins. But other than that minor
detail, the behavior will be the same as for the first example: namely, the
predicate will be remapped when it is pushed to the PRN and then that's where it
will be enforced.
> If it does work, how is it that UNIONs break the logic?
The case of UNIONs is unique for two reasons. First, as mentioned above, a
UnionNode is one of few types of nodes that will take a predicate and attempt to
push it further down the tree. This means that the predicate is no longer going
to be enforced by the PRN above the UnionNode; instead, the UnionNode's
children--and/or the UnionNode itself--become(s) responsible for enforcing the
predicate.
The second difference is that, as part of the process for pushing the predicate
further down the tree--i.e. to the UnionNode's children--we have to do the extra
work of scoping the predicate to each child. It is this scoping
operation--which currently *only* occurs for UnionNodes, per DERBY-805--that
makes the case of UNIONs unique.
All of that said, "OBS#2" in the write-up for DERBY-1633 is basically saying
that "since the predicate was remapped before we got here," (i.e. as part of the
logic to push the predicate to the PRN above the UnionNode) "the column
references are already pointing to their source result sets." This fact then
influences the behavior of the scoping operation, and that behavior is what is
at issue for DERBY-1633.
Hopefully that answers your question--and my apologies if that was too much
information. If there are still things that aren't clear from my explanation,
please feel free to ask more. I'll try to answer as best as I can...
Thanks for taking the time to review the document. I know it's a lot of detail,
so I definitely appreciate the time of those who are willing to read it over and
provide feedback...
Army