You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafodion.apache.org by Dave Birdsall <da...@esgyn.com> on 2018/01/08 20:23:46 UTC

Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Dave Birdsall <da...@esgyn.com>.
Apologies... I misread your question.

I answered the question, "should we support updatable first n views". The answer to that is No.

As for first n views, well, we already support them.


-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 3:26 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

We don't. That was the point of JIRA TRAFODION-2822. But there is this one case that escaped that fix. That's what I'm trying to patch here.

-----Original Message-----
From: Rohit Jain [mailto:rohit.jain@esgyn.com] 
Sent: Monday, January 8, 2018 2:54 PM
To: dev@trafodion.apache.org
Subject: Re: Anomaly with [first n] and ORDER BY

Stupid question, but why do we need to support first n in views?  The use case I see for first n seems does not warrant the need for use in a view that often, of at all. Is this a customer request?

Rohit

> On Jan 8, 2018, at 4:17 PM, Hans Zeller <ha...@esgyn.com> wrote:
> 
> Hi Dave,
> 
> The simple reason is that the person who implemented the [first n] feature is not a compiler developer.
> 
> Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.
> 
> One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.
> 
> Thanks,
> 
> Hans
> 
> -----Original Message-----
> From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
> Sent: Monday, January 8, 2018 12:24 PM
> To: dev@trafodion.apache.org
> Subject: Anomaly with [first n] and ORDER BY
> 
> Hi,
> 
> I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.
> 
> I attempted to fix the latter case by making [first n] views not updatable.
> 
> But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.
> 
> I figured out why.
> 
> At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.
> 
> But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.
> 
> Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.
> 
> But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.
> 
> I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.
> 
> Any thoughts?
> 
> Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Dave Birdsall <da...@esgyn.com>.
We don't. That was the point of JIRA TRAFODION-2822. But there is this one case that escaped that fix. That's what I'm trying to patch here.

-----Original Message-----
From: Rohit Jain [mailto:rohit.jain@esgyn.com] 
Sent: Monday, January 8, 2018 2:54 PM
To: dev@trafodion.apache.org
Subject: Re: Anomaly with [first n] and ORDER BY

Stupid question, but why do we need to support first n in views?  The use case I see for first n seems does not warrant the need for use in a view that often, of at all. Is this a customer request?

Rohit

> On Jan 8, 2018, at 4:17 PM, Hans Zeller <ha...@esgyn.com> wrote:
> 
> Hi Dave,
> 
> The simple reason is that the person who implemented the [first n] feature is not a compiler developer.
> 
> Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.
> 
> One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.
> 
> Thanks,
> 
> Hans
> 
> -----Original Message-----
> From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
> Sent: Monday, January 8, 2018 12:24 PM
> To: dev@trafodion.apache.org
> Subject: Anomaly with [first n] and ORDER BY
> 
> Hi,
> 
> I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.
> 
> I attempted to fix the latter case by making [first n] views not updatable.
> 
> But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.
> 
> I figured out why.
> 
> At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.
> 
> But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.
> 
> Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.
> 
> But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.
> 
> I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.
> 
> Any thoughts?
> 
> Dave

Re: Anomaly with [first n] and ORDER BY

Posted by Rohit Jain <ro...@esgyn.com>.
Stupid question, but why do we need to support first n in views?  The use case I see for first n seems does not warrant the need for use in a view that often, of at all. Is this a customer request?

Rohit

> On Jan 8, 2018, at 4:17 PM, Hans Zeller <ha...@esgyn.com> wrote:
> 
> Hi Dave,
> 
> The simple reason is that the person who implemented the [first n] feature is not a compiler developer.
> 
> Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.
> 
> One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.
> 
> Thanks,
> 
> Hans
> 
> -----Original Message-----
> From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
> Sent: Monday, January 8, 2018 12:24 PM
> To: dev@trafodion.apache.org
> Subject: Anomaly with [first n] and ORDER BY
> 
> Hi,
> 
> I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.
> 
> I attempted to fix the latter case by making [first n] views not updatable.
> 
> But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.
> 
> I figured out why.
> 
> At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.
> 
> But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.
> 
> Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.
> 
> But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.
> 
> I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.
> 
> Any thoughts?
> 
> Dave

Re: Anomaly with [first n] and ORDER BY

Posted by Qifan Chen <qi...@esgyn.com>.
Hi,


One possible optimization for the two identical sort orders (one at the root and one at the root(firstN)) is to eliminate the one at the root from RelRoot::createContextForChild(). This probably can help reduce the # of contexts to be optimized.


We also need to make sure that this new scheme (I like it a lot) works with a parallel plan. Logically in such a plan, each firstN instance will produce n sorted rows and the esp exchange operators help merge them into a sorted data streams in which the first n rows are returned.


Thanks --Qifan

________________________________
From: Dave Birdsall <da...@esgyn.com>
Sent: Monday, January 8, 2018 7:47:02 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Cool example! Thanks, this will help a lot. Hope to have an implementation by tomorrow.

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: Monday, January 8, 2018 5:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave, here is the case where this won't work:

1. The root node requires an order by A from its child.

2. The sort enforcer rule fires and the sort that gets created requires no order from its child (the FirstN)

3. The FirstN gets the required physical properties from the sort, which specifies no order

4. Now, the FirstN does not require any order from its child

If we force the FirstN to require an order for every context it creates, this won't happen. I was also thinking of the equivalent way to do this: select * from t where row_number() over(order by A) <= 10. This works the same way, we include the "order by A" in the Sequence function RelExpr and solve the required property issue that way.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, January 8, 2018 4:50 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. I was looking just now at RelExpr::createAContextForAChild. It seems to pass its own required property down to its left-most child (at least for sorting). I'm confused why this isn't good enough.

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: Monday, January 8, 2018 4:30 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave, you could just add a ValueIdList data member to the FirstN and store the ORDER BY there as well as in the Root operator. Then, in a new virtual method createAContextForAChild() method of the FirstN, it would need to add a required order, like the Root operator does. This is not ideal, having to store the order by list twice, but I guess that's the price we pay for having an operator that does not cleanly separate logical and physical properties. Also, the meaning of this order is different from the meaning in the root node, we just syntactically force the two orders to the be same.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, January 8, 2018 3:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. An elemental question: How do I make both operators require an order?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: Monday, January 8, 2018 3:28 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Dave Birdsall <da...@esgyn.com>.
Hi Hans,

Cool example! Thanks, this will help a lot. Hope to have an implementation by tomorrow.

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 5:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave, here is the case where this won't work:

1. The root node requires an order by A from its child.

2. The sort enforcer rule fires and the sort that gets created requires no order from its child (the FirstN)

3. The FirstN gets the required physical properties from the sort, which specifies no order

4. Now, the FirstN does not require any order from its child

If we force the FirstN to require an order for every context it creates, this won't happen. I was also thinking of the equivalent way to do this: select * from t where row_number() over(order by A) <= 10. This works the same way, we include the "order by A" in the Sequence function RelExpr and solve the required property issue that way.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 4:50 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. I was looking just now at RelExpr::createAContextForAChild. It seems to pass its own required property down to its left-most child (at least for sorting). I'm confused why this isn't good enough.

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 4:30 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave, you could just add a ValueIdList data member to the FirstN and store the ORDER BY there as well as in the Root operator. Then, in a new virtual method createAContextForAChild() method of the FirstN, it would need to add a required order, like the Root operator does. This is not ideal, having to store the order by list twice, but I guess that's the price we pay for having an operator that does not cleanly separate logical and physical properties. Also, the meaning of this order is different from the meaning in the root node, we just syntactically force the two orders to the be same.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 3:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. An elemental question: How do I make both operators require an order?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 3:28 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Dave, here is the case where this won't work:

1. The root node requires an order by A from its child.

2. The sort enforcer rule fires and the sort that gets created requires no order from its child (the FirstN)

3. The FirstN gets the required physical properties from the sort, which specifies no order

4. Now, the FirstN does not require any order from its child

If we force the FirstN to require an order for every context it creates, this won't happen. I was also thinking of the equivalent way to do this: select * from t where row_number() over(order by A) <= 10. This works the same way, we include the "order by A" in the Sequence function RelExpr and solve the required property issue that way.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 4:50 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. I was looking just now at RelExpr::createAContextForAChild. It seems to pass its own required property down to its left-most child (at least for sorting). I'm confused why this isn't good enough.

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 4:30 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave, you could just add a ValueIdList data member to the FirstN and store the ORDER BY there as well as in the Root operator. Then, in a new virtual method createAContextForAChild() method of the FirstN, it would need to add a required order, like the Root operator does. This is not ideal, having to store the order by list twice, but I guess that's the price we pay for having an operator that does not cleanly separate logical and physical properties. Also, the meaning of this order is different from the meaning in the root node, we just syntactically force the two orders to the be same.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 3:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. An elemental question: How do I make both operators require an order?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 3:28 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Dave Birdsall <da...@esgyn.com>.
Hi Hans,

Thanks. I was looking just now at RelExpr::createAContextForAChild. It seems to pass its own required property down to its left-most child (at least for sorting). I'm confused why this isn't good enough.

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 4:30 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave, you could just add a ValueIdList data member to the FirstN and store the ORDER BY there as well as in the Root operator. Then, in a new virtual method createAContextForAChild() method of the FirstN, it would need to add a required order, like the Root operator does. This is not ideal, having to store the order by list twice, but I guess that's the price we pay for having an operator that does not cleanly separate logical and physical properties. Also, the meaning of this order is different from the meaning in the root node, we just syntactically force the two orders to the be same.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 3:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. An elemental question: How do I make both operators require an order?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 3:28 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Dave, you could just add a ValueIdList data member to the FirstN and store the ORDER BY there as well as in the Root operator. Then, in a new virtual method createAContextForAChild() method of the FirstN, it would need to add a required order, like the Root operator does. This is not ideal, having to store the order by list twice, but I guess that's the price we pay for having an operator that does not cleanly separate logical and physical properties. Also, the meaning of this order is different from the meaning in the root node, we just syntactically force the two orders to the be same.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 3:40 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

Thanks. An elemental question: How do I make both operators require an order?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 3:28 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Dave Birdsall <da...@esgyn.com>.
Hi Hans,

Thanks. An elemental question: How do I make both operators require an order?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 3:28 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Dave,

Overall, I like the idea of moving some of this logic into the optimizer. It's about time to do that.

One small comment: The sort enforcer rule does not have a pattern, so it cannot look at the FIRST_N node. It can look at the group, so you could probably mark the group of the FIRST_N somehow.

To be honest, I don't really like this solution, because it uses some flags to deal with required physical properties. Ideally, we would figure out some way to make it work in a way that required physical properties are passed as such, while things that alter the logical properties are stored in RelExprs. The problem is that the FIRST_N operator, especially with an ORDER BY, violates this separation of logical and physical properties. We have another operator like that, the partial groupby.

Here is what I would do - it's not a perfect solution but hopefully it makes the required physical properties more accurate:

For queries with a [FIRST n] and an ORDER BY, let both the root and the FIRST_N node (which is then always added in the binder) ask for the sort order. This is more realistic. Both operators require an order. We might consider a sort on top of the FIRST_N, but that sort would be eliminated, because it is unnecessary, as the FIRST_N would always return its result in order.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 2:37 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Dave Birdsall <da...@esgyn.com>.
Hi Hans,

So my first attempt at a fix for Trafodion 2840 is to add code to RelRoot::isUpdatableBasic to check if getFirstNRows != -1 or getFirstNParams is non-null.

That fix does half the trick. It makes new [first n] + ORDER BY views not updatable. The half of the trick that it doesn't do is take care of existing [first n] + ORDER BY views; they remain marked updatable in the metadata.

So I was trying to imagine approaches that would catch existing views as well.

In RelRoot::codeGen I saw these comments:

// if root has GET_N indication set, insert a FirstN node.
  // Usually this transformation is done in the binder, but in
  // some special cases it is not.
  // For example, if there is an 'order by' in the query, then
  // the Sort node is added by the optimizer. In this case, we
  // want to add the FirstN node on top of the Sort node and not
  // below it. If we add the FirstN node in the binder, the optimizer
  // will add the Sort node on top of the FirstN node. Maybe we
  // can teach optimizer to do this.

So, I thought: Well, let's explore the idea of having the Optimizer insert the Sort below FirstN instead of above it. I've coded a first attempt (still getting it to compile cleanly at this moment though). The idea was: Change the binder to always insert FirstN (even when ORDER BY is present). Change SortEnforcerRule::topMatch so it does not match on FirstN nodes (that prevents the Sort from being placed on top of FirstN). Add a new rule SortEnforcerFirstNRule that matches FirstN trees only, and transforms FirstN(CutOp) to FirstN(Sort(CutOp)). Hopefully that eliminates the need for the generator to insert the FirstN node, and gets the Sort node in the right place.

This should catch existing [first n] + ORDER BY views since view composition happens in the binder. Now the FirstN node will be generated there and the existing Normalizer check will catch it and flag it not updatable.

What do you think?

Dave


-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com] 
Sent: Monday, January 8, 2018 2:17 PM
To: dev@trafodion.apache.org
Subject: RE: Anomaly with [first n] and ORDER BY

Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave

RE: Anomaly with [first n] and ORDER BY

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Dave,

The simple reason is that the person who implemented the [first n] feature is not a compiler developer.

Ideally, we would be aware of the [first n] throughout the compilation and have a new required property in the optimizer that says "optimize for first N rows", so that we could favor certain query plans such as nested joins, but this is not happening today and it would be a significant project.

One other comment about being able to update a [first n] view: Ideally, such a view would be updatable if no WITH CHECK OPTION was specified, and it would not be updatable when the WITH CHECK OPTION was specified in the CREATE VIEW DDL. Again, that's the ideal case, and we may not be able to make that happen today.

Thanks,

Hans

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Monday, January 8, 2018 12:24 PM
To: dev@trafodion.apache.org
Subject: Anomaly with [first n] and ORDER BY

Hi,

I've been studying https://issues.apache.org/jira/browse/TRAFODION-2840, and the related case https://issues.apache.org/jira/browse/TRAFODION-2822.

I attempted to fix the latter case by making [first n] views not updatable.

But the former case documents a hole in my fix. It seems that if we add ORDER BY to the view definition, the checks in 2822 are circumvented.

I figured out why.

At bind time, [first n] scans are transformed to a firstN(scan) tree (that is, a firstN node is created and inserted on top of the scan). EXCEPT, if there is an ORDER BY clause, we don't do this. Instead, we generate the firstN node at code generation time.

But that means the Normalizer sees a [first n] + ORDERBY as just a scan, and a [first n] without ORDER BY as firstN(scan). The fix for 2822 was in the Normalizer; so this anomaly explains why the fix didn't work when ORDER BY was present.

Now, I've figured out how to improve the fix so the Normalizer catches the ORDER BY example.

But I am curious why we do this strange thing of deferring firstN insertion to generation time. It seems to me doing so could defeat many other checks for firstN processing. For example, an optimizer rule that does something for firstNs wouldn't fire if an ORDER BY is present.

I'm wondering, for example, why we didn't have the Binder simply insert a firstN and a sort node into the tree.

Any thoughts?

Dave