You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Jacques Nadeau <ja...@apache.org> on 2021/10/28 00:11:39 UTC

Proposal: Better abstraction/encapsulation for RelMetadataQuery

Hey all,

I've been working on AOT compilation with Graal (Janino not usable) and I'm
struggling with the hierarchy of classes related to RelMetadataQuery. Right
now it feels like there is tight coupling between how the
JaninoRelMetadataProvider works and RelMetdataQuery. In a perfect world, it
seems like the current implementation should be separated into
JaninoRelMetdataQuery and RelMetdataQuery.
- JaninoRelMetdataQuery: The current version of RelMetdataQuery (roughly)
- RelMetdataQuery: An abstract class with minimal implementation and
doesn't extend RelMetadataQueryBase (which is fully coupled
to JaninoRelMetadataProvider)

Existing users who extend RelMetadatQuery today would move to extending
JaninoRelMetdataQuery. Some examples of the current problematic
encapsulation:
- Weird behavior in RelOptCluster: [1]
- Several internal concerns leaked via public fields in
RelMetadataQueryBase [2]
- No way to override "initialHandler" in RelMetadataQueryBase because it is
declared as a static method for [3]

Does anyone have strong opinions about this type of change?

FYI, I know one of the key points of focus on this code was performance but
unfortunately I don't notice any benchmarks in Calcite focused on this code
(did I miss some?).

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L156
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQueryBase.java#L69
[3]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQueryBase.java#L84

Re: Proposal: Better abstraction/encapsulation for RelMetadataQuery

Posted by Jacques Nadeau <ja...@apache.org>.
Hey James,

The changes in the PR don't actually address the core issue:
RelMetadataQuery assumes that you're going to have concrete handler objects
to resolve the metadata. This is only one way to satisfy
RelMetdataQuery operations but it becomes quite ugly to do it differently
since the implementation is embedded in the interface.

My issue with the current implementation is that it basically forces you to
have two-levels of indirection (find a handler for metadata type, then find
the specific rel type that matches the requested rel) when one will do
just fine (invoke the method associated with this metadata+rel type). The
two levels increase the complexity and boilerplate of a modern
non-compilation implementation (with no benefit that I can see). I'll
provide more comments on your PR since it seems to be doing several things
(and actually further exposes internals in one case).

Additionally, on the performance measurement: did you build a decomposed
version of the testJoinFiveWay test so you avoid including timing
measurements for things like object construction, verification, execution,
etc? In general, we should probably follow a policy that any patch that is
proposed for performance improvements or that we evaluate based on
performance should carry benchmarks with it (or leverage existing ones).

Thanks,
Jacques




On Wed, Oct 27, 2021 at 10:01 PM James Starr <ja...@gmail.com> wrote:

> Hi Jacques,
>
> I have a PR[1] for decoupling the JaninoRelMetadataProvider from the
> RelMetadataQuery.  Instead of using inheritance and requiring downstream
> projects to update their code, composition was used for the creation of the
> initial handler stub, handler creation, and creation of a cache.  This does
> not require downstream projects to extend a class that is coupled with how
> the handlers are generated and maintains all of the existing API while
> decoupling the handler creation from the RelMetadataQuery.  The weird
> behavior in RelOptCluster[2] is encapsulated into a single class and easily
> disregarded by using a different implementation.  My implementation does
> still have a public cache object in RelMetadataQueryBase which is how the
> VolcanoPlanner currently invalidates cache.
>
> Previously, JdbcTest.testJoinFiveWay was used to benchmark metadata.  I ran
> some benchmarks a while back[3], and very similar numbers to what had
> reported Julian in an email thread when the initial move to Janino was
> done.   JdbcTest.testJoinManyWay is another good candidate for generating a
> large number of metadata calls.
>
> Overall, I would prefer a change that did not require downstream projects
> to extend classes that exposed internal implementation and require them to
> change their code.
>
> James
>
> [1]
> https://github.com/apache/calcite/pull/2378
> [2]
>
> core/src/main/java/org/apache/calcite/rel/metadata/JaninoMetadataHandlerProvider.java
> [3]
> https://issues.apache.org/jira/browse/CALCITE-4546
>
> On Wed, Oct 27, 2021 at 5:11 PM Jacques Nadeau <ja...@apache.org> wrote:
>
> > Hey all,
> >
> > I've been working on AOT compilation with Graal (Janino not usable) and
> I'm
> > struggling with the hierarchy of classes related to RelMetadataQuery.
> Right
> > now it feels like there is tight coupling between how the
> > JaninoRelMetadataProvider works and RelMetdataQuery. In a perfect world,
> it
> > seems like the current implementation should be separated into
> > JaninoRelMetdataQuery and RelMetdataQuery.
> > - JaninoRelMetdataQuery: The current version of RelMetdataQuery (roughly)
> > - RelMetdataQuery: An abstract class with minimal implementation and
> > doesn't extend RelMetadataQueryBase (which is fully coupled
> > to JaninoRelMetadataProvider)
> >
> > Existing users who extend RelMetadatQuery today would move to extending
> > JaninoRelMetdataQuery. Some examples of the current problematic
> > encapsulation:
> > - Weird behavior in RelOptCluster: [1]
> > - Several internal concerns leaked via public fields in
> > RelMetadataQueryBase [2]
> > - No way to override "initialHandler" in RelMetadataQueryBase because it
> is
> > declared as a static method for [3]
> >
> > Does anyone have strong opinions about this type of change?
> >
> > FYI, I know one of the key points of focus on this code was performance
> but
> > unfortunately I don't notice any benchmarks in Calcite focused on this
> code
> > (did I miss some?).
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L156
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQueryBase.java#L69
> > [3]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQueryBase.java#L84
> >
>

Re: Proposal: Better abstraction/encapsulation for RelMetadataQuery

Posted by James Starr <ja...@gmail.com>.
Hi Jacques,

I have a PR[1] for decoupling the JaninoRelMetadataProvider from the
RelMetadataQuery.  Instead of using inheritance and requiring downstream
projects to update their code, composition was used for the creation of the
initial handler stub, handler creation, and creation of a cache.  This does
not require downstream projects to extend a class that is coupled with how
the handlers are generated and maintains all of the existing API while
decoupling the handler creation from the RelMetadataQuery.  The weird
behavior in RelOptCluster[2] is encapsulated into a single class and easily
disregarded by using a different implementation.  My implementation does
still have a public cache object in RelMetadataQueryBase which is how the
VolcanoPlanner currently invalidates cache.

Previously, JdbcTest.testJoinFiveWay was used to benchmark metadata.  I ran
some benchmarks a while back[3], and very similar numbers to what had
reported Julian in an email thread when the initial move to Janino was
done.   JdbcTest.testJoinManyWay is another good candidate for generating a
large number of metadata calls.

Overall, I would prefer a change that did not require downstream projects
to extend classes that exposed internal implementation and require them to
change their code.

James

[1]
https://github.com/apache/calcite/pull/2378
[2]
core/src/main/java/org/apache/calcite/rel/metadata/JaninoMetadataHandlerProvider.java
[3]
https://issues.apache.org/jira/browse/CALCITE-4546

On Wed, Oct 27, 2021 at 5:11 PM Jacques Nadeau <ja...@apache.org> wrote:

> Hey all,
>
> I've been working on AOT compilation with Graal (Janino not usable) and I'm
> struggling with the hierarchy of classes related to RelMetadataQuery. Right
> now it feels like there is tight coupling between how the
> JaninoRelMetadataProvider works and RelMetdataQuery. In a perfect world, it
> seems like the current implementation should be separated into
> JaninoRelMetdataQuery and RelMetdataQuery.
> - JaninoRelMetdataQuery: The current version of RelMetdataQuery (roughly)
> - RelMetdataQuery: An abstract class with minimal implementation and
> doesn't extend RelMetadataQueryBase (which is fully coupled
> to JaninoRelMetadataProvider)
>
> Existing users who extend RelMetadatQuery today would move to extending
> JaninoRelMetdataQuery. Some examples of the current problematic
> encapsulation:
> - Weird behavior in RelOptCluster: [1]
> - Several internal concerns leaked via public fields in
> RelMetadataQueryBase [2]
> - No way to override "initialHandler" in RelMetadataQueryBase because it is
> declared as a static method for [3]
>
> Does anyone have strong opinions about this type of change?
>
> FYI, I know one of the key points of focus on this code was performance but
> unfortunately I don't notice any benchmarks in Calcite focused on this code
> (did I miss some?).
>
> [1]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L156
> [2]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQueryBase.java#L69
> [3]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQueryBase.java#L84
>