You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Enrico Olivelli <eo...@gmail.com> on 2020/09/18 07:01:17 UTC

Disabling JaninoRelMetadataProvider

Hi,
I am seeing that JaninoRelMetadataProvider is quite expensive, at least for
the "boostrap" phase of the system.

It is a cool piece of software and it is working well, but in some cases
probably it could be quite overkill, for instance in test cases of
applications that mostly run only one query and then end.

You could argue that the price of the compilation is paid only once and
there is no need to complicate things for some corner cases that probably
do not affect production cases.

In CALCITE-3901 we introduced calcite.enable.regenerate.metadata.handler
system property in order to limit the number of times that Janino kicks in
but not to drop it at all.

From the code I see that it is a subclass of RelMetadataProvider, but there
is no way to get rid of it in VolcanoPlanner.

Also, Janino is a third party library, and having the ability of dropping
it will help in having a smaller set of dependencies downstream (but this
is not blocker at the moment, it is only a nice-to-have)

Do you have any experience with this problem ? Is there any chance to add
some configuration option to pass a RelMetadataProvider implementation
that does not rely on dynamic code generation ?
If the idea is valuable I will be happy to work on it.

See this issue for reference, with a full stacktrace of the execution
https://github.com/diennea/herddb/issues/517

Regards
Enrico

Re: Disabling JaninoRelMetadataProvider

Posted by Fan Liya <li...@gmail.com>.
Hi Julian,

Thanks for your good suggestion.
The idea of delayed freeze sounds reasonable to me, and it provides more
flexibility.
I also like the idea of an output report which is helpful for performance
diagnostics.

Best,
Liya Fan


On Fri, Sep 25, 2020 at 7:39 AM Julian Hyde <jh...@apache.org> wrote:

> Would it make sense to delay the 'freeze'? E.g. receive 1,000 requests
> (as measured using an AtomicInteger) and then freeze the
> JaninoRelMetadataProvider. That way, tests would not be competing with
> each other.
>
> Similarly, in production, you could create a production instance of
> JaninoRelMetadataProvider with what you believe to be a good list of
> classes and metadata. The first request that asks for a class or
> metadata will not fail. But when you have gathered, say, 100 missing
> classes (and therefore re-generated 100 times), at that point it can
> output a list of the classes that were missing from the initial
> config.
>
> Julian
>
> On Mon, Sep 21, 2020 at 4:53 AM Fan Liya <li...@gmail.com> wrote:
> >
> > Hi Julian,
> >
> > Thank you for the feedback and good suggestions.
> >
> > The idea of lazy load is reasonable. However, the testing of this feature
> > is still tricky, because:
> > 1. When we have changed the flag, all subsequent test cases are all also
> > affected (many components of JaninoRelMetadataProvider are declared as
> > static, which is used by all test cases in the same JVM, so it is
> difficult
> > to revert the changes made by a single test case).
> > 2.  All test cases may be run in parallel, so when we change the flag,
> > other test cases running concurrently may fail. (The
> > JaninoRelMetadataProvider object is a singleton, which is shared by all
> > threads).
> >
> > I have updated the PR [1] to implement the idea of lazy load. Could
> someone
> > please take a look?
> >
> > Best,
> > Liya Fan
> >
> > [1] https://github.com/apache/calcite/pull/1901
> >
> > On Sat, Sep 19, 2020 at 1:50 AM Julian Hyde <jh...@gmail.com>
> wrote:
> >
> > > CALCITE-3901 looks like a reasonable approach. I like the idea that it
> > > would throw and tell you the classes and metadata types that you have
> > > missed.
> > >
> > > The packaging in 3901 isn’t great. I don’t believe that the test needs
> to
> > > read a system property and not even via the CalciteSystemProperty
> > > mechanism.  We can do better.
> > >
> > > Maybe we have a class name and arguments for the metadata provider,
> which
> > > is loaded on demand. Will work for the test and also in production.
> > >
> > > Julian
> > >
> > > > On Sep 18, 2020, at 12:57 AM, Fan Liya <li...@gmail.com> wrote:
> > > >
> > > > Hi Enrico,
> > > >
> > > > We have also observed the problem.
> > > >
> > > > Janino compilation is time consuming. For each single Java class, it
> > > takes
> > > > tens of milli-seconds.
> > > > Moreover, the compilation will take place multiple times, because:
> > > > 1. We have multiple types of metadata.
> > > > 2. The class for a particular type of metadata may be regenerated and
> > > > recompiled, because a rel node was not registered at first.
> > > >
> > > > We opened CALCITE-3901 and provided a PR [1] to address problem 2 by
> > > > reducing the number of compilations. However, so far it has not been
> > > > accepted yet.
> > > >
> > > > To solve problem 1, I think we have two potential solutions:
> > > > 1. Provide a metadata provider that is not based on codegen and
> Janino
> > > > compilation (Obviously, this will be a long term plan).
> > > > 2. Save the generated Java class, and compile it with the client
> code,
> > > and
> > > > change the implementation of JaninoMetaDataProvider#load3 so that if
> a
> > > > handler can be found in the current class loader, then it skips the
> > > codegen
> > > > and compilation.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > [1] https://github.com/apache/calcite/pull/1901
> > > >
> > > >
> > > >
> > > >> On Fri, Sep 18, 2020 at 3:01 PM Enrico Olivelli <
> eolivelli@gmail.com>
> > > wrote:
> > > >>
> > > >> Hi,
> > > >> I am seeing that JaninoRelMetadataProvider is quite expensive, at
> least
> > > for
> > > >> the "boostrap" phase of the system.
> > > >>
> > > >> It is a cool piece of software and it is working well, but in some
> cases
> > > >> probably it could be quite overkill, for instance in test cases of
> > > >> applications that mostly run only one query and then end.
> > > >>
> > > >> You could argue that the price of the compilation is paid only once
> and
> > > >> there is no need to complicate things for some corner cases that
> > > probably
> > > >> do not affect production cases.
> > > >>
> > > >> In CALCITE-3901 we introduced
> calcite.enable.regenerate.metadata.handler
> > > >> system property in order to limit the number of times that Janino
> kicks
> > > in
> > > >> but not to drop it at all.
> > > >>
> > > >> From the code I see that it is a subclass of RelMetadataProvider,
> but
> > > there
> > > >> is no way to get rid of it in VolcanoPlanner.
> > > >>
> > > >> Also, Janino is a third party library, and having the ability of
> > > dropping
> > > >> it will help in having a smaller set of dependencies downstream (but
> > > this
> > > >> is not blocker at the moment, it is only a nice-to-have)
> > > >>
> > > >> Do you have any experience with this problem ? Is there any chance
> to
> > > add
> > > >> some configuration option to pass a RelMetadataProvider
> implementation
> > > >> that does not rely on dynamic code generation ?
> > > >> If the idea is valuable I will be happy to work on it.
> > > >>
> > > >> See this issue for reference, with a full stacktrace of the
> execution
> > > >> https://github.com/diennea/herddb/issues/517
> > > >>
> > > >> Regards
> > > >> Enrico
> > > >>
> > >
>

Re: Disabling JaninoRelMetadataProvider

Posted by Julian Hyde <jh...@apache.org>.
Would it make sense to delay the 'freeze'? E.g. receive 1,000 requests
(as measured using an AtomicInteger) and then freeze the
JaninoRelMetadataProvider. That way, tests would not be competing with
each other.

Similarly, in production, you could create a production instance of
JaninoRelMetadataProvider with what you believe to be a good list of
classes and metadata. The first request that asks for a class or
metadata will not fail. But when you have gathered, say, 100 missing
classes (and therefore re-generated 100 times), at that point it can
output a list of the classes that were missing from the initial
config.

Julian

On Mon, Sep 21, 2020 at 4:53 AM Fan Liya <li...@gmail.com> wrote:
>
> Hi Julian,
>
> Thank you for the feedback and good suggestions.
>
> The idea of lazy load is reasonable. However, the testing of this feature
> is still tricky, because:
> 1. When we have changed the flag, all subsequent test cases are all also
> affected (many components of JaninoRelMetadataProvider are declared as
> static, which is used by all test cases in the same JVM, so it is difficult
> to revert the changes made by a single test case).
> 2.  All test cases may be run in parallel, so when we change the flag,
> other test cases running concurrently may fail. (The
> JaninoRelMetadataProvider object is a singleton, which is shared by all
> threads).
>
> I have updated the PR [1] to implement the idea of lazy load. Could someone
> please take a look?
>
> Best,
> Liya Fan
>
> [1] https://github.com/apache/calcite/pull/1901
>
> On Sat, Sep 19, 2020 at 1:50 AM Julian Hyde <jh...@gmail.com> wrote:
>
> > CALCITE-3901 looks like a reasonable approach. I like the idea that it
> > would throw and tell you the classes and metadata types that you have
> > missed.
> >
> > The packaging in 3901 isn’t great. I don’t believe that the test needs to
> > read a system property and not even via the CalciteSystemProperty
> > mechanism.  We can do better.
> >
> > Maybe we have a class name and arguments for the metadata provider, which
> > is loaded on demand. Will work for the test and also in production.
> >
> > Julian
> >
> > > On Sep 18, 2020, at 12:57 AM, Fan Liya <li...@gmail.com> wrote:
> > >
> > > Hi Enrico,
> > >
> > > We have also observed the problem.
> > >
> > > Janino compilation is time consuming. For each single Java class, it
> > takes
> > > tens of milli-seconds.
> > > Moreover, the compilation will take place multiple times, because:
> > > 1. We have multiple types of metadata.
> > > 2. The class for a particular type of metadata may be regenerated and
> > > recompiled, because a rel node was not registered at first.
> > >
> > > We opened CALCITE-3901 and provided a PR [1] to address problem 2 by
> > > reducing the number of compilations. However, so far it has not been
> > > accepted yet.
> > >
> > > To solve problem 1, I think we have two potential solutions:
> > > 1. Provide a metadata provider that is not based on codegen and Janino
> > > compilation (Obviously, this will be a long term plan).
> > > 2. Save the generated Java class, and compile it with the client code,
> > and
> > > change the implementation of JaninoMetaDataProvider#load3 so that if a
> > > handler can be found in the current class loader, then it skips the
> > codegen
> > > and compilation.
> > >
> > > Best,
> > > Liya Fan
> > >
> > >
> > >
> > >
> > >
> > >
> > > [1] https://github.com/apache/calcite/pull/1901
> > >
> > >
> > >
> > >> On Fri, Sep 18, 2020 at 3:01 PM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> > >>
> > >> Hi,
> > >> I am seeing that JaninoRelMetadataProvider is quite expensive, at least
> > for
> > >> the "boostrap" phase of the system.
> > >>
> > >> It is a cool piece of software and it is working well, but in some cases
> > >> probably it could be quite overkill, for instance in test cases of
> > >> applications that mostly run only one query and then end.
> > >>
> > >> You could argue that the price of the compilation is paid only once and
> > >> there is no need to complicate things for some corner cases that
> > probably
> > >> do not affect production cases.
> > >>
> > >> In CALCITE-3901 we introduced calcite.enable.regenerate.metadata.handler
> > >> system property in order to limit the number of times that Janino kicks
> > in
> > >> but not to drop it at all.
> > >>
> > >> From the code I see that it is a subclass of RelMetadataProvider, but
> > there
> > >> is no way to get rid of it in VolcanoPlanner.
> > >>
> > >> Also, Janino is a third party library, and having the ability of
> > dropping
> > >> it will help in having a smaller set of dependencies downstream (but
> > this
> > >> is not blocker at the moment, it is only a nice-to-have)
> > >>
> > >> Do you have any experience with this problem ? Is there any chance to
> > add
> > >> some configuration option to pass a RelMetadataProvider implementation
> > >> that does not rely on dynamic code generation ?
> > >> If the idea is valuable I will be happy to work on it.
> > >>
> > >> See this issue for reference, with a full stacktrace of the execution
> > >> https://github.com/diennea/herddb/issues/517
> > >>
> > >> Regards
> > >> Enrico
> > >>
> >

Re: Disabling JaninoRelMetadataProvider

Posted by Fan Liya <li...@gmail.com>.
Hi Julian,

Thank you for the feedback and good suggestions.

The idea of lazy load is reasonable. However, the testing of this feature
is still tricky, because:
1. When we have changed the flag, all subsequent test cases are all also
affected (many components of JaninoRelMetadataProvider are declared as
static, which is used by all test cases in the same JVM, so it is difficult
to revert the changes made by a single test case).
2.  All test cases may be run in parallel, so when we change the flag,
other test cases running concurrently may fail. (The
JaninoRelMetadataProvider object is a singleton, which is shared by all
threads).

I have updated the PR [1] to implement the idea of lazy load. Could someone
please take a look?

Best,
Liya Fan

[1] https://github.com/apache/calcite/pull/1901

On Sat, Sep 19, 2020 at 1:50 AM Julian Hyde <jh...@gmail.com> wrote:

> CALCITE-3901 looks like a reasonable approach. I like the idea that it
> would throw and tell you the classes and metadata types that you have
> missed.
>
> The packaging in 3901 isn’t great. I don’t believe that the test needs to
> read a system property and not even via the CalciteSystemProperty
> mechanism.  We can do better.
>
> Maybe we have a class name and arguments for the metadata provider, which
> is loaded on demand. Will work for the test and also in production.
>
> Julian
>
> > On Sep 18, 2020, at 12:57 AM, Fan Liya <li...@gmail.com> wrote:
> >
> > Hi Enrico,
> >
> > We have also observed the problem.
> >
> > Janino compilation is time consuming. For each single Java class, it
> takes
> > tens of milli-seconds.
> > Moreover, the compilation will take place multiple times, because:
> > 1. We have multiple types of metadata.
> > 2. The class for a particular type of metadata may be regenerated and
> > recompiled, because a rel node was not registered at first.
> >
> > We opened CALCITE-3901 and provided a PR [1] to address problem 2 by
> > reducing the number of compilations. However, so far it has not been
> > accepted yet.
> >
> > To solve problem 1, I think we have two potential solutions:
> > 1. Provide a metadata provider that is not based on codegen and Janino
> > compilation (Obviously, this will be a long term plan).
> > 2. Save the generated Java class, and compile it with the client code,
> and
> > change the implementation of JaninoMetaDataProvider#load3 so that if a
> > handler can be found in the current class loader, then it skips the
> codegen
> > and compilation.
> >
> > Best,
> > Liya Fan
> >
> >
> >
> >
> >
> >
> > [1] https://github.com/apache/calcite/pull/1901
> >
> >
> >
> >> On Fri, Sep 18, 2020 at 3:01 PM Enrico Olivelli <eo...@gmail.com>
> wrote:
> >>
> >> Hi,
> >> I am seeing that JaninoRelMetadataProvider is quite expensive, at least
> for
> >> the "boostrap" phase of the system.
> >>
> >> It is a cool piece of software and it is working well, but in some cases
> >> probably it could be quite overkill, for instance in test cases of
> >> applications that mostly run only one query and then end.
> >>
> >> You could argue that the price of the compilation is paid only once and
> >> there is no need to complicate things for some corner cases that
> probably
> >> do not affect production cases.
> >>
> >> In CALCITE-3901 we introduced calcite.enable.regenerate.metadata.handler
> >> system property in order to limit the number of times that Janino kicks
> in
> >> but not to drop it at all.
> >>
> >> From the code I see that it is a subclass of RelMetadataProvider, but
> there
> >> is no way to get rid of it in VolcanoPlanner.
> >>
> >> Also, Janino is a third party library, and having the ability of
> dropping
> >> it will help in having a smaller set of dependencies downstream (but
> this
> >> is not blocker at the moment, it is only a nice-to-have)
> >>
> >> Do you have any experience with this problem ? Is there any chance to
> add
> >> some configuration option to pass a RelMetadataProvider implementation
> >> that does not rely on dynamic code generation ?
> >> If the idea is valuable I will be happy to work on it.
> >>
> >> See this issue for reference, with a full stacktrace of the execution
> >> https://github.com/diennea/herddb/issues/517
> >>
> >> Regards
> >> Enrico
> >>
>

Re: Disabling JaninoRelMetadataProvider

Posted by Julian Hyde <jh...@gmail.com>.
CALCITE-3901 looks like a reasonable approach. I like the idea that it would throw and tell you the classes and metadata types that you have missed. 

The packaging in 3901 isn’t great. I don’t believe that the test needs to read a system property and not even via the CalciteSystemProperty mechanism.  We can do better.

Maybe we have a class name and arguments for the metadata provider, which is loaded on demand. Will work for the test and also in production.

Julian

> On Sep 18, 2020, at 12:57 AM, Fan Liya <li...@gmail.com> wrote:
> 
> Hi Enrico,
> 
> We have also observed the problem.
> 
> Janino compilation is time consuming. For each single Java class, it takes
> tens of milli-seconds.
> Moreover, the compilation will take place multiple times, because:
> 1. We have multiple types of metadata.
> 2. The class for a particular type of metadata may be regenerated and
> recompiled, because a rel node was not registered at first.
> 
> We opened CALCITE-3901 and provided a PR [1] to address problem 2 by
> reducing the number of compilations. However, so far it has not been
> accepted yet.
> 
> To solve problem 1, I think we have two potential solutions:
> 1. Provide a metadata provider that is not based on codegen and Janino
> compilation (Obviously, this will be a long term plan).
> 2. Save the generated Java class, and compile it with the client code, and
> change the implementation of JaninoMetaDataProvider#load3 so that if a
> handler can be found in the current class loader, then it skips the codegen
> and compilation.
> 
> Best,
> Liya Fan
> 
> 
> 
> 
> 
> 
> [1] https://github.com/apache/calcite/pull/1901
> 
> 
> 
>> On Fri, Sep 18, 2020 at 3:01 PM Enrico Olivelli <eo...@gmail.com> wrote:
>> 
>> Hi,
>> I am seeing that JaninoRelMetadataProvider is quite expensive, at least for
>> the "boostrap" phase of the system.
>> 
>> It is a cool piece of software and it is working well, but in some cases
>> probably it could be quite overkill, for instance in test cases of
>> applications that mostly run only one query and then end.
>> 
>> You could argue that the price of the compilation is paid only once and
>> there is no need to complicate things for some corner cases that probably
>> do not affect production cases.
>> 
>> In CALCITE-3901 we introduced calcite.enable.regenerate.metadata.handler
>> system property in order to limit the number of times that Janino kicks in
>> but not to drop it at all.
>> 
>> From the code I see that it is a subclass of RelMetadataProvider, but there
>> is no way to get rid of it in VolcanoPlanner.
>> 
>> Also, Janino is a third party library, and having the ability of dropping
>> it will help in having a smaller set of dependencies downstream (but this
>> is not blocker at the moment, it is only a nice-to-have)
>> 
>> Do you have any experience with this problem ? Is there any chance to add
>> some configuration option to pass a RelMetadataProvider implementation
>> that does not rely on dynamic code generation ?
>> If the idea is valuable I will be happy to work on it.
>> 
>> See this issue for reference, with a full stacktrace of the execution
>> https://github.com/diennea/herddb/issues/517
>> 
>> Regards
>> Enrico
>> 

Re: Disabling JaninoRelMetadataProvider

Posted by Fan Liya <li...@gmail.com>.
Hi Enrico,

We have also observed the problem.

Janino compilation is time consuming. For each single Java class, it takes
tens of milli-seconds.
Moreover, the compilation will take place multiple times, because:
1. We have multiple types of metadata.
2. The class for a particular type of metadata may be regenerated and
recompiled, because a rel node was not registered at first.

We opened CALCITE-3901 and provided a PR [1] to address problem 2 by
reducing the number of compilations. However, so far it has not been
accepted yet.

To solve problem 1, I think we have two potential solutions:
1. Provide a metadata provider that is not based on codegen and Janino
compilation (Obviously, this will be a long term plan).
2. Save the generated Java class, and compile it with the client code, and
change the implementation of JaninoMetaDataProvider#load3 so that if a
handler can be found in the current class loader, then it skips the codegen
and compilation.

Best,
Liya Fan






[1] https://github.com/apache/calcite/pull/1901



On Fri, Sep 18, 2020 at 3:01 PM Enrico Olivelli <eo...@gmail.com> wrote:

> Hi,
> I am seeing that JaninoRelMetadataProvider is quite expensive, at least for
> the "boostrap" phase of the system.
>
> It is a cool piece of software and it is working well, but in some cases
> probably it could be quite overkill, for instance in test cases of
> applications that mostly run only one query and then end.
>
> You could argue that the price of the compilation is paid only once and
> there is no need to complicate things for some corner cases that probably
> do not affect production cases.
>
> In CALCITE-3901 we introduced calcite.enable.regenerate.metadata.handler
> system property in order to limit the number of times that Janino kicks in
> but not to drop it at all.
>
> From the code I see that it is a subclass of RelMetadataProvider, but there
> is no way to get rid of it in VolcanoPlanner.
>
> Also, Janino is a third party library, and having the ability of dropping
> it will help in having a smaller set of dependencies downstream (but this
> is not blocker at the moment, it is only a nice-to-have)
>
> Do you have any experience with this problem ? Is there any chance to add
> some configuration option to pass a RelMetadataProvider implementation
> that does not rely on dynamic code generation ?
> If the idea is valuable I will be happy to work on it.
>
> See this issue for reference, with a full stacktrace of the execution
> https://github.com/diennea/herddb/issues/517
>
> Regards
> Enrico
>