You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2020/09/26 21:33:22 UTC

[DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Hi,

I suggest we add nullness verification to the main (non-test) code.
That would make the code easier to follow, and it would avoid NPEs.
checkerframework is an established verification framework for Java, and
they do ~monthly releases ( https://checkerframework.org/changelog.txt )

The verification for linq4j and core takes ~10min, so I guess we could run
it for each PR.

Frankly speaking, I'm quite happy with the results.
The overall idea is "assume all references (except local vars) are
non-nullable, then @Nullable can be added when needed".

The more I work with it the more I like it. @Nullable annotations make it
immediately visible if the field/parameter/result value can be null or not.

Recently Kenneth shared their strongly-positive experience when adding
checkerframework to Beam
https://lists.apache.org/thread.html/rb4991ed24668b0523fe9ac0ef185b471f5a8171655550138a674c407%40%3Cdev.calcite.apache.org%3E

I wonder what do you think?

Even though the patch looks big, I rebase it from time to time, and the
rare conflicts are easy to resolve.

---

The corresponding issue is
https://issues.apache.org/jira/browse/CALCITE-4199 , and it has been open
for a while, however, now I guess
the patch is more-or-less ready for the review.
Note: there are still nullability violations left, however, I think the
most part of the significant changes is already implemented.

Note: there are lots of sub-tasks under 4199, and many of them are not that
trivial.
For instance, RelNode#estimateRowCount must return non-null int, and
metadata returns null for "unknown rowcount" case. Apparently, that might
fail with NPE,
and it is not clear how to approach that.

The PR is https://github.com/apache/calcite/pull/1929
Note: I moved "add @Nullable annotations"  to the separate commits (the
commit messages end with "@nullable annotations only").

For instance,
linq4j refactor:
https://github.com/apache/calcite/pull/1929/commits/44628938ef723547dbb1a0aaa4c3233b724bba1d
core refactor:
https://github.com/apache/calcite/pull/1929/commits/31e17a09c45c555ac07dc3f82ee06ce8b1108ef8

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Julian Hyde <jh...@gmail.com>.
I’m done. Feel free to squash into your own commits if it makes sense and yields a clean history. 

The SqlParserPos thing should stay as a separate commit (still part of this change). I’ve been meaning to do that for some time. It’s a hot code path for the parser.

Julian

> On Nov 24, 2020, at 14:06, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> Julian>I have added some commits in
> https://github.com/apache/calcite/pull/2278 <
> https://github.com/apache/calcite/pull/2278> that fix the three points I
> raised.
> 
> Thanks. Are you done with it? Please let me know if you have more commits
> coming.
> 
> I had very similar refactorings in mind and the reason I skipped them as I
> wanted to avoid mixing all the fixes into a single PR.
> 
> For instance, I know there were "field can be final" issues, and it was
> very tempting to fix all of them :)
> 
> Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Julian>I have added some commits in
https://github.com/apache/calcite/pull/2278 <
https://github.com/apache/calcite/pull/2278> that fix the three points I
raised.

Thanks. Are you done with it? Please let me know if you have more commits
coming.

I had very similar refactorings in mind and the reason I skipped them as I
wanted to avoid mixing all the fixes into a single PR.

For instance, I know there were "field can be final" issues, and it was
very tempting to fix all of them :)

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Julian Hyde <jh...@gmail.com>.
Vladimir,

I have added some commits in https://github.com/apache/calcite/pull/2278 <https://github.com/apache/calcite/pull/2278> that fix the three points I raised.

Julian


> On Nov 23, 2020, at 4:08 PM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> Julian>I can’t tell whether you sincerely want my help. I’ll create a patch
> fixing the three items. I’m fearful that you will take issue with what I
> put in the patch and my time will be wasted.
> 
> We can update javadocs at any time. If you want to make that simultaneously
> with PR 2268, I'm ok.
> Please prepare the patch and I can commit --amend it. You could commit
> --amend on your own if you like that better.
> 
> However, please use PR 2268 as a base branch (2273 is a bit stale)
> 
> The only thing that bothers me is I do not want javadoc update to delay
> nullness merge since it would increase merge conflicts.
> 
> The changes are orthogonal after all. Nullness does not alter behaviour,
> and javadoc clarifies behaviour.
> Both of them are useful.
> 
> Julia>Most people won’t read it, or remember it, and we should not expect
> them too
> 
> /develop/index.md is useful for those who contribute to Calcite for the
> first time.
> It is useful as a reference as well (e.g. commit message guidelines are
> there).
> 
> Julian>I think PolyNull needs to be explained in words each time it is used
> in a public method. I’m happy to do that.
> 
> I'm ok provided we do not add rules like "every polynull must be explained".
> In my experience, PolyNull was easy to read and write after a day or so of
> playing with it.
> 
> Julian>When I challenged you about it last time you implied it was there by
> accident (because you had recently rebased), and yet it’s still here.
> 
> Last time you reverted that note from master with no comments. At least I
> failed to understand why it was reverted.
> I can commit "1.26 simplification warning" to master if you will.
> 
> Vladimir


Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Julian>I can’t tell whether you sincerely want my help. I’ll create a patch
fixing the three items. I’m fearful that you will take issue with what I
put in the patch and my time will be wasted.

We can update javadocs at any time. If you want to make that simultaneously
with PR 2268, I'm ok.
Please prepare the patch and I can commit --amend it. You could commit
--amend on your own if you like that better.

However, please use PR 2268 as a base branch (2273 is a bit stale)

The only thing that bothers me is I do not want javadoc update to delay
nullness merge since it would increase merge conflicts.

The changes are orthogonal after all. Nullness does not alter behaviour,
and javadoc clarifies behaviour.
Both of them are useful.

Julia>Most people won’t read it, or remember it, and we should not expect
them too

/develop/index.md is useful for those who contribute to Calcite for the
first time.
It is useful as a reference as well (e.g. commit message guidelines are
there).

Julian>I think PolyNull needs to be explained in words each time it is used
in a public method. I’m happy to do that.

I'm ok provided we do not add rules like "every polynull must be explained".
In my experience, PolyNull was easy to read and write after a day or so of
playing with it.

Julian>When I challenged you about it last time you implied it was there by
accident (because you had recently rebased), and yet it’s still here.

Last time you reverted that note from master with no comments. At least I
failed to understand why it was reverted.
I can commit "1.26 simplification warning" to master if you will.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Julian Hyde <jh...@gmail.com>.

> Are you willing to help with fixing the bits that you identified?
> Could you suggest timeframes when you can implement that?

I can’t tell whether you sincerely want my help. I’ll create a patch fixing the three items. I’m fearful that you will take issue with what I put in the patch and my time will be wasted.

> Julian>PolyNull is difficult to understand
> 
> I agree PolyNull is not the most obvious thing, however, "arrays vs
> annotations" is even harder :)
> I think I can add a couple of words on PolyNull to develop/index.md.

I don’t think that’s the solution. Most people won’t read it, or remember it, and we should not expect them too. I think PolyNull needs to be explained in words each time it is used in a public method. I’m happy to do that.

> Julian>* Remove the changes to 2020-10-06-release-1.26.0.md
> 
> Can you please clarify what is wrong with that change?
> Do you think Calcite 1.26 is a good release which people should use in
> production systems like 1.25 and other versions?
> Do you think people should prefer upgrading to 1.26 as soon as possible?
> Do you have better wording for the warning?

It doesn’t belong as part of this change.

When I challenged you about it last time you implied it was there by accident (because you had recently rebased), and yet it’s still here.

> The added Wrapper.unwrapOrThrow is API.Status.INTERNAL. What is the problem
> with adding such a method?

Because you promised you would not.

Julian



Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Thanks for the review, Julian.

Are you willing to help with fixing the bits that you identified?
Could you suggest timeframes when you can implement that?

Julian>PolyNull is difficult to understand

I agree PolyNull is not the most obvious thing, however, "arrays vs
annotations" is even harder :)
I think I can add a couple of words on PolyNull to develop/index.md.

Julian>* Remove the changes to 2020-10-06-release-1.26.0.md

Can you please clarify what is wrong with that change?
Do you think Calcite 1.26 is a good release which people should use in
production systems like 1.25 and other versions?
Do you think people should prefer upgrading to 1.26 as soon as possible?
Do you have better wording for the warning?

Julian> * Add descriptions to javadoc for PolyNull. signature of public
methods (i.e. methods that have javadoc),

There are ~18 methods that have PolyNull and javadoc at the same time.
However, I'm not sure nullness is really description is really needed.

You mention

  /** Returns the value of {@link CalciteConnectionProperty#FUN}.
   * If {@code defaultOperatorTable} is not null, the return will never be
null. */
  <T> @PolyNull T fun(Class<T> operatorTableClass, @PolyNull T
defaultOperatorTable);

I believe the following would be better:

  /**
   * Returns connection operator table or defaultOperatorTable if missing.
   * @param defaultOperatorTable default operator table
   * @param <T> the type of the operator table
   * @return connection operator table or defaultOperatorTable if missing
   * @see CalciteConnectionProperty#FUN
   **/
  <T> @PolyNull T fun(Class<T> operatorTableClass, @PolyNull T
defaultOperatorTable);

In other words, I agree PolyNull is slightly more complex than regular
Nullable.
However, I believe javadoc update can happen after the change is merged,
and sometimes
the behavior is much more important than nullability

For instance, CalciteConnectionConfigImpl has the following implementation:

    if (fun == null || fun.equals("") || fun.equals("standard")) {
      return defaultOperatorTable;
    }

Should those "empty string" and "standard" be documented in the interface?
I would say it is not the most obvious thing when one looks into the
following javadoc

  /** Returns the value of {@link CalciteConnectionProperty#FUN}.
   * If {@code defaultOperatorTable} is not null, the return will never be
null. */

Julian>* Remove Wrapper.unwrapOrThrow.

The added Wrapper.unwrapOrThrow is API.Status.INTERNAL. What is the problem
with adding such a method?

unwrapOrThrow is used 16 times, and allows to have concise code and produce
meaningful exception messages at the same time.
I can easily inline the methods, however, it would make all the 16 usages
more verbose.

For instance the following code
      final String timeZone = filter.getCluster().getPlanner().getContext()
          .unwrapOrThrow(CalciteConnectionConfig.class).timeZone();

would become
      Wrapper wrapper = filter.getCluster().getPlanner().getContext();
      final String timeZone =
requireNonNull(wrapper.unwrap(CalciteConnectionConfig.class),
          () -> "Can't unwrap " + CalciteConnectionConfig.class + " from "
+ wrapper).timeZone();

That is way harder to both read and write.

There are similar helper methods in NonNullableAccessors which is used 20
times, and SqlNonNullableAccessors is used 39 times.

If you think you have a much better solution, I do not hold you. Please go
ahead and implement it.

In the mean-time, I suggest we merge the code with unwrapOrThrow, etc, etc,
and later you (or someone else) can suggest and implement your own
way provided it is even better than the current one.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Julian Hyde <jh...@gmail.com>.
Like Stamatis, I have quickly gone over the changes, and I find that the code is not significantly different, and overall it is an improvement. Thanks for making this change, Vladimir.

PolyNull is difficult to understand, and I don’t think we should require developers to understand it. Therefore, when it is used in the signature of public methods (i.e. methods that have javadoc), the javadoc should explain the nullness behavior in words. For example, in CalciteConnectionConfig:

  /** Returns the value of {@link CalciteConnectionProperty#FUN}.
   * If {@code defaultOperatorTable} is not null, the return will never be null. */
  <T> @PolyNull T fun(Class<T> operatorTableClass, @PolyNull T defaultOperatorTable);

A few things need to be fixed:
 * Remove the changes to 2020-10-06-release-1.26.0.md 
 * Add descriptions to javadoc for PolyNull.
 * Remove Wrapper.unwrapOrThrow. We did not reach consensus in
   https://issues.apache.org/jira/browse/CALCITE-4311 <https://issues.apache.org/jira/browse/CALCITE-4311> and Vladimir
   said that he would proceed without it.

+1 to merge when those things are fixed.

Julian


> On Nov 23, 2020, at 1:31 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
> 
> Hello,
> 
> I went fast over the diff and overall the changes do not make the code that
> different than before.
> Moreover, the refactoring for preparing for nullable annotations seems
> beneficial in any case (finalizing instance variables, adding
> requireNonNull for enforcing fail fast behavior, removing usages of raw
> types, etc.).
> The checker can find some bugs so I think it will be nice to have an extra
> layer of checks in the CI.
> The 10min might be a bit long compared with our unit tests that are around
> ~5min but we can give it a try and if we are not happy with it we can
> possibly modify the frequency that these checks are performed.
> 
> Although, I didn't do a line-by-line review, I am rather positive with
> merging this to master.
> 
> Best,
> Stamatis
> 
> On Fri, Nov 20, 2020 at 1:23 PM Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
> 
>> Fan Liya>However, I think it would be nice if we could document it
>> somewhere, so
>> Fan Liya>other developers will follow the convention without being
>> confused.
>> 
>> The PR does include documentation update:
>> 
>> https://github.com/vlsi/calcite/blob/checkerframework/site/develop/index.md#null-safety
>> 
>> Fan Liya>I mean that a large PR requires a long time to review.
>> 
>> Do you commit to reviewing the change?
>> How much time would it take you to review the change?
>> 
>> As I said, the PR was there for several months, and I do NOT believe adding
>> two or six months more would make a difference.
>> 
>> Fan Liya>During this time, you have to resolve conflicts many times, which
>> is a
>> Fan Liya>large amount of effort.
>> 
>> The added nullability annotations have minimal impact on Calcite API and
>> its behavior.
>> All the existing tests pass, and I change only a few test files.
>> 
>> Of course, I do not like the idea of rebasing PR again and again.
>> 
>> Just as an example, Calcite 1.25 was released with a significant
>> backward-incompatible change:
>> 
>> 998cd83eb 2020-07-08 Julian Hyde [CALCITE-3923] Refactor how planner rules
>> are parameterized
>> 198 files changed, 9018 insertions(+), 5503 deletions(-)
>> 
>> The refactor alone was bigger than nullability PR if we compare the number
>> of added and removed lines.
>> Just to remind, "prepare for nullability" is (757 files changed, 7487
>> insertions(+), 3827 deletions(-))
>> 
>> Vladimir
>> 


Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

I went fast over the diff and overall the changes do not make the code that
different than before.
Moreover, the refactoring for preparing for nullable annotations seems
beneficial in any case (finalizing instance variables, adding
requireNonNull for enforcing fail fast behavior, removing usages of raw
types, etc.).
The checker can find some bugs so I think it will be nice to have an extra
layer of checks in the CI.
The 10min might be a bit long compared with our unit tests that are around
~5min but we can give it a try and if we are not happy with it we can
possibly modify the frequency that these checks are performed.

Although, I didn't do a line-by-line review, I am rather positive with
merging this to master.

Best,
Stamatis

On Fri, Nov 20, 2020 at 1:23 PM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Fan Liya>However, I think it would be nice if we could document it
> somewhere, so
> Fan Liya>other developers will follow the convention without being
> confused.
>
> The PR does include documentation update:
>
> https://github.com/vlsi/calcite/blob/checkerframework/site/develop/index.md#null-safety
>
> Fan Liya>I mean that a large PR requires a long time to review.
>
> Do you commit to reviewing the change?
> How much time would it take you to review the change?
>
> As I said, the PR was there for several months, and I do NOT believe adding
> two or six months more would make a difference.
>
> Fan Liya>During this time, you have to resolve conflicts many times, which
> is a
> Fan Liya>large amount of effort.
>
> The added nullability annotations have minimal impact on Calcite API and
> its behavior.
> All the existing tests pass, and I change only a few test files.
>
> Of course, I do not like the idea of rebasing PR again and again.
>
> Just as an example, Calcite 1.25 was released with a significant
> backward-incompatible change:
>
> 998cd83eb 2020-07-08 Julian Hyde [CALCITE-3923] Refactor how planner rules
> are parameterized
>  198 files changed, 9018 insertions(+), 5503 deletions(-)
>
> The refactor alone was bigger than nullability PR if we compare the number
> of added and removed lines.
> Just to remind, "prepare for nullability" is (757 files changed, 7487
> insertions(+), 3827 deletions(-))
>
> Vladimir
>

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Fan Liya>However, I think it would be nice if we could document it
somewhere, so
Fan Liya>other developers will follow the convention without being confused.

The PR does include documentation update:
https://github.com/vlsi/calcite/blob/checkerframework/site/develop/index.md#null-safety

Fan Liya>I mean that a large PR requires a long time to review.

Do you commit to reviewing the change?
How much time would it take you to review the change?

As I said, the PR was there for several months, and I do NOT believe adding
two or six months more would make a difference.

Fan Liya>During this time, you have to resolve conflicts many times, which
is a
Fan Liya>large amount of effort.

The added nullability annotations have minimal impact on Calcite API and
its behavior.
All the existing tests pass, and I change only a few test files.

Of course, I do not like the idea of rebasing PR again and again.

Just as an example, Calcite 1.25 was released with a significant
backward-incompatible change:

998cd83eb 2020-07-08 Julian Hyde [CALCITE-3923] Refactor how planner rules
are parameterized
 198 files changed, 9018 insertions(+), 5503 deletions(-)

The refactor alone was bigger than nullability PR if we compare the number
of added and removed lines.
Just to remind, "prepare for nullability" is (757 files changed, 7487
insertions(+), 3827 deletions(-))

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Fan Liya <li...@gmail.com>.
Hi Vladimir,

Thanks for your effort. I still believe in the value of nullable
annotations.

Please see my comments below

> I'm puzzled. First you say "annotations are beneficial for code
> readability", then
> you proceed with "annotations does not make the code more readable".

For ordinary Java code, variables are nullable by default, and only a few
variables are non-nullable (I think).
But Calcite is different, as most variables are non-nullable (as you have
indicated).
If so, it would be reasonable to annotate nullable variables only.
However, I think it would be nice if we could document it somewhere, so
other developers will follow the convention without being confused.

> It won't be difficult to merge. Git works just fine.

I mean that a large PR requires a long time to review.
During this time, you have to resolve conflicts many times, which is a
large amount of effort.
This is not a problem, of course, if you do not mind it.

Best,
Liya Fan


On Fri, Nov 20, 2020 at 7:35 AM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Liya Fan>is it possible to break the PR into a number of small ones?
>
> I can split the commit in two for review-only purposes: a commit that
> refactors the code, and the second commit that adds @Nullable annotations.
> Here you go: https://github.com/apache/calcite/pull/2273 (see
> [CALCITE-4199] Refactor: prepare code for nullability annotations)
> The "refactor" commit is still big though (757 files changed, 7487
> insertions(+), 3827 deletions(-))
>
> ---
>
> I tried logging findings as individual issues under CALCITE-4199, however,
> that resulted in quite an unfriendly "stop doing that" feedback from Julian
> (see [1]).
> * Nobody expressed "please create more of these"
> * It takes me time to create each ticket
> * That is why I stopped creating tickets and now I perform the refactoring
> as I add nullability annotations
>
> I would be glad if Julian admits the nullness checker finds useful bugs
> (e.g. see npeTest in [2], [3], etc).
> It would be great if Julian refrains from false allegations against me
> (e.g. "Vladimir has been trying to fix things that aren’t broken". "He did
> not leave time for anyone to review", etc, etc).
>
> I have provided countless samples when the nullness checker detects true
> bugs, and the detected bugs are both in old and in the recently added code.
> For instance, the recent discussion on SEARCH vs nullness (see [4]) was
> triggered by myself trying to rebase nullness annotations on top of Calcite
> 1.26.
> Search/Sarg failed nullness in a non-trivial way, and I asked Julian to fix
> the code.
>
> [1]:
>
> https://lists.apache.org/thread.html/rf2ee66ad385624a35829a4ce092ff13797e812e03abb80cb45b33421%40%3Cdev.calcite.apache.org%3E
> [2]:
>
> https://lists.apache.org/thread.html/rf863922cf4418f994a123ecbbb624a74e6e375aa421a4a7812ce52b0%40%3Cdev.calcite.apache.org%3E
> [3]: https://issues.apache.org/jira/browse/CALCITE-4234
> [4]:
>
> https://lists.apache.org/thread.html/rce6e84d0e0cd5a227d64c9f670b43650833f43e0cbccb80e04f5e0cf%40%3Cdev.calcite.apache.org%3E
>
> Vladimir
>

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Liya Fan>is it possible to break the PR into a number of small ones?

I can split the commit in two for review-only purposes: a commit that
refactors the code, and the second commit that adds @Nullable annotations.
Here you go: https://github.com/apache/calcite/pull/2273 (see
[CALCITE-4199] Refactor: prepare code for nullability annotations)
The "refactor" commit is still big though (757 files changed, 7487
insertions(+), 3827 deletions(-))

---

I tried logging findings as individual issues under CALCITE-4199, however,
that resulted in quite an unfriendly "stop doing that" feedback from Julian
(see [1]).
* Nobody expressed "please create more of these"
* It takes me time to create each ticket
* That is why I stopped creating tickets and now I perform the refactoring
as I add nullability annotations

I would be glad if Julian admits the nullness checker finds useful bugs
(e.g. see npeTest in [2], [3], etc).
It would be great if Julian refrains from false allegations against me
(e.g. "Vladimir has been trying to fix things that aren’t broken". "He did
not leave time for anyone to review", etc, etc).

I have provided countless samples when the nullness checker detects true
bugs, and the detected bugs are both in old and in the recently added code.
For instance, the recent discussion on SEARCH vs nullness (see [4]) was
triggered by myself trying to rebase nullness annotations on top of Calcite
1.26.
Search/Sarg failed nullness in a non-trivial way, and I asked Julian to fix
the code.

[1]:
https://lists.apache.org/thread.html/rf2ee66ad385624a35829a4ce092ff13797e812e03abb80cb45b33421%40%3Cdev.calcite.apache.org%3E
[2]:
https://lists.apache.org/thread.html/rf863922cf4418f994a123ecbbb624a74e6e375aa421a4a7812ce52b0%40%3Cdev.calcite.apache.org%3E
[3]: https://issues.apache.org/jira/browse/CALCITE-4234
[4]:
https://lists.apache.org/thread.html/rce6e84d0e0cd5a227d64c9f670b43650833f43e0cbccb80e04f5e0cf%40%3Cdev.calcite.apache.org%3E

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Liya Fan>I believe annotations for
Liya Fan>nullability are beneficial for code readability and reasoning.

Thanks for the feedback.

Liya Fan>Intuitively, the "@Nullable" annotation does not provide much
information,
Liya Fan>and it does not make the code more readable (IMO).

I'm puzzled. First you say "annotations are beneficial for code
readability", then
you proceed with "annotations does not make the code more readable".

Liya Fan>In addition, is it possible to break the PR into a number of small
ones?

I'm open to suggestions. I believe the current set of commits is the best
way to merge it.

Liya Fan>A large PR is difficult to review and difficult to merge.

It won't be difficult to merge. Git works just fine.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Fan Liya <li...@gmail.com>.
  Hi Vladimir,

Thanks for bringing this up. In general, I believe annotations for
nullability are beneficial for code readability and reasoning.
However, after reading the PR, I am having the same feelings as Nikolay.

Intuitively, the "@Nullable" annotation does not provide much information,
and it does not make the code more readable (IMO).

In addition, is it possible to break the PR into a number of small ones? A
large PR is difficult to review and difficult to merge.

Best,
Liya Fan

On Thu, Nov 19, 2020 at 5:38 AM Julian Hyde <jh...@apache.org> wrote:

> Regardless of how it got there, the change to
> 2020-10-06-release-1.26.0.md is currently in the change that you are
> proposing to commit (see the top of
> https://patch-diff.githubusercontent.com/raw/apache/calcite/pull/2268.patch
> ),
> and it shouldn't be.
>
> Julian
>
> On Wed, Nov 18, 2020 at 1:12 PM Vladimir Sitnikov
> <si...@gmail.com> wrote:
> >
> > Julian>Please remove the change you made to
> > Julian>site/_posts/2020-10-06-release-1.26.0.md in this PR
> >
> > Julian, I rebase checkerframework branch from time to time, and the
> commit
> > you mention was on master branch.
> >
> > I'm all ears if you have questions or suggestions regarding the checker
> > framework.
> >
> > Vladimir
>

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Julian Hyde <jh...@apache.org>.
Regardless of how it got there, the change to
2020-10-06-release-1.26.0.md is currently in the change that you are
proposing to commit (see the top of
https://patch-diff.githubusercontent.com/raw/apache/calcite/pull/2268.patch),
and it shouldn't be.

Julian

On Wed, Nov 18, 2020 at 1:12 PM Vladimir Sitnikov
<si...@gmail.com> wrote:
>
> Julian>Please remove the change you made to
> Julian>site/_posts/2020-10-06-release-1.26.0.md in this PR
>
> Julian, I rebase checkerframework branch from time to time, and the commit
> you mention was on master branch.
>
> I'm all ears if you have questions or suggestions regarding the checker
> framework.
>
> Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Julian>Please remove the change you made to
Julian>site/_posts/2020-10-06-release-1.26.0.md in this PR

Julian, I rebase checkerframework branch from time to time, and the commit
you mention was on master branch.

I'm all ears if you have questions or suggestions regarding the checker
framework.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Julian Hyde <jh...@apache.org>.
Vladimir,

Please remove the change you made to
site/_posts/2020-10-06-release-1.26.0.md in this PR. It is not related
to CALCITE-4199.

Julian

On Wed, Nov 18, 2020 at 9:53 AM Vladimir Sitnikov
<si...@gmail.com> wrote:
>
> Here's a recent issue identified by the checker framework. The bug was
> committed in CALCITE-4251 a couple of hours ago.
> https://github.com/apache/calcite/commit/f3c173c9220a83be0dfa0c80a39b015f384ffcd1#diff-639d388c38e469c9329bdf0456d429be21a5031ea6d59188e8abb7e4173e5e4aR699
>
> org/apache/calcite/rel/rules/LoptMultiJoin.java:723: error:
> [dereference.of.nullable] dereference of possibly-null reference colOrigin
>       if (colOrigin != null || !*colOrigin.isDerived*()) {
>                                * ^*
>
> It means colOrigin.isDerived() would fail with NPE in case colOrigin is null
> .
>
> Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Here's a recent issue identified by the checker framework. The bug was
committed in CALCITE-4251 a couple of hours ago.
https://github.com/apache/calcite/commit/f3c173c9220a83be0dfa0c80a39b015f384ffcd1#diff-639d388c38e469c9329bdf0456d429be21a5031ea6d59188e8abb7e4173e5e4aR699

org/apache/calcite/rel/rules/LoptMultiJoin.java:723: error:
[dereference.of.nullable] dereference of possibly-null reference colOrigin
      if (colOrigin != null || !*colOrigin.isDerived*()) {
                               * ^*

It means colOrigin.isDerived() would fail with NPE in case colOrigin is null
.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Nikolay> annotation that doesn’t add any value to the code is a bad thing

It is really sad to hear "doesn't add any value".
Nullability annotations detect bugs, they make coding easier (IDE
understands nullability),
it makes the intention behind method signatures easier to understand.

Nikolay>Java is known as a very verbose programming language, so making it
even more verbose

Thanks for bringing this point.
I had exactly the same question/feeling at the very beginning.
In practice, it turned out that null-heavy code is harder to work with in
Java.
In other words, if the code has limited use of nulls, then it is easier to
read and reason about.
At the same time, the number of @Nullable entries is low.

For instance, 865 files have at least one occurrence of @Nullable.
304 of them have exactly one line with @Nullable
142 files use @Nullable just two times
In total, 768 files (~90%) have less than 10 lines with @Nullable.
That is yet another way to highlight that @Nullable does not make the code
much more verbose.

a) Java's verbosity is one of its powers. Once you open a declaration, you
see what it means
Let me quote a sample constructor from Sort class:

  /**
   * Creates a Sort.
   *
   * @param cluster   Cluster this relational expression belongs to
   * @param traits    Traits
   * @param child     input relational expression
   * @param collation array of sort specifications
   * @param offset    Expression for number of rows to discard before
returning
   *                  first row
   * @param fetch     Expression for number of rows to fetch
   */
  protected Sort(
      RelOptCluster cluster,
      RelTraitSet traits,
      RelNode child,
      RelCollation collation,
      @Nullable RexNode offset,
      @Nullable RexNode fetch) {
    super(cluster, traits, child);
    this.collation = collation;
    this.offset = offset;
    this.fetch = fetch;

@Nullable annotations make it immediately visible that offset and fetch can
be nullable.
Of course, it would be great if we could document special values (e.g. what
fetch=null means),
however, we have no way to enforce a policy to make all javadocs meaningful.
In practice, we do not even enforce javadocs since global enforcement would
result in lots of time spent on writing "Returns x. @return x"
documentation for "int getX()" methods.

b) Nullable annotations enable IDE to better understand the code and
highlight defects before you even compile the code.
For instance, if you pass null argument to a non-nullable parameter, then
IDE would show a warning.

c) Nullable annotations improve integrations with languages that have
stronger null safety.
As you might know, Kotlin, Swift, Dart, and others bake null-safety right
into the type system of the language.
If we keep the API null-safe (annotate nullable types), then the API would
be way easier for the consumers.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Nikolay>Do we have some code style convention that requires it?

Even existing code convention is to explain if parameter can be nullable or
not.
Often it is explained in javadoc or requireNotNull(...) call.

The PR enforces that convention. For instance, it forbids (build would
fail) to return nulls if the return type is non-nullable.

Nikolay>Every java reference variable can be null.

I'm afraid you are not quite right.
If every reference is nullable, then the code must be "if (null)" all over
the place.

Calcite uses Guava's Immutable collections heavily, and the collections
never permit nulls.
For instance, JoinRel has 5 fields non-nullable, so no annotations needed.

public abstract class Join extends BiRel implements Hintable {
  protected final RexNode condition;
  protected final ImmutableSet<CorrelationId> variablesSet;
  protected final ImmutableList<RelHint> hints;
  protected final JoinRelType joinType;
  protected final JoinInfo joinInfo;

Here are the most @nullable-heavy files.

"number of lines with @Nullable" "total lines" "file"

95 6958 src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
70 6248 src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java
60 3675 src/main/java/org/apache/calcite/tools/RelBuilder.java
59 4573 src/main/java/org/apache/calcite/plan/RelOptUtil.java
43 1144 src/main/java/org/apache/calcite/adapter/enumerable/EnumUtils.java
42 913 src/main/java/org/apache/calcite/adapter/clone/ArrayTable.java
40 276 src/main/java/org/apache/calcite/sql/SqlSelect.java
38 419 src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
37 821 src/main/java/org/apache/calcite/runtime/JsonFunctions.java
36 559 src/main/java/org/apache/calcite/rel/metadata/RelMdCollation.java
35 696 src/main/java/org/apache/calcite/rel/metadata/BuiltInMetadata.java
33 3013 src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java
33 1602 src/main/java/org/apache/calcite/sql/SqlDialect.java
33 1419
src/main/java/org/apache/calcite/rel/rules/materialize/MaterializedViewRule.java
32 881 src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java
31 547 src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java
30 1357 src/main/java/org/apache/calcite/sql/validate/SqlValidatorUtil.java

I would say the null density is very far from your "every java reference".
So the convention is to assume nulls are not permitted, and mark nullable
types when nulls are needed.

Nikolay>Can you, please, show the list of the bugs founded

Please find the samples I logged as sub-tickets:
https://issues.apache.org/jira/browse/CALCITE-4199
Note: that is not the full set of issues. My initial plan was to log every
issue, discuss it and resolve it.
However, later I realized the tickets don't get enough attention, so the
plan to log them all and fix them is not really viable.
That is why I kept NPE in many places which otherwise would discussions to
resolve.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Vladimir.

I’m looked through your PR and find it questionable.
Every java reference variable can be null.
So, is it really necessary to highlight this fact in the code one more time?

Do we have some code style convention that requires it?

Java is known as a very verbose programming language, so making it even more verbose with an annotation that doesn’t add any value to the code is a bad thing.

> * I've provided multiple samples of the bugs identified by the checker. The are SQL-level bugs as well.

Can you, please, show the list of the bugs founded while adding @nullable all over the code?


> 18 нояб. 2020 г., в 02:25, Vladimir Sitnikov <si...@gmail.com> написал(а):
> 
> Hi,
> 
> I've completed @nullable annotations for linq4j and core modules.
> The PR is https://github.com/apache/calcite/pull/2268
> 
> All existing tests pass.
> There might be minor tweaks, however, I expect the change is mergeable.
> 
> if no-one objects within a week (~1 December), I'll assume lazy consensus
> and commit it (see
> https://www.apache.org/foundation/voting.html#LazyConsensus )
> 
> However, do not even think that I use lazy consensus as an escape hatch to
> commit something doubtful.
> On contrary, I truly believe that merging nullness annotations is the right
> thing to do.
> It fixes certain types of bugs, and it prevents new NPE.
> The change makes it easier to use Calcite API, and it opens a way to using
> https://github.com/jspecify/jspecify when it is ready.
> 
> 
> Just to recap:
> * The first discussions regarding the checker framework was in April
> * I've provided multiple samples of the bugs identified by the checker. The
> are SQL-level bugs as well.
> * I completed linq4j annotation by 30 August
> * Other projects use the checker framework (e.g. Guava)
> * Kenneth Knowles shared a strongly positive experience with adding
> nullness annotations to Beam
> * Michael Mior was positive regarding the change
> * I did my best to minimize the diff, so I did my best to keep the old
> behavior when possible
> 
> I assume there was more than enough time to review the approach and play
> with the technology (more than 2 months),
> that is why I set lazy consensus timeframe to 1 week.
> 
> The PR touches a lot of lines, so I do not expect somebody would fully
> review the PR. It would be nice to have feedback though.
> The most part of the PR is the addition of the relevant @Nullable
> annotations.
> Of course, sometimes it triggers formatting (due to 100 char line limit)
> 
> As I added nullness verification, I routinely rebased over the new changes
> coming to the master branch.
> The rebase is not hard to do. Of course, there might be conflicts, however,
> the resolution was trivial.
> In my experience, conflict resolution is way easier to do when nullability
> annotations are added as a single commit.
> 
> Vladimir


Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Hi,

I've completed @nullable annotations for linq4j and core modules.
The PR is https://github.com/apache/calcite/pull/2268

All existing tests pass.
There might be minor tweaks, however, I expect the change is mergeable.

if no-one objects within a week (~1 December), I'll assume lazy consensus
and commit it (see
https://www.apache.org/foundation/voting.html#LazyConsensus )

However, do not even think that I use lazy consensus as an escape hatch to
commit something doubtful.
On contrary, I truly believe that merging nullness annotations is the right
thing to do.
It fixes certain types of bugs, and it prevents new NPE.
The change makes it easier to use Calcite API, and it opens a way to using
https://github.com/jspecify/jspecify when it is ready.


Just to recap:
* The first discussions regarding the checker framework was in April
* I've provided multiple samples of the bugs identified by the checker. The
are SQL-level bugs as well.
* I completed linq4j annotation by 30 August
* Other projects use the checker framework (e.g. Guava)
* Kenneth Knowles shared a strongly positive experience with adding
nullness annotations to Beam
* Michael Mior was positive regarding the change
* I did my best to minimize the diff, so I did my best to keep the old
behavior when possible

I assume there was more than enough time to review the approach and play
with the technology (more than 2 months),
that is why I set lazy consensus timeframe to 1 week.

The PR touches a lot of lines, so I do not expect somebody would fully
review the PR. It would be nice to have feedback though.
The most part of the PR is the addition of the relevant @Nullable
annotations.
Of course, sometimes it triggers formatting (due to 100 char line limit)

As I added nullness verification, I routinely rebased over the new changes
coming to the master branch.
The rebase is not hard to do. Of course, there might be conflicts, however,
the resolution was trivial.
In my experience, conflict resolution is way easier to do when nullability
annotations are added as a single commit.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
>Does it require installing something in the developer's machine?

It downloads everything it needs from Central (AFAIK it uses no native
dependencies).
The verifier requires Java 11+.

Vladimir

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Fan Liya <li...@gmail.com>.
Sounds great! Thanks for your effort.
Does it require installing something in the developer's machine?

Best,
Liya Fan

On Sun, Sep 27, 2020 at 7:10 PM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Here's a sample issue:
>
> core/src/main/java/org/apache/calcite/sql/SqlBinaryOperator.java:161:
> error: [dereference.of.nullable] dereference of possibly-null reference
> call.getOperandLiteralValue(1, BigDecimal.class)
>           switch (call.getOperandLiteralValue(1,
> BigDecimal.class).signum()) {
>                                              ^
> The code in question is
>
> @Override public SqlMonotonicity getMonotonicity(SqlOperatorBinding call) {
>   if (getName().equals("/")) {
>     final SqlMonotonicity mono0 = call.getOperandMonotonicity(0);
>     final SqlMonotonicity mono1 = call.getOperandMonotonicity(1);
>     if (mono1 == SqlMonotonicity.CONSTANT) {
>       if (call.isOperandLiteral(1, false)) {
>         switch (call.getOperandLiteralValue(1, BigDecimal.class).signum())
> {
>         case -1:
>
>
> The error message suggests that the test for 1/null is missing, and it
> turns out it is a true bug:
>
> @Test void testMonotonic() {
>   sql("select stream 1/null from orders")
>       .monotonic(SqlMonotonicity.CONSTANT);
>
>
> java.lang.NullPointerException
> at
>
> org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:161)
> at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:193)
> at
>
> org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:159)
> at
>
> org.apache.calcite.sql.validate.SelectNamespace.getMonotonicity(SelectNamespace.java:76)
> at
>
> org.apache.calcite.sql.test.AbstractSqlTester.checkMonotonic(AbstractSqlTester.java:497)
> at
>
> org.apache.calcite.test.SqlValidatorTestCase$Sql.monotonic(SqlValidatorTestCase.java:373)
>
> Vladimir
>

Re: [DISCUSS][CALCITE-4199] Ensure nullness verification via checkerframework

Posted by Vladimir Sitnikov <si...@gmail.com>.
Here's a sample issue:

core/src/main/java/org/apache/calcite/sql/SqlBinaryOperator.java:161:
error: [dereference.of.nullable] dereference of possibly-null reference
call.getOperandLiteralValue(1, BigDecimal.class)
          switch (call.getOperandLiteralValue(1,
BigDecimal.class).signum()) {
                                             ^
The code in question is

@Override public SqlMonotonicity getMonotonicity(SqlOperatorBinding call) {
  if (getName().equals("/")) {
    final SqlMonotonicity mono0 = call.getOperandMonotonicity(0);
    final SqlMonotonicity mono1 = call.getOperandMonotonicity(1);
    if (mono1 == SqlMonotonicity.CONSTANT) {
      if (call.isOperandLiteral(1, false)) {
        switch (call.getOperandLiteralValue(1, BigDecimal.class).signum()) {
        case -1:


The error message suggests that the test for 1/null is missing, and it
turns out it is a true bug:

@Test void testMonotonic() {
  sql("select stream 1/null from orders")
      .monotonic(SqlMonotonicity.CONSTANT);


java.lang.NullPointerException
at
org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:161)
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:193)
at
org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:159)
at
org.apache.calcite.sql.validate.SelectNamespace.getMonotonicity(SelectNamespace.java:76)
at
org.apache.calcite.sql.test.AbstractSqlTester.checkMonotonic(AbstractSqlTester.java:497)
at
org.apache.calcite.test.SqlValidatorTestCase$Sql.monotonic(SqlValidatorTestCase.java:373)

Vladimir