You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@beam.apache.org by Reuven Lax <re...@google.com> on 2021/03/02 08:04:48 UTC

Re: Null checking in Beam

I'm running into a problem with this check. I added a protocol-buffer file
to a module (google-cloud-platform) that previously did have any protobuf
files in it. The generated files contain lines that violate this null
checker, so they won't compile. I can't annotate the files, because they
are codegen files. I tried adding the package to spotbugs-filter.xml, but
that didn't help.

Any suggestions?

Reuven

On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com> wrote:

>
>
> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi,
>>
>> I'll give my two cents here.
>>
>> I'm not 100% sure that the 1-5% of bugs are as severe as other types of
>> bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
>> many of these actually boil down to user errors. Then we might ask what a
>> correct solution would be. If we manage to figure out what the actual
>> problem is and tell user what specifically is missing or going wrong, that
>> would be just awesome. On the other hand, if a tool used for avoiding
>> "unexpected" NPEs forces us to code
>>
>>    Object value = Objects.requireNonNull(myNullableObject); // or similar
>> using Preconditions
>>    value.doStuff();
>>
>> instead of just
>>
>>   myNullableObject.doStuff()
>>
>> what we actually did, is a) made a framework happy, and b) changed a line
>> at which NPE is thrown by 1 (and yes, internally prevented JVM from thrown
>> SIGSEGV at itself, but that is deeply technical thing). Nothing changed
>> semantically, from user perspective.
>>
> I'd argue there's value in asking Beam developers to make that change. It
> makes us acknowledge that there's a possibility myNullableObject is null.
> If myNullableObject being null is something relevant to the user we would
> likely want to wrap it in some other exception or provide a more useful
> message than just NPE(!!).
>
>>
>> Now, given that the framework significantly rises compile time (due to
>> all the checks), causes many "bugs" being reported by static code analysis
>> tools (because the framework adds @Nonnull default annotations everywhere,
>> even when Beam's code actually counts with nullability of a field), and
>> given how much we currently suppress these checks ($ git grep BEAM-10402 |
>> wc -l -> 1981), I'd say this deserves a deeper discussion.
>>
> The reason there are so many suppressions is that fixing all the errors is
> a monumental task. Rather than addressing them all, Kenn programmatically
> added suppressions for classes that failed the checks (
> https://github.com/apache/beam/pull/13261). This allowed us to start
> running the checker on the code that passes it while the errors are fixed.
>
>>  Jan
>>
>>
>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>
>> Yes, completely sound nullability checking has been added to the project
>> via checkerframework, based on a large number of NPE bugs (1-5% depending
>> on how you search, but many other bugs likely attributable to
>> nullness-based design errors) which are extra embarrassing because NPEs
>> have were essentially solved, even in practice for Java, well before Beam
>> existed.
>>
>> Checker framework is a pluggable type system analysis with some amount of
>> control flow sensitivity. Every value has a type that is either nullable or
>> not, and certain control structures (like checking for null) can alter the
>> type inside a scope. The best way to think about it is to consider every
>> value in the program as either nullable or not, much like you think of
>> every value as either a string or not, and to view method calls as
>> inherently stateful/nondetermistic. This can be challenging in esoteric
>> cases, but usually makes the overall code health better anyhow.
>>
>> Your example illustrates how problematic the design of the Java language
>> is: the analysis cannot assume that `getDescription` is a pure function,
>> and neither should you. Even if it is aware of boolean-short-circuit it
>> would not be sound to accept this code. There is an annotation for this in
>> the cases where it is true (like proto-generate getters):
>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>
>> The refactor for cases like this is trivial so there isn't a lot of value
>> to thinking too hard about it.
>>
>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>   if (desc != null && desc.contains("finalized")) {
>>     return false;
>>   }
>> }
>>
>> To a casual eye, this may look like a noop change. To the analysis it
>> makes all the difference. And IMO this difference is real. Humans may
>> assume it is a noop and humans would be wrong. So many times when you
>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>> learn that it was tweaked for some odd reason. I believe this code change
>> makes the code better. Suppressing these errors should be exceptionally
>> rare, and never in normal code. It is far better to improve your code than
>> to suppress the issue.
>>
>> It would be very cool for the proto compiler to annotate enough that
>> new-and-improved type checkers could make things more convenient.
>>
>> Kenn
>>
>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com> wrote:
>>
>>> I can use that trick. However I'm surprised that the check appears to be
>>> so simplistic.
>>>
>>> For example, the following code triggers the check on
>>> getDescription().contains(), since getDescription returns a Nullable
>>> string. However even a simplistic analysis should realize that
>>> getDescription() was checked for null first! I have a dozen or so cases
>>> like this, and I question how useful such a simplistic check it will be.
>>>
>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>> }
>>>
>>>
>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com> wrote:
>>>
>>>> Yeah it seems like the checker is enabled:
>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>> not really a concern.
>>>>
>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Has extra Nullable checking been enabled in the Beam project? I have a
>>>>> PR that was on hold for several months, and I'm struggling now with compile
>>>>> failing to complaints about assigning something that is nullable to
>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>> it absolutely impossible for the variable to be null.
>>>>>
>>>>> Has something changed here?
>>>>>
>>>>> Reuven
>>>>>
>>>>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

I agree that there are _some_ added annotations at _some_, that are 
useful - most notably @NonNull on method arguments, possibly return 
values. Adding @NonNull into exception type being thrown seems awkward. 
The @UnknownKeyFor probably should not be there, as it brings no value. 
Did we raise the issue with the checkerframework? It seems to me, that 
the biggest problem lies there. It might have two modes of operation - 
after the check it could have a way of specifying which (and where) 
annotations should be in the compiled byte-code and which should be 
removed. Or can we post-process that with some different tool?

  Jan

On 4/5/21 6:03 PM, Kenneth Knowles wrote:
>
> On Thu, Apr 1, 2021 at 9:57 AM Brian Hulette <bhulette@google.com 
> <ma...@google.com>> wrote:
>
>     What value does it add? Is it that it enables them to use
>     checkerframework with our interfaces?
>
>
> Actually if they are also using checkerframework the defaults are the 
> same so it is not usually needed (though some defaults can be 
> changed). Making defaults explicit is most useful for interfacing with 
> other tools with different defaults, such as Spotbugs [1], IDEs [2] 
> [3], or JVM languages with null safety bult-in, etc [4] [5].
>
> Kenn
>
> [1] 
> https://spotbugs.readthedocs.io/en/stable/annotations.html#edu-umd-cs-findbugs-annotations-nullable 
> <https://spotbugs.readthedocs.io/en/stable/annotations.html#edu-umd-cs-findbugs-annotations-nullable>
> [2] 
> https://www.jetbrains.com/help/idea/2021.1/nullable-and-notnull-annotations.html 
> <https://www.jetbrains.com/help/idea/2021.1/nullable-and-notnull-annotations.html>
> [3] https://wiki.eclipse.org/JDT_Core/Null_Analysis 
> <https://wiki.eclipse.org/JDT_Core/Null_Analysis>
> [4] https://kotlinlang.org/docs/null-safety.html 
> <https://kotlinlang.org/docs/null-safety.html>
> [5] 
> https://kotlinlang.org/docs/java-interop.html#null-safety-and-platform-types 
> <https://kotlinlang.org/docs/java-interop.html#null-safety-and-platform-types>
>
>     On Thu, Apr 1, 2021 at 8:54 AM Kenneth Knowles <kenn@apache.org
>     <ma...@apache.org>> wrote:
>
>         Thanks for filing that. Once it is fixed in IntelliJ, the
>         annotations actually add value for downstream users.
>
>         Kenn
>
>         On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je.ik@seznam.cz
>         <ma...@seznam.cz>> wrote:
>
>             Hi,
>
>             I created the issue in JetBrains tracker [1]. I'm still
>             not 100% convinced that it is correct for the checker to
>             actually modify the bytecode. An open questions is - in
>             guava this does not happen. Does guava apply the check on
>             code being released? From what is in this thread is seems
>             to me, that the answer is no.
>
>              Jan
>
>             [1] https://youtrack.jetbrains.com/issue/IDEA-265658
>             <https://youtrack.jetbrains.com/issue/IDEA-265658>
>
>             On 4/1/21 6:15 AM, Kenneth Knowles wrote:
>>             Hi all,
>>
>>             About the IntelliJ automatic method stub issue: I raised
>>             it to the checkerframework list and got a helpful
>>             response:
>>             https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ
>>             <https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ>
>>
>>             It eventually reached back to Jetbrains and they would
>>             appreciate a detailed report with steps to reproduce,
>>             preferably a sample project. Would you - Jan or Ismaël or
>>             Reuven - provide them with this issue report? It sounds
>>             like Jan you have an example ready to go.
>>
>>             Kenn
>>
>>             On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský
>>             <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>>
>>                 Yes, annotations that we add to the code base on
>>                 purpose (like @Nullable or @SuppressWarnings) are
>>                 aboslutely fine. What is worse is that the checked is
>>                 not only checked, but a code generator. :)
>>
>>                 For example when one wants to implement Coder by
>>                 extending CustomCoder and use auto-generating the
>>                 overridden methods, they look like
>>
>>                 @Override public void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream)throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {
>>
>>                 }
>>
>>                 Which is really ugly. :-(
>>
>>                  Jan
>>
>>                 On 3/15/21 6:37 PM, Ismaël Mejía wrote:
>>>                 +1
>>>
>>>                 Even if I like the strictness for Null checking, I also think that
>>>                 this is adding too much extra time for builds (that I noticed locally
>>>                 when enabled) and also I agree with Jan that the annotations are
>>>                 really an undesired side effect. For reference when you try to auto
>>>                 complete some method signatures on IntelliJ on downstream projects
>>>                 with C-A-v it generates some extra Checkers annotations like @NonNull
>>>                 and others even if the user isn't using them which is not desirable.
>>>
>>>
>>>
>>>                 On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver<kc...@google.com>  <ma...@google.com>  wrote:
>>>>>                 Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>>>                 I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>>>
>>>>                 I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.
>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

On Thu, Apr 1, 2021 at 9:57 AM Brian Hulette <bh...@google.com> wrote:

> What value does it add? Is it that it enables them to use checkerframework
> with our interfaces?
>

Actually if they are also using checkerframework the defaults are the same
so it is not usually needed (though some defaults can be changed). Making
defaults explicit is most useful for interfacing with other tools with
different defaults, such as Spotbugs [1], IDEs [2] [3], or JVM languages
with null safety bult-in, etc [4] [5].

Kenn

[1]
https://spotbugs.readthedocs.io/en/stable/annotations.html#edu-umd-cs-findbugs-annotations-nullable
[2]
https://www.jetbrains.com/help/idea/2021.1/nullable-and-notnull-annotations.html
[3] https://wiki.eclipse.org/JDT_Core/Null_Analysis
[4] https://kotlinlang.org/docs/null-safety.html
[5]
https://kotlinlang.org/docs/java-interop.html#null-safety-and-platform-types

On Thu, Apr 1, 2021 at 8:54 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Thanks for filing that. Once it is fixed in IntelliJ, the annotations
>> actually add value for downstream users.
>>
>> Kenn
>>
>> On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi,
>>>
>>> I created the issue in JetBrains tracker [1]. I'm still not 100%
>>> convinced that it is correct for the checker to actually modify the
>>> bytecode. An open questions is - in guava this does not happen. Does guava
>>> apply the check on code being released? From what is in this thread is
>>> seems to me, that the answer is no.
>>>
>>>  Jan
>>>
>>> [1] https://youtrack.jetbrains.com/issue/IDEA-265658
>>> On 4/1/21 6:15 AM, Kenneth Knowles wrote:
>>>
>>> Hi all,
>>>
>>> About the IntelliJ automatic method stub issue: I raised it to the
>>> checkerframework list and got a helpful response:
>>> https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ
>>>
>>> It eventually reached back to Jetbrains and they would appreciate a
>>> detailed report with steps to reproduce, preferably a sample project. Would
>>> you - Jan or Ismaël or Reuven - provide them with this issue report? It
>>> sounds like Jan you have an example ready to go.
>>>
>>> Kenn
>>>
>>> On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Yes, annotations that we add to the code base on purpose (like
>>>> @Nullable or @SuppressWarnings) are aboslutely fine. What is worse is that
>>>> the checked is not only checked, but a code generator. :)
>>>>
>>>> For example when one wants to implement Coder by extending CustomCoder
>>>> and use auto-generating the overridden methods, they look like
>>>>
>>>> @Overridepublic void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream) throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {
>>>>
>>>> }
>>>>
>>>> Which is really ugly. :-(
>>>>
>>>>  Jan
>>>>
>>>> On 3/15/21 6:37 PM, Ismaël Mejía wrote:
>>>>
>>>> +1
>>>>
>>>> Even if I like the strictness for Null checking, I also think that
>>>> this is adding too much extra time for builds (that I noticed locally
>>>> when enabled) and also I agree with Jan that the annotations are
>>>> really an undesired side effect. For reference when you try to auto
>>>> complete some method signatures on IntelliJ on downstream projects
>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>> and others even if the user isn't using them which is not desirable.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> <kc...@google.com> wrote:
>>>>
>>>> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>>>
>>>> I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>>>
>>>> I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.
>>>>
>>>>

Re: Null checking in Beam

Posted by Brian Hulette <bh...@google.com>.

What value does it add? Is it that it enables them to use checkerframework
with our interfaces?

On Thu, Apr 1, 2021 at 8:54 AM Kenneth Knowles <ke...@apache.org> wrote:

> Thanks for filing that. Once it is fixed in IntelliJ, the annotations
> actually add value for downstream users.
>
> Kenn
>
> On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi,
>>
>> I created the issue in JetBrains tracker [1]. I'm still not 100%
>> convinced that it is correct for the checker to actually modify the
>> bytecode. An open questions is - in guava this does not happen. Does guava
>> apply the check on code being released? From what is in this thread is
>> seems to me, that the answer is no.
>>
>>  Jan
>>
>> [1] https://youtrack.jetbrains.com/issue/IDEA-265658
>> On 4/1/21 6:15 AM, Kenneth Knowles wrote:
>>
>> Hi all,
>>
>> About the IntelliJ automatic method stub issue: I raised it to the
>> checkerframework list and got a helpful response:
>> https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ
>>
>> It eventually reached back to Jetbrains and they would appreciate a
>> detailed report with steps to reproduce, preferably a sample project. Would
>> you - Jan or Ismaël or Reuven - provide them with this issue report? It
>> sounds like Jan you have an example ready to go.
>>
>> Kenn
>>
>> On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Yes, annotations that we add to the code base on purpose (like @Nullable
>>> or @SuppressWarnings) are aboslutely fine. What is worse is that the
>>> checked is not only checked, but a code generator. :)
>>>
>>> For example when one wants to implement Coder by extending CustomCoder
>>> and use auto-generating the overridden methods, they look like
>>>
>>> @Overridepublic void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream) throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {
>>>
>>> }
>>>
>>> Which is really ugly. :-(
>>>
>>>  Jan
>>>
>>> On 3/15/21 6:37 PM, Ismaël Mejía wrote:
>>>
>>> +1
>>>
>>> Even if I like the strictness for Null checking, I also think that
>>> this is adding too much extra time for builds (that I noticed locally
>>> when enabled) and also I agree with Jan that the annotations are
>>> really an undesired side effect. For reference when you try to auto
>>> complete some method signatures on IntelliJ on downstream projects
>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>> and others even if the user isn't using them which is not desirable.
>>>
>>>
>>>
>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> <kc...@google.com> wrote:
>>>
>>> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>>
>>> I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>>
>>> I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.
>>>
>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

Thanks for filing that. Once it is fixed in IntelliJ, the annotations
actually add value for downstream users.

Kenn

On Thu, Apr 1, 2021 at 1:10 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi,
>
> I created the issue in JetBrains tracker [1]. I'm still not 100% convinced
> that it is correct for the checker to actually modify the bytecode. An open
> questions is - in guava this does not happen. Does guava apply the check on
> code being released? From what is in this thread is seems to me, that the
> answer is no.
>
>  Jan
>
> [1] https://youtrack.jetbrains.com/issue/IDEA-265658
> On 4/1/21 6:15 AM, Kenneth Knowles wrote:
>
> Hi all,
>
> About the IntelliJ automatic method stub issue: I raised it to the
> checkerframework list and got a helpful response:
> https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ
>
> It eventually reached back to Jetbrains and they would appreciate a
> detailed report with steps to reproduce, preferably a sample project. Would
> you - Jan or Ismaël or Reuven - provide them with this issue report? It
> sounds like Jan you have an example ready to go.
>
> Kenn
>
> On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Yes, annotations that we add to the code base on purpose (like @Nullable
>> or @SuppressWarnings) are aboslutely fine. What is worse is that the
>> checked is not only checked, but a code generator. :)
>>
>> For example when one wants to implement Coder by extending CustomCoder
>> and use auto-generating the overridden methods, they look like
>>
>> @Overridepublic void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream) throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {
>>
>> }
>>
>> Which is really ugly. :-(
>>
>>  Jan
>>
>> On 3/15/21 6:37 PM, Ismaël Mejía wrote:
>>
>> +1
>>
>> Even if I like the strictness for Null checking, I also think that
>> this is adding too much extra time for builds (that I noticed locally
>> when enabled) and also I agree with Jan that the annotations are
>> really an undesired side effect. For reference when you try to auto
>> complete some method signatures on IntelliJ on downstream projects
>> with C-A-v it generates some extra Checkers annotations like @NonNull
>> and others even if the user isn't using them which is not desirable.
>>
>>
>>
>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> <kc...@google.com> wrote:
>>
>> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>
>> I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>
>> I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.
>>
>>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

Hi,

I created the issue in JetBrains tracker [1]. I'm still not 100% 
convinced that it is correct for the checker to actually modify the 
bytecode. An open questions is - in guava this does not happen. Does 
guava apply the check on code being released? From what is in this 
thread is seems to me, that the answer is no.

  Jan

[1] https://youtrack.jetbrains.com/issue/IDEA-265658

On 4/1/21 6:15 AM, Kenneth Knowles wrote:
> Hi all,
>
> About the IntelliJ automatic method stub issue: I raised it to the 
> checkerframework list and got a helpful response: 
> https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ 
> <https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ>
>
> It eventually reached back to Jetbrains and they would appreciate a 
> detailed report with steps to reproduce, preferably a sample project. 
> Would you - Jan or Ismaël or Reuven - provide them with this issue 
> report? It sounds like Jan you have an example ready to go.
>
> Kenn
>
> On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     Yes, annotations that we add to the code base on purpose (like
>     @Nullable or @SuppressWarnings) are aboslutely fine. What is worse
>     is that the checked is not only checked, but a code generator. :)
>
>     For example when one wants to implement Coder by extending
>     CustomCoder and use auto-generating the overridden methods, they
>     look like
>
>     @Override public void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream)throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {
>
>     }
>
>     Which is really ugly. :-(
>
>      Jan
>
>     On 3/15/21 6:37 PM, Ismaël Mejía wrote:
>>     +1
>>
>>     Even if I like the strictness for Null checking, I also think that
>>     this is adding too much extra time for builds (that I noticed locally
>>     when enabled) and also I agree with Jan that the annotations are
>>     really an undesired side effect. For reference when you try to auto
>>     complete some method signatures on IntelliJ on downstream projects
>>     with C-A-v it generates some extra Checkers annotations like @NonNull
>>     and others even if the user isn't using them which is not desirable.
>>
>>
>>
>>     On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver<kc...@google.com>  <ma...@google.com>  wrote:
>>>>     Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>>     I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>>
>>>     I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.
>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

Hi all,

About the IntelliJ automatic method stub issue: I raised it to the
checkerframework list and got a helpful response:
https://groups.google.com/g/checker-framework-discuss/c/KHQdjF4lesk/m/dJ4u1BBNBgAJ

It eventually reached back to Jetbrains and they would appreciate a
detailed report with steps to reproduce, preferably a sample project. Would
you - Jan or Ismaël or Reuven - provide them with this issue report? It
sounds like Jan you have an example ready to go.

Kenn

On Mon, Mar 15, 2021 at 1:29 PM Jan Lukavský <je...@seznam.cz> wrote:

> Yes, annotations that we add to the code base on purpose (like @Nullable
> or @SuppressWarnings) are aboslutely fine. What is worse is that the
> checked is not only checked, but a code generator. :)
>
> For example when one wants to implement Coder by extending CustomCoder and
> use auto-generating the overridden methods, they look like
>
> @Overridepublic void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream) throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {
>
> }
>
> Which is really ugly. :-(
>
>  Jan
>
> On 3/15/21 6:37 PM, Ismaël Mejía wrote:
>
> +1
>
> Even if I like the strictness for Null checking, I also think that
> this is adding too much extra time for builds (that I noticed locally
> when enabled) and also I agree with Jan that the annotations are
> really an undesired side effect. For reference when you try to auto
> complete some method signatures on IntelliJ on downstream projects
> with C-A-v it generates some extra Checkers annotations like @NonNull
> and others even if the user isn't using them which is not desirable.
>
>
>
> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> <kc...@google.com> wrote:
>
> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>
> I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>
> I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.
>
>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

Yes, annotations that we add to the code base on purpose (like @Nullable 
or @SuppressWarnings) are aboslutely fine. What is worse is that the 
checked is not only checked, but a code generator. :)

For example when one wants to implement Coder by extending CustomCoder 
and use auto-generating the overridden methods, they look like

@Override public void encode(Long value, @UnknownKeyFor @NonNull @Initialized OutputStream outStream)throws @UnknownKeyFor@NonNull@Initialized CoderException, @UnknownKeyFor@NonNull@Initialized IOException {

}

Which is really ugly. :-(

  Jan

On 3/15/21 6:37 PM, Ismaël Mejía wrote:
> +1
>
> Even if I like the strictness for Null checking, I also think that
> this is adding too much extra time for builds (that I noticed locally
> when enabled) and also I agree with Jan that the annotations are
> really an undesired side effect. For reference when you try to auto
> complete some method signatures on IntelliJ on downstream projects
> with C-A-v it generates some extra Checkers annotations like @NonNull
> and others even if the user isn't using them which is not desirable.
>
>
>
> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> wrote:
>>> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>
>> I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>
>> I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

I agree with all of your points. I have good news and bad news. Reordering
your points to put some together

On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:

>  a) if a check does not modify the bytecode, it is fine and we can use it
> - we are absolutely free to use any tooling we agree on, if our users
> cannot be affected anyhow
>
+1 and this is a possibility, but would reduce the value for users.

 b) if a tool needs to be leaked to user, it should be as small leakage as
> possible
>
If we want to keep some annotations for user's benefit (which might be
> fine), it should be really limited to the bare minimum (e.g. only @Nullable
> for method arguments and return values, possibly more, I don't know if and
> how we can configure that). Definitely not @UnknownKeyFor, that is simply
> internal to the checker.

It would also stop the leakage (if we would release code without this
> check).
>
+1 we should only need to give users the annotations that we add.

> I checked with `javap -v -c -p
sdks/java/core/classes/classes/java/main/org/apache/beam/sdk/values/PDone.class`
and confirmed that there are unfortunately annotations that we do not need
actually in the bytecode.

You are right on the solution: skipping during release solves this and only
includes the annotations that we add. I will fix this for 2.29.0.

For context, I think the reason that these are inserted is that
checkerframework has configuration options that change what the defaults
are, so it inserts these to allow downstream users to have proper
information.

 c) if a check significantly affects compile performance, it should be
> possible to opt-out
>
Moving the check to different CI is a possibility (a)), it would then
> require opt-in flag to be able to run the check locally.


./gradlew -PskipCheckerframework opts out. The problem seems to be that it
is hard to pass flags when using IntelliJ.

It has only minor effect on CI because most results are FROM-CACHE and the
run is so long already.

Kenn


I think that our current setup violates all these three points.
>
> We should then have opt-out flag for local development before committing
> changes.
>
>  Jan
>
> [1]
> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>
I don't know the details of the checkerframework, but there seems a
contradiction between what code is (currently) generated and what we
therefore release and what actually the checkerframework states [1]:

@UnknownKeyFor:

Used internally by the type system; should never be written by a
programmer.

If this annotation is generated for overwritten methods, then I'd say, that
it means we place a great burden to our users - either not using
autogenerated methods, or erase all the generated annotations afterwards.
Either way, that is not "polite" from Beam.

What we should judge is not only a formal purity of code, but what stands
on the other side is how usable APIs we provide. We should not head for
100% pure code and sacrificing use comfort. Every check that leaks to user
code is at a price and we should not ignore that. No free lunch.

From my point of view:




> On 3/16/21 8:33 AM, Reuven Lax wrote:
>
>
>
> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>
>>
>>
>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I will be blunt about my opinions about the general issue:
>>>
>>> - NullPointerExceptions (and similar) are a solved problem.
>>>    * They have been since 2003 at the latest [1] (this is when the types
>>> were hacked into Java - the foundation dates back to the 70s or earlier)
>>>
>>
>> Huh - Fahndrich tried to hire me once to work on a project called
>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>> citations!
>>
>
> Umm, I was confusing names. Fahndrich is actually a former coworker at
> Google :)
>
>
>>
>>
>>>    * Checkerframework is a _pluggable_ system where that nullness type
>>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>>    * Our users should know this and judge us accordingly.
>>>
>>> - Checkerframework should be thought of and described as type checking,
>>> because it is. It is not heuristic nor approximate.
>>> - If your code was unclear about whether something could be null, it was
>>> probably unclear to a person reading it also, and very likely to have
>>> actual bugs.
>>> - APIs that accept a lot of nullable parameters are, generally speaking,
>>> bad APIs. They are hard to use correctly, less readable, and very likely to
>>> cause actual bugs. You are forcing your users to deal with accidental
>>> complexity you left behind.
>>>   * Corollary to the above two points: Almost all the time, the changes
>>> to clearify nullness make your code better, more readable, easier for users
>>> or editors.
>>> - It is true that there is a learning curve to programming in this way.
>>>
>>
>> I agree with the above in a closed system. However as mentioned, some of
>> the APIs we use suffer from this.
>>
>> In a previous life, I saw up close an effort to add such analysis to a
>> large codebase. Two separate tools called Prefix and Prefast were used (the
>> difference between the two is actually quite interesting, but not relevant
>> here). However in order to make this analysis useful, there was a massive
>> effort to properly annotate the entire codebase, including all libraries
>> used. This isn't a perfect example - this was a C++ codebase which is much
>> harder to analyze, and these tools identified far more than simply nullness
>> errors (resource leaks, array indices, proper string null termination,
>> exception behavior, etc.). However the closer we can get to properly
>> annotating the transitive closure of our dependencies, the better this
>> framework will work.
>>
>>
>>
>>> - There are certainly common patterns in Java that do not work very
>>> well, and suppression is sometimes the best option.
>>>    * Example: JUnit's @Setup and @Test conventions do not work very well
>>> and it is not worth the effort to make them work. This is actually because
>>> if it were "normal code" it would be bad code. There are complex
>>> inter-method dependencies enforced only by convention. This matters:
>>> sometimes a JUnit test class is called from another class, used as "normal
>>> code". This does go wrong in practice. Plain old JUnit test cases
>>> frequently go wrong as well.
>>>
>>> And here is my opinion when it comes to Beam:
>>>
>>> - "Community over code" is not an excuse for negligent practices that
>>> cause easily avoidable risk to our users. I will be very disappointed if we
>>> choose that.
>>> - I think having tooling that helps newcomers write better code by
>>> default is better for the community too. Just like having automatic
>>> formatting is better. Less to haggle about in review, etc.
>>> - A simple search reveals about 170 bugs that we know of [4].
>>> - So far in almost every module I have fixed I discovered actual new
>>> null errors. Many examples at [5].
>>> - It is extremely easy to suppress the type checking. Almost all of our
>>> classes have it suppressed already (I did this work, to allow existing
>>> errors while protecting new code).
>>> - Including the annotations in the shipped jars is an important feature.
>>> Without this, users cannot write null-safe code themselves.
>>>    * Reuven highlighted this: when methods are not annotated, we have to
>>> use/implement workarounds. Actually Guava does use checkerframework
>>> annotations [6] and the problem is that it requires its *input* to already
>>> be non-null so actually you cannot even use it to convert nullable values
>>> to non-nullable values.
>>>    * Beam has its own [7] that is annotated, actually for yet another
>>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>>> validation!
>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user
>>> code. I wonder if there is something we can do about that. At the Java
>>> level, if they are not on the classpath they should be ignored and not
>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>> area :-)  [8].
>>>
>>> I understand the pain of longer compile times slowing people down. That
>>> is actually a problem to be solved which does not require lowering our
>>> standards of quality. How about we try moving it to a separate CI job and
>>> see how it goes?
>>>
>>>
>>
>>> In my experience stories like Reuven's are much more frustrating in a
>>> separate CI job because you find out quite late that your code has flaws.
>>> Like when spotless fails, but much more work to fix, and would have been
>>> prevented long ago if it were integrated into the compile.
>>>
>>
>> I agree with this. I prefer to be able to detect on my computer that
>> there are failures, and not have to wait for submission. The complaint was
>> that some people are experiencing trouble on their local machine however,
>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>> out to opt in).
>>
>>
>>>
>>> Kenn
>>>
>>> [1]
>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>> [2]
>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>> [3]
>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>> [4]
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>> [5] https://github.com/apache/beam/pull/12284 and
>>> https://github.com/apache/beam/pull/12162 and
>>> [6]
>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>> [7]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>
>>>
>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I have some deeper concerns with the null checks. The fact that many
>>>> libraries we use (including guava) don't always annotate their methods
>>>> forces a lot of workarounds. As a very simple example, the return value
>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>> nullability checks don't know this. This and other similar cases require
>>>> constantly adding extra unnecessary null checks in the code just to make
>>>> the checker happy. There have been other cases where I haven't been able to
>>>> figure out a way to make the checker happy (often these seem to involve
>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>> disabled the check.
>>>>
>>>> Now you might say that it's worth the extra pain and ugliness of
>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>> think this ignores a sociological aspect of software development. I have a
>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>> some time figuring out how to rewrite my code such that it makes the
>>>> checker happy (even though often it forced me to write much more awkward
>>>> code). However even I have often found myself giving up and just disabling
>>>> the check. Many others will have less tolerance than me, and will simply
>>>> disable the checks. At that point we'll have a codebase littered with
>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>> static checker with too many false positives often ends up being worse than
>>>> having no checker.
>>>>
>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Even if I like the strictness for Null checking, I also think that
>>>>> this is adding too much extra time for builds (that I noticed locally
>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>> really an undesired side effect. For reference when you try to auto
>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>>> and others even if the user isn't using them which is not desirable.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Big +1 for moving this to separate CI job. I really don't like what
>>>>> annotations are currently added to the code we ship. Tools like Idea add
>>>>> these annotations to code they generate when overriding classes and that's
>>>>> very annoying. Users should not be exposed to internal tools like
>>>>> nullability checking.
>>>>> >
>>>>> >
>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>> would still be release blocking, so the same annotations would still be
>>>>> required.
>>>>> >
>>>>> > I'm not sure which annotations you are concerned about. There are
>>>>> two annotations involved with nullness checking, @SuppressWarnings and
>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>> also provides useful information about our APIs to users. The user should
>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>> for better or worse, these annotations are the standard way of expressing
>>>>> that in Java.
>>>>>
>>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

On Wed, Mar 31, 2021 at 8:40 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> I believe we should update a contribution doc on this, since now it’s
> confusing  - you run “./gradlew :your:task:check” and it’s fine but then it
> fails on CI because of some missed Nullable checks.
>

I'll add to the contributor guide. I think it would be ideal for :check to
still include it, and only exclude it from the main :compileJava
:compileTestJava. I think this might require some more Gradle work.


> Also, I noticed that “*checkNotNull(yourVar)*” or "*checkState(yourVar !=
> null)*” won’t help, it will still complain. Only explicit “*if (yourVar
> != null)*” works. Is it correct or do I miss something?
>

This is a problem with Guava's annotations mentioned earlier in the thread.
They are annotated appropriate for dynamic checks that already agree with
the type. You need to use
org.apache.beam.sdk.util.Preconditions.checkNotNull to do a null check that
alters the type from nullable to not-null.

Kenn


>
> On 24 Mar 2021, at 01:24, Kyle Weaver <kc...@google.com> wrote:
>
> I moved the checker framework from opt-out to opt-in [1]. You can opt in
> by passing "-PenableCheckerFramework=true" to the Gradle invocation. The
> checker still runs on all Jenkins CI builds.
>
> [1] https://github.com/apache/beam/pull/14301
>
> On Wed, Mar 17, 2021 at 8:50 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>>
>>
>> On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> If there is no way to configure which annotations should be generated,
>>> then I'd be +1 for removing the checker to separated CI and adding an
>>> opt-in flag for the check when run locally.
>>>
>> Yes. If this answer is correct, then we are out of luck:
>> https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods
>>
>> A good followup to moving it to a separate CI job would be to attach a
>> full type checking run to the `:check` gradle task. This should be the best
>> place to attach all of our extended checks like spotbugs, spotless,
>> checkstyle, checkerframework. I rarely run `:check` myself (I think it is
>> currently broken in some ways) but it may be good to start.
>>
>> BTW the slowdown we need to solve is local builds, not CI runs. When it
>> was added the slowdown was almost completely prevented by caching. And now
>> that we disable it for test classes (where for some reason it was extra
>> slow) I wonder if it will speed up the main CI runs at all. So I would say
>> the first thing to do is to just disable it by default but enable it in the
>> Jenkins job builders.
>>
>> Kenn
>>
>> We need to solve the issue for dev@ as well, as the undesirable
>>> annotations are already digging their way to the code base:
>>>
>>> git/apache/beam$ git grep UnknownKeyFor
>>>
>>> Another strange thing is that it seems, that we are pulling the
>>> checkerframework as a runtime dependency (at least for some submodules).
>>> When I run `mvn dependency:tree` on one of my projects that uses maven I see
>>>
>>> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
>>> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
>>> [INFO] |  +-
>>> com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
>>> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
>>>
>>> which is fine, but when I add beam-sdks-java-extensions-euphoria it
>>> changes to
>>>
>>> [INFO] +-
>>> org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
>>> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
>>>
>>> I'm not a gradle guru, so I cannot tell how this happens, there seems to
>>> be nothing special about euphoria in the gradle.
>>>
>>>  Jan
>>> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>>>
>>> I've asked on checkerframework users mailing list if they or any users
>>> have recommendations for the IntelliJ integration issue.
>>>
>>> It is worth noting that the default annotations inserted into the
>>> bytecode do add value: the @NonNull type annotations are default for
>>> checkerframework but not default for spotbugs. So having the default
>>> inserted enables downstream users to have betters spotbugs heuristic
>>> analysis. Further, the defaults can be adjusted by users so freezing them
>>> at the values we use them at is valuable in general across all tools.
>>>
>>> I think it makes sense to sacrifice these minor value-adds to keep
>>> things simple-looking for IntelliJ users.
>>>
>>> Kenn
>>>
>>> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <ke...@apache.org>
>>> wrote:
>>>
>>>> Seems it is an FAQ with no solution:
>>>> https://checkerframework.org/manual/#faq-classfile-annotations
>>>>
>>>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> Adding -PskipCheckerframework when releasing will solve it for users,
>>>>> but not for dev@.
>>>>>
>>>>> Making it off by default and a separate CI check that turns it on
>>>>> would solve it overall but has the downsides mentioned before.
>>>>>
>>>>> It would be very nice to simply have a flag to not insert default
>>>>> annotations.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> I believe it is not a problem of Idea. At least I didn't notice
>>>>>> behavior like that with Guava, although Guava uses the framework as well.
>>>>>> Maybe there is a way to tune which annotations should be generated? Or
>>>>>> Guava handles that somehow differently?
>>>>>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>>>>>
>>>>>> I've also been annoyed at IntelliJ autogenenerating all these
>>>>>> annotations. I believe Kenn said that this was not the intention - maybe
>>>>>> there's an IntelliJ setting that would stop this from happening?
>>>>>>
>>>>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>>
>>>>>>> I don't know the details of the checkerframework, but there seems a
>>>>>>> contradiction between what code is (currently) generated and what we
>>>>>>> therefore release and what actually the checkerframework states [1]:
>>>>>>>
>>>>>>> @UnknownKeyFor:
>>>>>>>
>>>>>>> Used internally by the type system; should never be written by a
>>>>>>> programmer.
>>>>>>>
>>>>>>> If this annotation is generated for overwritten methods, then I'd
>>>>>>> say, that it means we place a great burden to our users - either not using
>>>>>>> autogenerated methods, or erase all the generated annotations afterwards.
>>>>>>> Either way, that is not "polite" from Beam.
>>>>>>>
>>>>>>> What we should judge is not only a formal purity of code, but what
>>>>>>> stands on the other side is how usable APIs we provide. We should not head
>>>>>>> for 100% pure code and sacrificing use comfort. Every check that leaks to
>>>>>>> user code is at a price and we should not ignore that. No free lunch.
>>>>>>>
>>>>>>> From my point of view:
>>>>>>>
>>>>>>>  a) if a check does not modify the bytecode, it is fine and we can
>>>>>>> use it - we are absolutely free to use any tooling we agree on, if our
>>>>>>> users cannot be affected anyhow
>>>>>>>
>>>>>>>  b) if a tool needs to be leaked to user, it should be as small
>>>>>>> leakage as possible
>>>>>>>
>>>>>>>  c) if a check significantly affects compile performance, it should
>>>>>>> be possible to opt-out
>>>>>>>
>>>>>>> I think that our current setup violates all these three points.
>>>>>>>
>>>>>>> Moving the check to different CI is a possibility (a)), it would
>>>>>>> then require opt-in flag to be able to run the check locally. It would also
>>>>>>> stop the leakage (if we would release code without this check).
>>>>>>>
>>>>>>> If we want to keep some annotations for user's benefit (which might
>>>>>>> be fine), it should be really limited to the bare minimum (e.g. only
>>>>>>> @Nullable for method arguments and return values, possibly more, I don't
>>>>>>> know if and how we can configure that). Definitely not @UnknownKeyFor, that
>>>>>>> is simply internal to the checker. We should then have opt-out flag for
>>>>>>> local development before committing changes.
>>>>>>>
>>>>>>>  Jan
>>>>>>>
>>>>>>> [1]
>>>>>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>>>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I will be blunt about my opinions about the general issue:
>>>>>>>>>
>>>>>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>>>>>> earlier)
>>>>>>>>>
>>>>>>>>
>>>>>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>>>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>>>>>>>> citations!
>>>>>>>>
>>>>>>>
>>>>>>> Umm, I was confusing names. Fahndrich is actually a former coworker
>>>>>>> at Google :)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>    * Checkerframework is a _pluggable_ system where that nullness
>>>>>>>>> type system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>>>>>>    * Our users should know this and judge us accordingly.
>>>>>>>>>
>>>>>>>>> - Checkerframework should be thought of and described as type
>>>>>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>>>>>> - If your code was unclear about whether something could be null,
>>>>>>>>> it was probably unclear to a person reading it also, and very likely to
>>>>>>>>> have actual bugs.
>>>>>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>>>>>> speaking, bad APIs. They are hard to use correctly, less readable, and very
>>>>>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>>>>>> accidental complexity you left behind.
>>>>>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>>>>>> changes to clearify nullness make your code better, more readable, easier
>>>>>>>>> for users or editors.
>>>>>>>>> - It is true that there is a learning curve to programming in this
>>>>>>>>> way.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with the above in a closed system. However as mentioned,
>>>>>>>> some of the APIs we use suffer from this.
>>>>>>>>
>>>>>>>> In a previous life, I saw up close an effort to add such analysis
>>>>>>>> to a large codebase. Two separate tools called Prefix and Prefast were used
>>>>>>>> (the difference between the two is actually quite interesting, but not
>>>>>>>> relevant here). However in order to make this analysis useful, there was a
>>>>>>>> massive effort to properly annotate the entire codebase, including all
>>>>>>>> libraries used. This isn't a perfect example - this was a C++ codebase
>>>>>>>> which is much harder to analyze, and these tools identified far more than
>>>>>>>> simply nullness errors (resource leaks, array indices, proper string null
>>>>>>>> termination, exception behavior, etc.). However the closer we can get to
>>>>>>>> properly annotating the transitive closure of our dependencies, the better
>>>>>>>> this framework will work.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> - There are certainly common patterns in Java that do not work
>>>>>>>>> very well, and suppression is sometimes the best option.
>>>>>>>>>    * Example: JUnit's @Setup and @Test conventions do not work
>>>>>>>>> very well and it is not worth the effort to make them work. This is
>>>>>>>>> actually because if it were "normal code" it would be bad code. There are
>>>>>>>>> complex inter-method dependencies enforced only by convention. This
>>>>>>>>> matters: sometimes a JUnit test class is called from another class, used as
>>>>>>>>> "normal code". This does go wrong in practice. Plain old JUnit test cases
>>>>>>>>> frequently go wrong as well.
>>>>>>>>>
>>>>>>>>> And here is my opinion when it comes to Beam:
>>>>>>>>>
>>>>>>>>> - "Community over code" is not an excuse for negligent practices
>>>>>>>>> that cause easily avoidable risk to our users. I will be very disappointed
>>>>>>>>> if we choose that.
>>>>>>>>> - I think having tooling that helps newcomers write better code by
>>>>>>>>> default is better for the community too. Just like having automatic
>>>>>>>>> formatting is better. Less to haggle about in review, etc.
>>>>>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>>>>>> - So far in almost every module I have fixed I discovered actual
>>>>>>>>> new null errors. Many examples at [5].
>>>>>>>>> - It is extremely easy to suppress the type checking. Almost all
>>>>>>>>> of our classes have it suppressed already (I did this work, to allow
>>>>>>>>> existing errors while protecting new code).
>>>>>>>>> - Including the annotations in the shipped jars is an important
>>>>>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>>>>>    * Reuven highlighted this: when methods are not annotated, we
>>>>>>>>> have to use/implement workarounds. Actually Guava does use checkerframework
>>>>>>>>> annotations [6] and the problem is that it requires its *input* to already
>>>>>>>>> be non-null so actually you cannot even use it to convert nullable values
>>>>>>>>> to non-nullable values.
>>>>>>>>>    * Beam has its own [7] that is annotated, actually for yet
>>>>>>>>> another reason: when Guava's checkNotNull fails, it throws NPE when it
>>>>>>>>> should throw IllegalArgumentException. Guava's checkNotNull should not be
>>>>>>>>> used for input validation!
>>>>>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations
>>>>>>>>> in user code. I wonder if there is something we can do about that. At the
>>>>>>>>> Java level, if they are not on the classpath they should be ignored and not
>>>>>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>>>>>>>> area :-)  [8].
>>>>>>>>>
>>>>>>>>> I understand the pain of longer compile times slowing people down.
>>>>>>>>> That is actually a problem to be solved which does not require lowering our
>>>>>>>>> standards of quality. How about we try moving it to a separate CI job and
>>>>>>>>> see how it goes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>> In my experience stories like Reuven's are much more frustrating
>>>>>>>>> in a separate CI job because you find out quite late that your code has
>>>>>>>>> flaws. Like when spotless fails, but much more work to fix, and would have
>>>>>>>>> been prevented long ago if it were integrated into the compile.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with this. I prefer to be able to detect on my computer
>>>>>>>> that there are failures, and not have to wait for submission. The complaint
>>>>>>>> was that some people are experiencing trouble on their local machine
>>>>>>>> however, so it seems reasonable to add an opt-out flag (though I would
>>>>>>>> prefer opt out to opt in).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>>>>>> [2]
>>>>>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>>>>>> [3]
>>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>>>>>> [4]
>>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>>>>>> https://github.com/apache/beam/pull/12162 and
>>>>>>>>> [6]
>>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>>>>>> [7]
>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I have some deeper concerns with the null checks. The fact that
>>>>>>>>>> many libraries we use (including guava) don't always annotate their methods
>>>>>>>>>> forces a lot of workarounds. As a very simple example, the return value
>>>>>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>>>>>> nullability checks don't know this. This and other similar cases require
>>>>>>>>>> constantly adding extra unnecessary null checks in the code just to make
>>>>>>>>>> the checker happy. There have been other cases where I haven't been able to
>>>>>>>>>> figure out a way to make the checker happy (often these seem to involve
>>>>>>>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>>>>>>>> disabled the check.
>>>>>>>>>>
>>>>>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>>>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>>>>>>>> think this ignores a sociological aspect of software development. I have a
>>>>>>>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>>>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>>>>>> checker happy (even though often it forced me to write much more awkward
>>>>>>>>>> code). However even I have often found myself giving up and just disabling
>>>>>>>>>> the check. Many others will have less tolerance than me, and will simply
>>>>>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>>>>>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>>>>>>>> static checker with too many false positives often ends up being worse than
>>>>>>>>>> having no checker.
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Even if I like the strictness for Null checking, I also think
>>>>>>>>>>> that
>>>>>>>>>>> this is adding too much extra time for builds (that I noticed
>>>>>>>>>>> locally
>>>>>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>>>>>> really an undesired side effect. For reference when you try to
>>>>>>>>>>> auto
>>>>>>>>>>> complete some method signatures on IntelliJ on downstream
>>>>>>>>>>> projects
>>>>>>>>>>> with C-A-v it generates some extra Checkers annotations like
>>>>>>>>>>> @NonNull
>>>>>>>>>>> and others even if the user isn't using them which is not
>>>>>>>>>>> desirable.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Big +1 for moving this to separate CI job. I really don't
>>>>>>>>>>> like what annotations are currently added to the code we ship. Tools like
>>>>>>>>>>> Idea add these annotations to code they generate when overriding classes
>>>>>>>>>>> and that's very annoying. Users should not be exposed to internal tools
>>>>>>>>>>> like nullability checking.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > I was only planning on moving this to a separate CI job. The
>>>>>>>>>>> job would still be release blocking, so the same annotations would still be
>>>>>>>>>>> required.
>>>>>>>>>>> >
>>>>>>>>>>> > I'm not sure which annotations you are concerned about. There
>>>>>>>>>>> are two annotations involved with nullness checking, @SuppressWarnings and
>>>>>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>>>>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>>>>>>>> also provides useful information about our APIs to users. The user should
>>>>>>>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>>>>>>>> for better or worse, these annotations are the standard way of expressing
>>>>>>>>>>> that in Java.
>>>>>>>>>>>
>>>>>>>>>>
>

Re: Null checking in Beam

Posted by Alexey Romanenko <ar...@gmail.com>.

I believe we should update a contribution doc on this, since now it’s confusing  - you run “./gradlew :your:task:check” and it’s fine but then it fails on CI because of some missed Nullable checks.

Also, I noticed that “checkNotNull(yourVar)” or "checkState(yourVar != null)” won’t help, it will still complain. Only explicit “if (yourVar != null)” works. Is it correct or do I miss something?

> On 24 Mar 2021, at 01:24, Kyle Weaver <kc...@google.com> wrote:
> 
> I moved the checker framework from opt-out to opt-in [1]. You can opt in by passing "-PenableCheckerFramework=true" to the Gradle invocation. The checker still runs on all Jenkins CI builds.
> 
> [1] https://github.com/apache/beam/pull/14301 <https://github.com/apache/beam/pull/14301>
> On Wed, Mar 17, 2021 at 8:50 AM Kenneth Knowles <kenn@apache.org <ma...@apache.org>> wrote:
> 
> 
> On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
> If there is no way to configure which annotations should be generated, then I'd be +1 for removing the checker to separated CI and adding an opt-in flag for the check when run locally.
> 
> Yes. If this answer is correct, then we are out of luck: https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods <https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods>
> 
> A good followup to moving it to a separate CI job would be to attach a full type checking run to the `:check` gradle task. This should be the best place to attach all of our extended checks like spotbugs, spotless, checkstyle, checkerframework. I rarely run `:check` myself (I think it is currently broken in some ways) but it may be good to start.
> 
> BTW the slowdown we need to solve is local builds, not CI runs. When it was added the slowdown was almost completely prevented by caching. And now that we disable it for test classes (where for some reason it was extra slow) I wonder if it will speed up the main CI runs at all. So I would say the first thing to do is to just disable it by default but enable it in the Jenkins job builders.
> 
> Kenn 
> 
> We need to solve the issue for dev@ as well, as the undesirable annotations are already digging their way to the code base:
> 
> git/apache/beam$ git grep UnknownKeyFor
> 
> Another strange thing is that it seems, that we are pulling the checkerframework as a runtime dependency (at least for some submodules). When I run `mvn dependency:tree` on one of my projects that uses maven I see
> 
> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
> [INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
> 
> which is fine, but when I add beam-sdks-java-extensions-euphoria it changes to
> 
> [INFO] +- org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
> 
> I'm not a gradle guru, so I cannot tell how this happens, there seems to be nothing special about euphoria in the gradle.
> 
>  Jan
> 
> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>> I've asked on checkerframework users mailing list if they or any users have recommendations for the IntelliJ integration issue.
>> 
>> It is worth noting that the default annotations inserted into the bytecode do add value: the @NonNull type annotations are default for checkerframework but not default for spotbugs. So having the default inserted enables downstream users to have betters spotbugs heuristic analysis. Further, the defaults can be adjusted by users so freezing them at the values we use them at is valuable in general across all tools.
>> 
>> I think it makes sense to sacrifice these minor value-adds to keep things simple-looking for IntelliJ users.
>> 
>> Kenn
>> 
>> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <kenn@apache.org <ma...@apache.org>> wrote:
>> Seems it is an FAQ with no solution: https://checkerframework.org/manual/#faq-classfile-annotations <https://checkerframework.org/manual/#faq-classfile-annotations>
>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <kenn@apache.org <ma...@apache.org>> wrote:
>> Adding -PskipCheckerframework when releasing will solve it for users, but not for dev@.
>> 
>> Making it off by default and a separate CI check that turns it on would solve it overall but has the downsides mentioned before.
>> 
>> It would be very nice to simply have a flag to not insert default annotations.
>> 
>> Kenn
>> 
>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>> I believe it is not a problem of Idea. At least I didn't notice behavior like that with Guava, although Guava uses the framework as well. Maybe there is a way to tune which annotations should be generated? Or Guava handles that somehow differently?
>> 
>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>> I've also been annoyed at IntelliJ autogenenerating all these annotations. I believe Kenn said that this was not the intention - maybe there's an IntelliJ setting that would stop this from happening?
>>> 
>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>>> I don't know the details of the checkerframework, but there seems a contradiction between what code is (currently) generated and what we therefore release and what actually the checkerframework states [1]:
>>> 
>>> @UnknownKeyFor:
>>> 
>>> Used internally by the type system; should never be written by a programmer. 
>>> 
>>> If this annotation is generated for overwritten methods, then I'd say, that it means we place a great burden to our users - either not using autogenerated methods, or erase all the generated annotations afterwards. Either way, that is not "polite" from Beam.
>>> 
>>> What we should judge is not only a formal purity of code, but what stands on the other side is how usable APIs we provide. We should not head for 100% pure code and sacrificing use comfort. Every check that leaks to user code is at a price and we should not ignore that. No free lunch.
>>> 
>>> From my point of view:
>>> 
>>>  a) if a check does not modify the bytecode, it is fine and we can use it - we are absolutely free to use any tooling we agree on, if our users cannot be affected anyhow
>>> 
>>>  b) if a tool needs to be leaked to user, it should be as small leakage as possible
>>> 
>>>  c) if a check significantly affects compile performance, it should be possible to opt-out
>>> 
>>> I think that our current setup violates all these three points.
>>> 
>>> Moving the check to different CI is a possibility (a)), it would then require opt-in flag to be able to run the check locally. It would also stop the leakage (if we would release code without this check).
>>> 
>>> If we want to keep some annotations for user's benefit (which might be fine), it should be really limited to the bare minimum (e.g. only @Nullable for method arguments and return values, possibly more, I don't know if and how we can configure that). Definitely not @UnknownKeyFor, that is simply internal to the checker. We should then have opt-out flag for local development before committing changes.
>>> 
>>>  Jan
>>> 
>>> [1] https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html <https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html>
>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <relax@google.com <ma...@google.com>> wrote:
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <kenn@apache.org <ma...@apache.org>> wrote:
>>>> I will be blunt about my opinions about the general issue:
>>>> 
>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>    * They have been since 2003 at the latest [1] (this is when the types were hacked into Java - the foundation dates back to the 70s or earlier)
>>>> 
>>>> Huh - Fahndrich tried to hire me once to work on a project called Singularity. Small world. Also note that Sanjay Ghemawat is listed in the citations!
>>>> 
>>>> Umm, I was confusing names. Fahndrich is actually a former coworker at Google :)
>>>>  
>>>>  
>>>>    * Checkerframework is a _pluggable_ system where that nullness type system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>    * Our users should know this and judge us accordingly.
>>>> 
>>>> - Checkerframework should be thought of and described as type checking, because it is. It is not heuristic nor approximate.
>>>> - If your code was unclear about whether something could be null, it was probably unclear to a person reading it also, and very likely to have actual bugs.
>>>> - APIs that accept a lot of nullable parameters are, generally speaking, bad APIs. They are hard to use correctly, less readable, and very likely to cause actual bugs. You are forcing your users to deal with accidental complexity you left behind.
>>>>   * Corollary to the above two points: Almost all the time, the changes to clearify nullness make your code better, more readable, easier for users or editors.
>>>> - It is true that there is a learning curve to programming in this way.
>>>> 
>>>> I agree with the above in a closed system. However as mentioned, some of the APIs we use suffer from this.
>>>> 
>>>> In a previous life, I saw up close an effort to add such analysis to a large codebase. Two separate tools called Prefix and Prefast were used (the difference between the two is actually quite interesting, but not relevant here). However in order to make this analysis useful, there was a massive effort to properly annotate the entire codebase, including all libraries used. This isn't a perfect example - this was a C++ codebase which is much harder to analyze, and these tools identified far more than simply nullness errors (resource leaks, array indices, proper string null termination, exception behavior, etc.). However the closer we can get to properly annotating the transitive closure of our dependencies, the better this framework will work.
>>>> 
>>>>  
>>>> - There are certainly common patterns in Java that do not work very well, and suppression is sometimes the best option.
>>>>    * Example: JUnit's @Setup and @Test conventions do not work very well and it is not worth the effort to make them work. This is actually because if it were "normal code" it would be bad code. There are complex inter-method dependencies enforced only by convention. This matters: sometimes a JUnit test class is called from another class, used as "normal code". This does go wrong in practice. Plain old JUnit test cases frequently go wrong as well.
>>>> 
>>>> And here is my opinion when it comes to Beam:
>>>> 
>>>> - "Community over code" is not an excuse for negligent practices that cause easily avoidable risk to our users. I will be very disappointed if we choose that.
>>>> - I think having tooling that helps newcomers write better code by default is better for the community too. Just like having automatic formatting is better. Less to haggle about in review, etc.
>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>> - So far in almost every module I have fixed I discovered actual new null errors. Many examples at [5].
>>>> - It is extremely easy to suppress the type checking. Almost all of our classes have it suppressed already (I did this work, to allow existing errors while protecting new code).
>>>> - Including the annotations in the shipped jars is an important feature. Without this, users cannot write null-safe code themselves.
>>>>    * Reuven highlighted this: when methods are not annotated, we have to use/implement workarounds. Actually Guava does use checkerframework annotations [6] and the problem is that it requires its *input* to already be non-null so actually you cannot even use it to convert nullable values to non-nullable values.
>>>>    * Beam has its own [7] that is annotated, actually for yet another reason: when Guava's checkNotNull fails, it throws NPE when it should throw IllegalArgumentException. Guava's checkNotNull should not be used for input validation!
>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user code. I wonder if there is something we can do about that. At the Java level, if they are not on the classpath they should be ignored and not affect users. Coincidentally, the JDK has had NullPointerExceptions in this area :-)  [8].
>>>> 
>>>> I understand the pain of longer compile times slowing people down. That is actually a problem to be solved which does not require lowering our standards of quality. How about we try moving it to a separate CI job and see how it goes?
>>>> 
>>>>  
>>>> In my experience stories like Reuven's are much more frustrating in a separate CI job because you find out quite late that your code has flaws. Like when spotless fails, but much more work to fix, and would have been prevented long ago if it were integrated into the compile.
>>>> 
>>>> I agree with this. I prefer to be able to detect on my computer that there are failures, and not have to wait for submission. The complaint was that some people are experiencing trouble on their local machine however, so it seems reasonable to add an opt-out flag (though I would prefer opt out to opt in).
>>>>  
>>>> 
>>>> Kenn
>>>> 
>>>> [1] https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf <https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
>>>> [2] https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf <https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
>>>> [3] https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275 <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
>>>> [4] https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22) <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
>>>> [5] https://github.com/apache/beam/pull/12284 <https://github.com/apache/beam/pull/12284> and https://github.com/apache/beam/pull/12162 <https://github.com/apache/beam/pull/12162> and 
>>>> [6] https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878 <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
>>>> [7] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174 <https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <relax@google.com <ma...@google.com>> wrote:
>>>> I have some deeper concerns with the null checks. The fact that many libraries we use (including guava) don't always annotate their methods forces a lot of workarounds. As a very simple example, the return value from Preconditions.checkNotNull clearly can never be null, yet the nullability checks don't know this. This and other similar cases require constantly adding extra unnecessary null checks in the code just to make the checker happy. There have been other cases where I haven't been able to figure out a way to make the checker happy (often these seem to involve using lambdas), and after 10-15 minutes of investigation have given up and disabled the check.
>>>> 
>>>> Now you might say that it's worth the extra pain and ugliness of writing "useless" code to ensure that we have null-safe code. However I think this ignores a sociological aspect of software development. I have a higher tolerance than many for this sort of pain, and I'm willing to spend some time figuring out how to rewrite my code such that it makes the checker happy (even though often it forced me to write much more awkward code). However even I have often found myself giving up and just disabling the check. Many others will have less tolerance than me, and will simply disable the checks. At that point we'll have a codebase littered with @SuppressWarnings("nullness"), which doesn't really get us where we want to be. I've seen similar struggles in other codebases, and generally having a static checker with too many false positives often ends up being worse than having no checker.
>>>> 
>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <iemejia@gmail.com <ma...@gmail.com>> wrote:
>>>> +1
>>>> 
>>>> Even if I like the strictness for Null checking, I also think that
>>>> this is adding too much extra time for builds (that I noticed locally
>>>> when enabled) and also I agree with Jan that the annotations are
>>>> really an undesired side effect. For reference when you try to auto
>>>> complete some method signatures on IntelliJ on downstream projects
>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>> and others even if the user isn't using them which is not desirable.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kcweaver@google.com <ma...@google.com>> wrote:
>>>> >>
>>>> >> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>>>> >
>>>> >
>>>> > I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>>>> >
>>>> > I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.

Re: Null checking in Beam

Posted by Kyle Weaver <kc...@google.com>.

I moved the checker framework from opt-out to opt-in [1]. You can opt in by
passing "-PenableCheckerFramework=true" to the Gradle invocation. The
checker still runs on all Jenkins CI builds.

[1] https://github.com/apache/beam/pull/14301

On Wed, Mar 17, 2021 at 8:50 AM Kenneth Knowles <ke...@apache.org> wrote:

>
>
> On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> If there is no way to configure which annotations should be generated,
>> then I'd be +1 for removing the checker to separated CI and adding an
>> opt-in flag for the check when run locally.
>>
> Yes. If this answer is correct, then we are out of luck:
> https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods
>
> A good followup to moving it to a separate CI job would be to attach a
> full type checking run to the `:check` gradle task. This should be the best
> place to attach all of our extended checks like spotbugs, spotless,
> checkstyle, checkerframework. I rarely run `:check` myself (I think it is
> currently broken in some ways) but it may be good to start.
>
> BTW the slowdown we need to solve is local builds, not CI runs. When it
> was added the slowdown was almost completely prevented by caching. And now
> that we disable it for test classes (where for some reason it was extra
> slow) I wonder if it will speed up the main CI runs at all. So I would say
> the first thing to do is to just disable it by default but enable it in the
> Jenkins job builders.
>
> Kenn
>
> We need to solve the issue for dev@ as well, as the undesirable
>> annotations are already digging their way to the code base:
>>
>> git/apache/beam$ git grep UnknownKeyFor
>>
>> Another strange thing is that it seems, that we are pulling the
>> checkerframework as a runtime dependency (at least for some submodules).
>> When I run `mvn dependency:tree` on one of my projects that uses maven I see
>>
>> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
>> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
>> [INFO] |  +-
>> com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
>> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
>>
>> which is fine, but when I add beam-sdks-java-extensions-euphoria it
>> changes to
>>
>> [INFO] +-
>> org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
>> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
>>
>> I'm not a gradle guru, so I cannot tell how this happens, there seems to
>> be nothing special about euphoria in the gradle.
>>
>>  Jan
>> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>>
>> I've asked on checkerframework users mailing list if they or any users
>> have recommendations for the IntelliJ integration issue.
>>
>> It is worth noting that the default annotations inserted into the
>> bytecode do add value: the @NonNull type annotations are default for
>> checkerframework but not default for spotbugs. So having the default
>> inserted enables downstream users to have betters spotbugs heuristic
>> analysis. Further, the defaults can be adjusted by users so freezing them
>> at the values we use them at is valuable in general across all tools.
>>
>> I think it makes sense to sacrifice these minor value-adds to keep things
>> simple-looking for IntelliJ users.
>>
>> Kenn
>>
>> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> Seems it is an FAQ with no solution:
>>> https://checkerframework.org/manual/#faq-classfile-annotations
>>>
>>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <ke...@apache.org>
>>> wrote:
>>>
>>>> Adding -PskipCheckerframework when releasing will solve it for users,
>>>> but not for dev@.
>>>>
>>>> Making it off by default and a separate CI check that turns it on would
>>>> solve it overall but has the downsides mentioned before.
>>>>
>>>> It would be very nice to simply have a flag to not insert default
>>>> annotations.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> I believe it is not a problem of Idea. At least I didn't notice
>>>>> behavior like that with Guava, although Guava uses the framework as well.
>>>>> Maybe there is a way to tune which annotations should be generated? Or
>>>>> Guava handles that somehow differently?
>>>>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>>>>
>>>>> I've also been annoyed at IntelliJ autogenenerating all these
>>>>> annotations. I believe Kenn said that this was not the intention - maybe
>>>>> there's an IntelliJ setting that would stop this from happening?
>>>>>
>>>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> I don't know the details of the checkerframework, but there seems a
>>>>>> contradiction between what code is (currently) generated and what we
>>>>>> therefore release and what actually the checkerframework states [1]:
>>>>>>
>>>>>> @UnknownKeyFor:
>>>>>>
>>>>>> Used internally by the type system; should never be written by a
>>>>>> programmer.
>>>>>>
>>>>>> If this annotation is generated for overwritten methods, then I'd
>>>>>> say, that it means we place a great burden to our users - either not using
>>>>>> autogenerated methods, or erase all the generated annotations afterwards.
>>>>>> Either way, that is not "polite" from Beam.
>>>>>>
>>>>>> What we should judge is not only a formal purity of code, but what
>>>>>> stands on the other side is how usable APIs we provide. We should not head
>>>>>> for 100% pure code and sacrificing use comfort. Every check that leaks to
>>>>>> user code is at a price and we should not ignore that. No free lunch.
>>>>>>
>>>>>> From my point of view:
>>>>>>
>>>>>>  a) if a check does not modify the bytecode, it is fine and we can
>>>>>> use it - we are absolutely free to use any tooling we agree on, if our
>>>>>> users cannot be affected anyhow
>>>>>>
>>>>>>  b) if a tool needs to be leaked to user, it should be as small
>>>>>> leakage as possible
>>>>>>
>>>>>>  c) if a check significantly affects compile performance, it should
>>>>>> be possible to opt-out
>>>>>>
>>>>>> I think that our current setup violates all these three points.
>>>>>>
>>>>>> Moving the check to different CI is a possibility (a)), it would then
>>>>>> require opt-in flag to be able to run the check locally. It would also stop
>>>>>> the leakage (if we would release code without this check).
>>>>>>
>>>>>> If we want to keep some annotations for user's benefit (which might
>>>>>> be fine), it should be really limited to the bare minimum (e.g. only
>>>>>> @Nullable for method arguments and return values, possibly more, I don't
>>>>>> know if and how we can configure that). Definitely not @UnknownKeyFor, that
>>>>>> is simply internal to the checker. We should then have opt-out flag for
>>>>>> local development before committing changes.
>>>>>>
>>>>>>  Jan
>>>>>>
>>>>>> [1]
>>>>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I will be blunt about my opinions about the general issue:
>>>>>>>>
>>>>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>>>>> earlier)
>>>>>>>>
>>>>>>>
>>>>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>>>>>>> citations!
>>>>>>>
>>>>>>
>>>>>> Umm, I was confusing names. Fahndrich is actually a former coworker
>>>>>> at Google :)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>    * Checkerframework is a _pluggable_ system where that nullness
>>>>>>>> type system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>>>>>    * Our users should know this and judge us accordingly.
>>>>>>>>
>>>>>>>> - Checkerframework should be thought of and described as type
>>>>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>>>>> - If your code was unclear about whether something could be null,
>>>>>>>> it was probably unclear to a person reading it also, and very likely to
>>>>>>>> have actual bugs.
>>>>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>>>>> speaking, bad APIs. They are hard to use correctly, less readable, and very
>>>>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>>>>> accidental complexity you left behind.
>>>>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>>>>> changes to clearify nullness make your code better, more readable, easier
>>>>>>>> for users or editors.
>>>>>>>> - It is true that there is a learning curve to programming in this
>>>>>>>> way.
>>>>>>>>
>>>>>>>
>>>>>>> I agree with the above in a closed system. However as mentioned,
>>>>>>> some of the APIs we use suffer from this.
>>>>>>>
>>>>>>> In a previous life, I saw up close an effort to add such analysis to
>>>>>>> a large codebase. Two separate tools called Prefix and Prefast were used
>>>>>>> (the difference between the two is actually quite interesting, but not
>>>>>>> relevant here). However in order to make this analysis useful, there was a
>>>>>>> massive effort to properly annotate the entire codebase, including all
>>>>>>> libraries used. This isn't a perfect example - this was a C++ codebase
>>>>>>> which is much harder to analyze, and these tools identified far more than
>>>>>>> simply nullness errors (resource leaks, array indices, proper string null
>>>>>>> termination, exception behavior, etc.). However the closer we can get to
>>>>>>> properly annotating the transitive closure of our dependencies, the better
>>>>>>> this framework will work.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> - There are certainly common patterns in Java that do not work very
>>>>>>>> well, and suppression is sometimes the best option.
>>>>>>>>    * Example: JUnit's @Setup and @Test conventions do not work very
>>>>>>>> well and it is not worth the effort to make them work. This is actually
>>>>>>>> because if it were "normal code" it would be bad code. There are complex
>>>>>>>> inter-method dependencies enforced only by convention. This matters:
>>>>>>>> sometimes a JUnit test class is called from another class, used as "normal
>>>>>>>> code". This does go wrong in practice. Plain old JUnit test cases
>>>>>>>> frequently go wrong as well.
>>>>>>>>
>>>>>>>> And here is my opinion when it comes to Beam:
>>>>>>>>
>>>>>>>> - "Community over code" is not an excuse for negligent practices
>>>>>>>> that cause easily avoidable risk to our users. I will be very disappointed
>>>>>>>> if we choose that.
>>>>>>>> - I think having tooling that helps newcomers write better code by
>>>>>>>> default is better for the community too. Just like having automatic
>>>>>>>> formatting is better. Less to haggle about in review, etc.
>>>>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>>>>> - So far in almost every module I have fixed I discovered actual
>>>>>>>> new null errors. Many examples at [5].
>>>>>>>> - It is extremely easy to suppress the type checking. Almost all of
>>>>>>>> our classes have it suppressed already (I did this work, to allow existing
>>>>>>>> errors while protecting new code).
>>>>>>>> - Including the annotations in the shipped jars is an important
>>>>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>>>>    * Reuven highlighted this: when methods are not annotated, we
>>>>>>>> have to use/implement workarounds. Actually Guava does use checkerframework
>>>>>>>> annotations [6] and the problem is that it requires its *input* to already
>>>>>>>> be non-null so actually you cannot even use it to convert nullable values
>>>>>>>> to non-nullable values.
>>>>>>>>    * Beam has its own [7] that is annotated, actually for yet
>>>>>>>> another reason: when Guava's checkNotNull fails, it throws NPE when it
>>>>>>>> should throw IllegalArgumentException. Guava's checkNotNull should not be
>>>>>>>> used for input validation!
>>>>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in
>>>>>>>> user code. I wonder if there is something we can do about that. At the Java
>>>>>>>> level, if they are not on the classpath they should be ignored and not
>>>>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>>>>>>> area :-)  [8].
>>>>>>>>
>>>>>>>> I understand the pain of longer compile times slowing people down.
>>>>>>>> That is actually a problem to be solved which does not require lowering our
>>>>>>>> standards of quality. How about we try moving it to a separate CI job and
>>>>>>>> see how it goes?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>> In my experience stories like Reuven's are much more frustrating in
>>>>>>>> a separate CI job because you find out quite late that your code has flaws.
>>>>>>>> Like when spotless fails, but much more work to fix, and would have been
>>>>>>>> prevented long ago if it were integrated into the compile.
>>>>>>>>
>>>>>>>
>>>>>>> I agree with this. I prefer to be able to detect on my computer that
>>>>>>> there are failures, and not have to wait for submission. The complaint was
>>>>>>> that some people are experiencing trouble on their local machine however,
>>>>>>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>>>>>>> out to opt in).
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>>>>> [2]
>>>>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>>>>> [3]
>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>>>>> [4]
>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>>>>> https://github.com/apache/beam/pull/12162 and
>>>>>>>> [6]
>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>>>>> [7]
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have some deeper concerns with the null checks. The fact that
>>>>>>>>> many libraries we use (including guava) don't always annotate their methods
>>>>>>>>> forces a lot of workarounds. As a very simple example, the return value
>>>>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>>>>> nullability checks don't know this. This and other similar cases require
>>>>>>>>> constantly adding extra unnecessary null checks in the code just to make
>>>>>>>>> the checker happy. There have been other cases where I haven't been able to
>>>>>>>>> figure out a way to make the checker happy (often these seem to involve
>>>>>>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>>>>>>> disabled the check.
>>>>>>>>>
>>>>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>>>>>>> think this ignores a sociological aspect of software development. I have a
>>>>>>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>>>>> checker happy (even though often it forced me to write much more awkward
>>>>>>>>> code). However even I have often found myself giving up and just disabling
>>>>>>>>> the check. Many others will have less tolerance than me, and will simply
>>>>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>>>>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>>>>>>> static checker with too many false positives often ends up being worse than
>>>>>>>>> having no checker.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Even if I like the strictness for Null checking, I also think that
>>>>>>>>>> this is adding too much extra time for builds (that I noticed
>>>>>>>>>> locally
>>>>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>>>>> really an undesired side effect. For reference when you try to
>>>>>>>>>> auto
>>>>>>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>>>>>>> with C-A-v it generates some extra Checkers annotations like
>>>>>>>>>> @NonNull
>>>>>>>>>> and others even if the user isn't using them which is not
>>>>>>>>>> desirable.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> Big +1 for moving this to separate CI job. I really don't like
>>>>>>>>>> what annotations are currently added to the code we ship. Tools like Idea
>>>>>>>>>> add these annotations to code they generate when overriding classes and
>>>>>>>>>> that's very annoying. Users should not be exposed to internal tools like
>>>>>>>>>> nullability checking.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > I was only planning on moving this to a separate CI job. The
>>>>>>>>>> job would still be release blocking, so the same annotations would still be
>>>>>>>>>> required.
>>>>>>>>>> >
>>>>>>>>>> > I'm not sure which annotations you are concerned about. There
>>>>>>>>>> are two annotations involved with nullness checking, @SuppressWarnings and
>>>>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>>>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>>>>>>> also provides useful information about our APIs to users. The user should
>>>>>>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>>>>>>> for better or worse, these annotations are the standard way of expressing
>>>>>>>>>> that in Java.
>>>>>>>>>>
>>>>>>>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <je...@seznam.cz> wrote:

> If there is no way to configure which annotations should be generated,
> then I'd be +1 for removing the checker to separated CI and adding an
> opt-in flag for the check when run locally.
>
Yes. If this answer is correct, then we are out of luck:
https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods

A good followup to moving it to a separate CI job would be to attach a full
type checking run to the `:check` gradle task. This should be the best
place to attach all of our extended checks like spotbugs, spotless,
checkstyle, checkerframework. I rarely run `:check` myself (I think it is
currently broken in some ways) but it may be good to start.

BTW the slowdown we need to solve is local builds, not CI runs. When it was
added the slowdown was almost completely prevented by caching. And now that
we disable it for test classes (where for some reason it was extra slow) I
wonder if it will speed up the main CI runs at all. So I would say the
first thing to do is to just disable it by default but enable it in the
Jenkins job builders.

Kenn

We need to solve the issue for dev@ as well, as the undesirable annotations
> are already digging their way to the code base:
>
> git/apache/beam$ git grep UnknownKeyFor
>
> Another strange thing is that it seems, that we are pulling the
> checkerframework as a runtime dependency (at least for some submodules).
> When I run `mvn dependency:tree` on one of my projects that uses maven I see
>
> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
> [INFO] |  +-
> com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
>
> which is fine, but when I add beam-sdks-java-extensions-euphoria it
> changes to
>
> [INFO] +-
> org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
>
> I'm not a gradle guru, so I cannot tell how this happens, there seems to
> be nothing special about euphoria in the gradle.
>
>  Jan
> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>
> I've asked on checkerframework users mailing list if they or any users
> have recommendations for the IntelliJ integration issue.
>
> It is worth noting that the default annotations inserted into the bytecode
> do add value: the @NonNull type annotations are default for
> checkerframework but not default for spotbugs. So having the default
> inserted enables downstream users to have betters spotbugs heuristic
> analysis. Further, the defaults can be adjusted by users so freezing them
> at the values we use them at is valuable in general across all tools.
>
> I think it makes sense to sacrifice these minor value-adds to keep things
> simple-looking for IntelliJ users.
>
> Kenn
>
> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Seems it is an FAQ with no solution:
>> https://checkerframework.org/manual/#faq-classfile-annotations
>>
>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> Adding -PskipCheckerframework when releasing will solve it for users,
>>> but not for dev@.
>>>
>>> Making it off by default and a separate CI check that turns it on would
>>> solve it overall but has the downsides mentioned before.
>>>
>>> It would be very nice to simply have a flag to not insert default
>>> annotations.
>>>
>>> Kenn
>>>
>>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> I believe it is not a problem of Idea. At least I didn't notice
>>>> behavior like that with Guava, although Guava uses the framework as well.
>>>> Maybe there is a way to tune which annotations should be generated? Or
>>>> Guava handles that somehow differently?
>>>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>>>
>>>> I've also been annoyed at IntelliJ autogenenerating all these
>>>> annotations. I believe Kenn said that this was not the intention - maybe
>>>> there's an IntelliJ setting that would stop this from happening?
>>>>
>>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> I don't know the details of the checkerframework, but there seems a
>>>>> contradiction between what code is (currently) generated and what we
>>>>> therefore release and what actually the checkerframework states [1]:
>>>>>
>>>>> @UnknownKeyFor:
>>>>>
>>>>> Used internally by the type system; should never be written by a
>>>>> programmer.
>>>>>
>>>>> If this annotation is generated for overwritten methods, then I'd say,
>>>>> that it means we place a great burden to our users - either not using
>>>>> autogenerated methods, or erase all the generated annotations afterwards.
>>>>> Either way, that is not "polite" from Beam.
>>>>>
>>>>> What we should judge is not only a formal purity of code, but what
>>>>> stands on the other side is how usable APIs we provide. We should not head
>>>>> for 100% pure code and sacrificing use comfort. Every check that leaks to
>>>>> user code is at a price and we should not ignore that. No free lunch.
>>>>>
>>>>> From my point of view:
>>>>>
>>>>>  a) if a check does not modify the bytecode, it is fine and we can use
>>>>> it - we are absolutely free to use any tooling we agree on, if our users
>>>>> cannot be affected anyhow
>>>>>
>>>>>  b) if a tool needs to be leaked to user, it should be as small
>>>>> leakage as possible
>>>>>
>>>>>  c) if a check significantly affects compile performance, it should be
>>>>> possible to opt-out
>>>>>
>>>>> I think that our current setup violates all these three points.
>>>>>
>>>>> Moving the check to different CI is a possibility (a)), it would then
>>>>> require opt-in flag to be able to run the check locally. It would also stop
>>>>> the leakage (if we would release code without this check).
>>>>>
>>>>> If we want to keep some annotations for user's benefit (which might be
>>>>> fine), it should be really limited to the bare minimum (e.g. only @Nullable
>>>>> for method arguments and return values, possibly more, I don't know if and
>>>>> how we can configure that). Definitely not @UnknownKeyFor, that is simply
>>>>> internal to the checker. We should then have opt-out flag for local
>>>>> development before committing changes.
>>>>>
>>>>>  Jan
>>>>>
>>>>> [1]
>>>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I will be blunt about my opinions about the general issue:
>>>>>>>
>>>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>>>> earlier)
>>>>>>>
>>>>>>
>>>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>>>>>> citations!
>>>>>>
>>>>>
>>>>> Umm, I was confusing names. Fahndrich is actually a former coworker at
>>>>> Google :)
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>    * Checkerframework is a _pluggable_ system where that nullness
>>>>>>> type system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>>>>    * Our users should know this and judge us accordingly.
>>>>>>>
>>>>>>> - Checkerframework should be thought of and described as type
>>>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>>>> - If your code was unclear about whether something could be null, it
>>>>>>> was probably unclear to a person reading it also, and very likely to have
>>>>>>> actual bugs.
>>>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>>>> speaking, bad APIs. They are hard to use correctly, less readable, and very
>>>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>>>> accidental complexity you left behind.
>>>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>>>> changes to clearify nullness make your code better, more readable, easier
>>>>>>> for users or editors.
>>>>>>> - It is true that there is a learning curve to programming in this
>>>>>>> way.
>>>>>>>
>>>>>>
>>>>>> I agree with the above in a closed system. However as mentioned, some
>>>>>> of the APIs we use suffer from this.
>>>>>>
>>>>>> In a previous life, I saw up close an effort to add such analysis to
>>>>>> a large codebase. Two separate tools called Prefix and Prefast were used
>>>>>> (the difference between the two is actually quite interesting, but not
>>>>>> relevant here). However in order to make this analysis useful, there was a
>>>>>> massive effort to properly annotate the entire codebase, including all
>>>>>> libraries used. This isn't a perfect example - this was a C++ codebase
>>>>>> which is much harder to analyze, and these tools identified far more than
>>>>>> simply nullness errors (resource leaks, array indices, proper string null
>>>>>> termination, exception behavior, etc.). However the closer we can get to
>>>>>> properly annotating the transitive closure of our dependencies, the better
>>>>>> this framework will work.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> - There are certainly common patterns in Java that do not work very
>>>>>>> well, and suppression is sometimes the best option.
>>>>>>>    * Example: JUnit's @Setup and @Test conventions do not work very
>>>>>>> well and it is not worth the effort to make them work. This is actually
>>>>>>> because if it were "normal code" it would be bad code. There are complex
>>>>>>> inter-method dependencies enforced only by convention. This matters:
>>>>>>> sometimes a JUnit test class is called from another class, used as "normal
>>>>>>> code". This does go wrong in practice. Plain old JUnit test cases
>>>>>>> frequently go wrong as well.
>>>>>>>
>>>>>>> And here is my opinion when it comes to Beam:
>>>>>>>
>>>>>>> - "Community over code" is not an excuse for negligent practices
>>>>>>> that cause easily avoidable risk to our users. I will be very disappointed
>>>>>>> if we choose that.
>>>>>>> - I think having tooling that helps newcomers write better code by
>>>>>>> default is better for the community too. Just like having automatic
>>>>>>> formatting is better. Less to haggle about in review, etc.
>>>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>>>> - So far in almost every module I have fixed I discovered actual new
>>>>>>> null errors. Many examples at [5].
>>>>>>> - It is extremely easy to suppress the type checking. Almost all of
>>>>>>> our classes have it suppressed already (I did this work, to allow existing
>>>>>>> errors while protecting new code).
>>>>>>> - Including the annotations in the shipped jars is an important
>>>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>>>    * Reuven highlighted this: when methods are not annotated, we
>>>>>>> have to use/implement workarounds. Actually Guava does use checkerframework
>>>>>>> annotations [6] and the problem is that it requires its *input* to already
>>>>>>> be non-null so actually you cannot even use it to convert nullable values
>>>>>>> to non-nullable values.
>>>>>>>    * Beam has its own [7] that is annotated, actually for yet
>>>>>>> another reason: when Guava's checkNotNull fails, it throws NPE when it
>>>>>>> should throw IllegalArgumentException. Guava's checkNotNull should not be
>>>>>>> used for input validation!
>>>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in
>>>>>>> user code. I wonder if there is something we can do about that. At the Java
>>>>>>> level, if they are not on the classpath they should be ignored and not
>>>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>>>>>> area :-)  [8].
>>>>>>>
>>>>>>> I understand the pain of longer compile times slowing people down.
>>>>>>> That is actually a problem to be solved which does not require lowering our
>>>>>>> standards of quality. How about we try moving it to a separate CI job and
>>>>>>> see how it goes?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>> In my experience stories like Reuven's are much more frustrating in
>>>>>>> a separate CI job because you find out quite late that your code has flaws.
>>>>>>> Like when spotless fails, but much more work to fix, and would have been
>>>>>>> prevented long ago if it were integrated into the compile.
>>>>>>>
>>>>>>
>>>>>> I agree with this. I prefer to be able to detect on my computer that
>>>>>> there are failures, and not have to wait for submission. The complaint was
>>>>>> that some people are experiencing trouble on their local machine however,
>>>>>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>>>>>> out to opt in).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> [1]
>>>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>>>> [2]
>>>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>>>> [3]
>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>>>> [4]
>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>>>> https://github.com/apache/beam/pull/12162 and
>>>>>>> [6]
>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>>>> [7]
>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> I have some deeper concerns with the null checks. The fact that
>>>>>>>> many libraries we use (including guava) don't always annotate their methods
>>>>>>>> forces a lot of workarounds. As a very simple example, the return value
>>>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>>>> nullability checks don't know this. This and other similar cases require
>>>>>>>> constantly adding extra unnecessary null checks in the code just to make
>>>>>>>> the checker happy. There have been other cases where I haven't been able to
>>>>>>>> figure out a way to make the checker happy (often these seem to involve
>>>>>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>>>>>> disabled the check.
>>>>>>>>
>>>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>>>>>> think this ignores a sociological aspect of software development. I have a
>>>>>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>>>> checker happy (even though often it forced me to write much more awkward
>>>>>>>> code). However even I have often found myself giving up and just disabling
>>>>>>>> the check. Many others will have less tolerance than me, and will simply
>>>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>>>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>>>>>> static checker with too many false positives often ends up being worse than
>>>>>>>> having no checker.
>>>>>>>>
>>>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Even if I like the strictness for Null checking, I also think that
>>>>>>>>> this is adding too much extra time for builds (that I noticed
>>>>>>>>> locally
>>>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>>>> really an undesired side effect. For reference when you try to auto
>>>>>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>>>>>> with C-A-v it generates some extra Checkers annotations like
>>>>>>>>> @NonNull
>>>>>>>>> and others even if the user isn't using them which is not
>>>>>>>>> desirable.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Big +1 for moving this to separate CI job. I really don't like
>>>>>>>>> what annotations are currently added to the code we ship. Tools like Idea
>>>>>>>>> add these annotations to code they generate when overriding classes and
>>>>>>>>> that's very annoying. Users should not be exposed to internal tools like
>>>>>>>>> nullability checking.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>>>>>> would still be release blocking, so the same annotations would still be
>>>>>>>>> required.
>>>>>>>>> >
>>>>>>>>> > I'm not sure which annotations you are concerned about. There
>>>>>>>>> are two annotations involved with nullness checking, @SuppressWarnings and
>>>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>>>>>> also provides useful information about our APIs to users. The user should
>>>>>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>>>>>> for better or worse, these annotations are the standard way of expressing
>>>>>>>>> that in Java.
>>>>>>>>>
>>>>>>>>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

If there is no way to configure which annotations should be generated, 
then I'd be +1 for removing the checker to separated CI and adding an 
opt-in flag for the check when run locally.

We need to solve the issue for dev@ as well, as the undesirable 
annotations are already digging their way to the code base:

git/apache/beam$ git grep UnknownKeyFor

Another strange thing is that it seems, that we are pulling the 
checkerframework as a runtime dependency (at least for some submodules). 
When I run `mvn dependency:tree` on one of my projects that uses maven I see

[INFO] +- com.google.guava:guava:jar:30.1-jre:provided
[INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
[INFO] |  +- 
com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
[INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided

which is fine, but when I add beam-sdks-java-extensions-euphoria it 
changes to

[INFO] +- 
org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
[INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile

I'm not a gradle guru, so I cannot tell how this happens, there seems to 
be nothing special about euphoria in the gradle.

  Jan

On 3/16/21 7:12 PM, Kenneth Knowles wrote:
> I've asked on checkerframework users mailing list if they or any users 
> have recommendations for the IntelliJ integration issue.
>
> It is worth noting that the default annotations inserted into the 
> bytecode do add value: the @NonNull type annotations are default for 
> checkerframework but not default for spotbugs. So having the default 
> inserted enables downstream users to have betters spotbugs heuristic 
> analysis. Further, the defaults can be adjusted by users so freezing 
> them at the values we use them at is valuable in general across all tools.
>
> I think it makes sense to sacrifice these minor value-adds to keep 
> things simple-looking for IntelliJ users.
>
> Kenn
>
> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <kenn@apache.org 
> <ma...@apache.org>> wrote:
>
>     Seems it is an FAQ with no solution:
>     https://checkerframework.org/manual/#faq-classfile-annotations
>     <https://checkerframework.org/manual/#faq-classfile-annotations>
>
>     On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <kenn@apache.org
>     <ma...@apache.org>> wrote:
>
>         Adding -PskipCheckerframework when releasing will solve it for
>         users, but not for dev@.
>
>         Making it off by default and a separate CI check that turns it
>         on would solve it overall but has the downsides mentioned before.
>
>         It would be very nice to simply have a flag to not insert
>         default annotations.
>
>         Kenn
>
>         On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je.ik@seznam.cz
>         <ma...@seznam.cz>> wrote:
>
>             I believe it is not a problem of Idea. At least I didn't
>             notice behavior like that with Guava, although Guava uses
>             the framework as well. Maybe there is a way to tune which
>             annotations should be generated? Or Guava handles that
>             somehow differently?
>
>             On 3/16/21 5:22 PM, Reuven Lax wrote:
>>             I've also been annoyed at IntelliJ autogenenerating all
>>             these annotations. I believe Kenn said that this was not
>>             the intention - maybe there's an IntelliJ setting that
>>             would stop this from happening?
>>
>>             On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský
>>             <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>>
>>                 I don't know the details of the checkerframework, but
>>                 there seems a contradiction between what code is
>>                 (currently) generated and what we therefore release
>>                 and what actually the checkerframework states [1]:
>>
>>                 @UnknownKeyFor:
>>
>>                 Used internally by the type system; should never be
>>                 written by a programmer.
>>
>>                 If this annotation is generated for overwritten
>>                 methods, then I'd say, that it means we place a great
>>                 burden to our users - either not using autogenerated
>>                 methods, or erase all the generated annotations
>>                 afterwards. Either way, that is not "polite" from Beam.
>>
>>                 What we should judge is not only a formal purity of
>>                 code, but what stands on the other side is how usable
>>                 APIs we provide. We should not head for 100% pure
>>                 code and sacrificing use comfort. Every check that
>>                 leaks to user code is at a price and we should not
>>                 ignore that. No free lunch.
>>
>>                 From my point of view:
>>
>>                  a) if a check does not modify the bytecode, it is
>>                 fine and we can use it - we are absolutely free to
>>                 use any tooling we agree on, if our users cannot be
>>                 affected anyhow
>>
>>                  b) if a tool needs to be leaked to user, it should
>>                 be as small leakage as possible
>>
>>                  c) if a check significantly affects compile
>>                 performance, it should be possible to opt-out
>>
>>                 I think that our current setup violates all these
>>                 three points.
>>
>>                 Moving the check to different CI is a possibility
>>                 (a)), it would then require opt-in flag to be able to
>>                 run the check locally. It would also stop the leakage
>>                 (if we would release code without this check).
>>
>>                 If we want to keep some annotations for user's
>>                 benefit (which might be fine), it should be really
>>                 limited to the bare minimum (e.g. only @Nullable for
>>                 method arguments and return values, possibly more, I
>>                 don't know if and how we can configure that).
>>                 Definitely not @UnknownKeyFor, that is simply
>>                 internal to the checker. We should then have opt-out
>>                 flag for local development before committing changes.
>>
>>                  Jan
>>
>>                 [1]
>>                 https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>                 <https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html>
>>
>>                 On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>
>>>
>>>                 On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax
>>>                 <relax@google.com <ma...@google.com>> wrote:
>>>
>>>
>>>
>>>                     On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles
>>>                     <kenn@apache.org <ma...@apache.org>> wrote:
>>>
>>>                         I will be blunt about my opinions about the
>>>                         general issue:
>>>
>>>                         - NullPointerExceptions (and similar) are a
>>>                         solved problem.
>>>                            * They have been since 2003 at the latest
>>>                         [1] (this is when the types were hacked into
>>>                         Java - the foundation dates back to the 70s
>>>                         or earlier)
>>>
>>>
>>>                     Huh - Fahndrich tried to hire me once to work on
>>>                     a project called Singularity. Small world. Also
>>>                     note that Sanjay Ghemawat is listed in the
>>>                     citations!
>>>
>>>
>>>                 Umm, I was confusing names. Fahndrich is actually a
>>>                 former coworker at Google :)
>>>
>>>                            * Checkerframework is a _pluggable_
>>>                         system where that nullness type system is a
>>>                         "hello, world" level demo, since 2008 at the
>>>                         latest [2].
>>>                            * Our users should know this and judge us
>>>                         accordingly.
>>>
>>>                         - Checkerframework should be thought of and
>>>                         described as type checking, because it is.
>>>                         It is not heuristic nor approximate.
>>>                         - If your code was unclear about whether
>>>                         something could be null, it was probably
>>>                         unclear to a person reading it also, and
>>>                         very likely to have actual bugs.
>>>                         - APIs that accept a lot of nullable
>>>                         parameters are, generally speaking, bad
>>>                         APIs. They are hard to use correctly, less
>>>                         readable, and very likely to cause actual
>>>                         bugs. You are forcing your users to deal
>>>                         with accidental complexity you left behind.
>>>                           * Corollary to the above two points:
>>>                         Almost all the time, the changes to clearify
>>>                         nullness make your code better, more
>>>                         readable, easier for users or editors.
>>>                         - It is true that there is a learning curve
>>>                         to programming in this way.
>>>
>>>
>>>                     I agree with the above in a closed system.
>>>                     However as mentioned, some of the APIs we use
>>>                     suffer from this.
>>>
>>>                     In a previous life, I saw up close an effort to
>>>                     add such analysis to a large codebase. Two
>>>                     separate tools called Prefix and Prefast were
>>>                     used (the difference between the two is actually
>>>                     quite interesting, but not relevant here).
>>>                     However in order to make this analysis useful,
>>>                     there was a massive effort to properly annotate
>>>                     the entire codebase, including all libraries
>>>                     used. This isn't a perfect example - this was a
>>>                     C++ codebase which is much harder to analyze,
>>>                     and these tools identified far more than simply
>>>                     nullness errors (resource leaks, array indices,
>>>                     proper string null termination, exception
>>>                     behavior, etc.). However the closer we can get
>>>                     to properly annotating the transitive closure of
>>>                     our dependencies, the better this framework will
>>>                     work.
>>>
>>>                         - There are certainly common patterns in
>>>                         Java that do not work very well, and
>>>                         suppression is sometimes the best option.
>>>                            * Example: JUnit's @Setup and @Test
>>>                         conventions do not work very well and it is
>>>                         not worth the effort to make them work. This
>>>                         is actually because if it were "normal code"
>>>                         it would be bad code. There are complex
>>>                         inter-method dependencies enforced only by
>>>                         convention. This matters: sometimes a JUnit
>>>                         test class is called from another class,
>>>                         used as "normal code". This does go wrong in
>>>                         practice. Plain old JUnit test cases
>>>                         frequently go wrong as well.
>>>
>>>                         And here is my opinion when it comes to Beam:
>>>
>>>                         - "Community over code" is not an excuse for
>>>                         negligent practices that cause easily
>>>                         avoidable risk to our users. I will be very
>>>                         disappointed if we choose that.
>>>                         - I think having tooling that helps
>>>                         newcomers write better code by default is
>>>                         better for the community too. Just like
>>>                         having automatic formatting is better. Less
>>>                         to haggle about in review, etc.
>>>                         - A simple search reveals about 170 bugs
>>>                         that we know of [4].
>>>                         - So far in almost every module I have fixed
>>>                         I discovered actual new null errors. Many
>>>                         examples at [5].
>>>                         - It is extremely easy to suppress the type
>>>                         checking. Almost all of our classes have it
>>>                         suppressed already (I did this work, to
>>>                         allow existing errors while protecting new
>>>                         code).
>>>                         - Including the annotations in the shipped
>>>                         jars is an important feature. Without this,
>>>                         users cannot write null-safe code themselves.
>>>                            * Reuven highlighted this: when methods
>>>                         are not annotated, we have to use/implement
>>>                         workarounds. Actually Guava does use
>>>                         checkerframework annotations [6] and the
>>>                         problem is that it requires its *input* to
>>>                         already be non-null so actually you cannot
>>>                         even use it to convert nullable values to
>>>                         non-nullable values.
>>>                            * Beam has its own [7] that is annotated,
>>>                         actually for yet another reason: when
>>>                         Guava's checkNotNull fails, it throws NPE
>>>                         when it should throw
>>>                         IllegalArgumentException. Guava's
>>>                         checkNotNull should not be used for input
>>>                         validation!
>>>                         - It is unfortunate that IntelliJ inserts a
>>>                         bunch of annotations in user code. I wonder
>>>                         if there is something we can do about that.
>>>                         At the Java level, if they are not on the
>>>                         classpath they should be ignored and not
>>>                         affect users. Coincidentally, the JDK has
>>>                         had NullPointerExceptions in this area :-) [8].
>>>
>>>                         I understand the pain of longer compile
>>>                         times slowing people down. That is actually
>>>                         a problem to be solved which does not
>>>                         require lowering our standards of quality.
>>>                         How about we try moving it to a separate CI
>>>                         job and see how it goes?
>>>
>>>                         In my experience stories like Reuven's are
>>>                         much more frustrating in a separate CI job
>>>                         because you find out quite late that your
>>>                         code has flaws. Like when spotless fails,
>>>                         but much more work to fix, and would have
>>>                         been prevented long ago if it were
>>>                         integrated into the compile.
>>>
>>>
>>>                     I agree with this. I prefer to be able to detect
>>>                     on my computer that there are failures, and not
>>>                     have to wait for submission. The complaint was
>>>                     that some people are experiencing trouble on
>>>                     their local machine however, so it seems
>>>                     reasonable to add an opt-out flag (though I
>>>                     would prefer opt out to opt in).
>>>
>>>
>>>                         Kenn
>>>
>>>                         [1]
>>>                         https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>                         <https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
>>>                         [2]
>>>                         https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>                         <https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
>>>                         [3]
>>>                         https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>                         <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
>>>                         [4]
>>>                         https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>                         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
>>>                         [5]
>>>                         https://github.com/apache/beam/pull/12284
>>>                         <https://github.com/apache/beam/pull/12284>
>>>                         and
>>>                         https://github.com/apache/beam/pull/12162
>>>                         <https://github.com/apache/beam/pull/12162> and
>>>                         [6]
>>>                         https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>                         <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
>>>                         [7]
>>>                         https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>                         <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
>>>                         [8]
>>>                         https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>                         <https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>
>>>
>>>
>>>                         On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax
>>>                         <relax@google.com <ma...@google.com>>
>>>                         wrote:
>>>
>>>                             I have some deeper concerns with the
>>>                             null checks. The fact that many
>>>                             libraries we use (including guava) don't
>>>                             always annotate their methods forces a
>>>                             lot of workarounds. As a very simple
>>>                             example, the return value from
>>>                             Preconditions.checkNotNull clearly can
>>>                             never be null, yet the nullability
>>>                             checks don't know this. This and other
>>>                             similar cases require constantly adding
>>>                             extra unnecessary null checks in the
>>>                             code just to make the checker happy.
>>>                             There have been other cases where I
>>>                             haven't been able to figure out a way to
>>>                             make the checker happy (often these seem
>>>                             to involve using lambdas), and after
>>>                             10-15 minutes of investigation have
>>>                             given up and disabled the check.
>>>
>>>                             Now you might say that it's worth the
>>>                             extra pain and ugliness of writing
>>>                             "useless" code to ensure that we have
>>>                             null-safe code. However I think this
>>>                             ignores a sociological aspect of
>>>                             software development. I have a
>>>                             higher tolerance than many for this sort
>>>                             of pain, and I'm willing to spend some
>>>                             time figuring out how to rewrite my code
>>>                             such that it makes the checker happy
>>>                             (even though often it forced me to write
>>>                             much more awkward code). However even I
>>>                             have often found myself giving up and
>>>                             just disabling the check. Many others
>>>                             will have less tolerance than me, and
>>>                             will simply disable the checks. At that
>>>                             point we'll have a codebase littered
>>>                             with @SuppressWarnings("nullness"),
>>>                             which doesn't really get us where we
>>>                             want to be. I've seen similar struggles
>>>                             in other codebases, and generally having
>>>                             a static checker with too many false
>>>                             positives often ends up being worse than
>>>                             having no checker.
>>>
>>>                             On Mon, Mar 15, 2021 at 10:37 AM Ismaël
>>>                             Mejía <iemejia@gmail.com
>>>                             <ma...@gmail.com>> wrote:
>>>
>>>                                 +1
>>>
>>>                                 Even if I like the strictness for
>>>                                 Null checking, I also think that
>>>                                 this is adding too much extra time
>>>                                 for builds (that I noticed locally
>>>                                 when enabled) and also I agree with
>>>                                 Jan that the annotations are
>>>                                 really an undesired side effect. For
>>>                                 reference when you try to auto
>>>                                 complete some method signatures on
>>>                                 IntelliJ on downstream projects
>>>                                 with C-A-v it generates some extra
>>>                                 Checkers annotations like @NonNull
>>>                                 and others even if the user isn't
>>>                                 using them which is not desirable.
>>>
>>>
>>>
>>>                                 On Mon, Mar 15, 2021 at 6:04 PM Kyle
>>>                                 Weaver <kcweaver@google.com
>>>                                 <ma...@google.com>> wrote:
>>>                                 >>
>>>                                 >> Big +1 for moving this to
>>>                                 separate CI job. I really don't like
>>>                                 what annotations are currently added
>>>                                 to the code we ship. Tools like Idea
>>>                                 add these annotations to code they
>>>                                 generate when overriding classes and
>>>                                 that's very annoying. Users should
>>>                                 not be exposed to internal tools
>>>                                 like nullability checking.
>>>                                 >
>>>                                 >
>>>                                 > I was only planning on moving this
>>>                                 to a separate CI job. The job would
>>>                                 still be release blocking, so the
>>>                                 same annotations would still be
>>>                                 required.
>>>                                 >
>>>                                 > I'm not sure which annotations you
>>>                                 are concerned about. There are two
>>>                                 annotations involved with nullness
>>>                                 checking, @SuppressWarnings and
>>>                                 @Nullable. @SuppressWarnings has
>>>                                 retention policy SOURCE, so it
>>>                                 shouldn't be exposed to users at
>>>                                 all. @Nullable is not just for
>>>                                 internal tooling, it also provides
>>>                                 useful information about our APIs to
>>>                                 users. The user should not have to
>>>                                 guess whether a method argument etc.
>>>                                 can be null or not, and for better
>>>                                 or worse, these annotations are the
>>>                                 standard way of expressing that in Java.
>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

I've asked on checkerframework users mailing list if they or any users have
recommendations for the IntelliJ integration issue.

It is worth noting that the default annotations inserted into the bytecode
do add value: the @NonNull type annotations are default for
checkerframework but not default for spotbugs. So having the default
inserted enables downstream users to have betters spotbugs heuristic
analysis. Further, the defaults can be adjusted by users so freezing them
at the values we use them at is valuable in general across all tools.

I think it makes sense to sacrifice these minor value-adds to keep things
simple-looking for IntelliJ users.

Kenn

On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <ke...@apache.org> wrote:

> Seems it is an FAQ with no solution:
> https://checkerframework.org/manual/#faq-classfile-annotations
>
> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Adding -PskipCheckerframework when releasing will solve it for users, but
>> not for dev@.
>>
>> Making it off by default and a separate CI check that turns it on would
>> solve it overall but has the downsides mentioned before.
>>
>> It would be very nice to simply have a flag to not insert default
>> annotations.
>>
>> Kenn
>>
>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> I believe it is not a problem of Idea. At least I didn't notice behavior
>>> like that with Guava, although Guava uses the framework as well. Maybe
>>> there is a way to tune which annotations should be generated? Or Guava
>>> handles that somehow differently?
>>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>>
>>> I've also been annoyed at IntelliJ autogenenerating all these
>>> annotations. I believe Kenn said that this was not the intention - maybe
>>> there's an IntelliJ setting that would stop this from happening?
>>>
>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> I don't know the details of the checkerframework, but there seems a
>>>> contradiction between what code is (currently) generated and what we
>>>> therefore release and what actually the checkerframework states [1]:
>>>>
>>>> @UnknownKeyFor:
>>>>
>>>> Used internally by the type system; should never be written by a
>>>> programmer.
>>>>
>>>> If this annotation is generated for overwritten methods, then I'd say,
>>>> that it means we place a great burden to our users - either not using
>>>> autogenerated methods, or erase all the generated annotations afterwards.
>>>> Either way, that is not "polite" from Beam.
>>>>
>>>> What we should judge is not only a formal purity of code, but what
>>>> stands on the other side is how usable APIs we provide. We should not head
>>>> for 100% pure code and sacrificing use comfort. Every check that leaks to
>>>> user code is at a price and we should not ignore that. No free lunch.
>>>>
>>>> From my point of view:
>>>>
>>>>  a) if a check does not modify the bytecode, it is fine and we can use
>>>> it - we are absolutely free to use any tooling we agree on, if our users
>>>> cannot be affected anyhow
>>>>
>>>>  b) if a tool needs to be leaked to user, it should be as small leakage
>>>> as possible
>>>>
>>>>  c) if a check significantly affects compile performance, it should be
>>>> possible to opt-out
>>>>
>>>> I think that our current setup violates all these three points.
>>>>
>>>> Moving the check to different CI is a possibility (a)), it would then
>>>> require opt-in flag to be able to run the check locally. It would also stop
>>>> the leakage (if we would release code without this check).
>>>>
>>>> If we want to keep some annotations for user's benefit (which might be
>>>> fine), it should be really limited to the bare minimum (e.g. only @Nullable
>>>> for method arguments and return values, possibly more, I don't know if and
>>>> how we can configure that). Definitely not @UnknownKeyFor, that is simply
>>>> internal to the checker. We should then have opt-out flag for local
>>>> development before committing changes.
>>>>
>>>>  Jan
>>>>
>>>> [1]
>>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I will be blunt about my opinions about the general issue:
>>>>>>
>>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>>> earlier)
>>>>>>
>>>>>
>>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>>>>> citations!
>>>>>
>>>>
>>>> Umm, I was confusing names. Fahndrich is actually a former coworker at
>>>> Google :)
>>>>
>>>>
>>>>>
>>>>>
>>>>>>    * Checkerframework is a _pluggable_ system where that nullness
>>>>>> type system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>>>    * Our users should know this and judge us accordingly.
>>>>>>
>>>>>> - Checkerframework should be thought of and described as type
>>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>>> - If your code was unclear about whether something could be null, it
>>>>>> was probably unclear to a person reading it also, and very likely to have
>>>>>> actual bugs.
>>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>>> speaking, bad APIs. They are hard to use correctly, less readable, and very
>>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>>> accidental complexity you left behind.
>>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>>> changes to clearify nullness make your code better, more readable, easier
>>>>>> for users or editors.
>>>>>> - It is true that there is a learning curve to programming in this
>>>>>> way.
>>>>>>
>>>>>
>>>>> I agree with the above in a closed system. However as mentioned, some
>>>>> of the APIs we use suffer from this.
>>>>>
>>>>> In a previous life, I saw up close an effort to add such analysis to a
>>>>> large codebase. Two separate tools called Prefix and Prefast were used (the
>>>>> difference between the two is actually quite interesting, but not relevant
>>>>> here). However in order to make this analysis useful, there was a massive
>>>>> effort to properly annotate the entire codebase, including all libraries
>>>>> used. This isn't a perfect example - this was a C++ codebase which is much
>>>>> harder to analyze, and these tools identified far more than simply nullness
>>>>> errors (resource leaks, array indices, proper string null termination,
>>>>> exception behavior, etc.). However the closer we can get to properly
>>>>> annotating the transitive closure of our dependencies, the better this
>>>>> framework will work.
>>>>>
>>>>>
>>>>>
>>>>>> - There are certainly common patterns in Java that do not work very
>>>>>> well, and suppression is sometimes the best option.
>>>>>>    * Example: JUnit's @Setup and @Test conventions do not work very
>>>>>> well and it is not worth the effort to make them work. This is actually
>>>>>> because if it were "normal code" it would be bad code. There are complex
>>>>>> inter-method dependencies enforced only by convention. This matters:
>>>>>> sometimes a JUnit test class is called from another class, used as "normal
>>>>>> code". This does go wrong in practice. Plain old JUnit test cases
>>>>>> frequently go wrong as well.
>>>>>>
>>>>>> And here is my opinion when it comes to Beam:
>>>>>>
>>>>>> - "Community over code" is not an excuse for negligent practices that
>>>>>> cause easily avoidable risk to our users. I will be very disappointed if we
>>>>>> choose that.
>>>>>> - I think having tooling that helps newcomers write better code by
>>>>>> default is better for the community too. Just like having automatic
>>>>>> formatting is better. Less to haggle about in review, etc.
>>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>>> - So far in almost every module I have fixed I discovered actual new
>>>>>> null errors. Many examples at [5].
>>>>>> - It is extremely easy to suppress the type checking. Almost all of
>>>>>> our classes have it suppressed already (I did this work, to allow existing
>>>>>> errors while protecting new code).
>>>>>> - Including the annotations in the shipped jars is an important
>>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>>    * Reuven highlighted this: when methods are not annotated, we have
>>>>>> to use/implement workarounds. Actually Guava does use checkerframework
>>>>>> annotations [6] and the problem is that it requires its *input* to already
>>>>>> be non-null so actually you cannot even use it to convert nullable values
>>>>>> to non-nullable values.
>>>>>>    * Beam has its own [7] that is annotated, actually for yet another
>>>>>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>>>>>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>>>>>> validation!
>>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in
>>>>>> user code. I wonder if there is something we can do about that. At the Java
>>>>>> level, if they are not on the classpath they should be ignored and not
>>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>>>>> area :-)  [8].
>>>>>>
>>>>>> I understand the pain of longer compile times slowing people down.
>>>>>> That is actually a problem to be solved which does not require lowering our
>>>>>> standards of quality. How about we try moving it to a separate CI job and
>>>>>> see how it goes?
>>>>>>
>>>>>>
>>>>>
>>>>>> In my experience stories like Reuven's are much more frustrating in a
>>>>>> separate CI job because you find out quite late that your code has flaws.
>>>>>> Like when spotless fails, but much more work to fix, and would have been
>>>>>> prevented long ago if it were integrated into the compile.
>>>>>>
>>>>>
>>>>> I agree with this. I prefer to be able to detect on my computer that
>>>>> there are failures, and not have to wait for submission. The complaint was
>>>>> that some people are experiencing trouble on their local machine however,
>>>>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>>>>> out to opt in).
>>>>>
>>>>>
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> [1]
>>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>>> [2]
>>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>>> [3]
>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>>> [4]
>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>>> https://github.com/apache/beam/pull/12162 and
>>>>>> [6]
>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>>> [7]
>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> I have some deeper concerns with the null checks. The fact that many
>>>>>>> libraries we use (including guava) don't always annotate their methods
>>>>>>> forces a lot of workarounds. As a very simple example, the return value
>>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>>> nullability checks don't know this. This and other similar cases require
>>>>>>> constantly adding extra unnecessary null checks in the code just to make
>>>>>>> the checker happy. There have been other cases where I haven't been able to
>>>>>>> figure out a way to make the checker happy (often these seem to involve
>>>>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>>>>> disabled the check.
>>>>>>>
>>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>>>>> think this ignores a sociological aspect of software development. I have a
>>>>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>>> checker happy (even though often it forced me to write much more awkward
>>>>>>> code). However even I have often found myself giving up and just disabling
>>>>>>> the check. Many others will have less tolerance than me, and will simply
>>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>>>>> static checker with too many false positives often ends up being worse than
>>>>>>> having no checker.
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Even if I like the strictness for Null checking, I also think that
>>>>>>>> this is adding too much extra time for builds (that I noticed
>>>>>>>> locally
>>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>>> really an undesired side effect. For reference when you try to auto
>>>>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>>>>> with C-A-v it generates some extra Checkers annotations like
>>>>>>>> @NonNull
>>>>>>>> and others even if the user isn't using them which is not desirable.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Big +1 for moving this to separate CI job. I really don't like
>>>>>>>> what annotations are currently added to the code we ship. Tools like Idea
>>>>>>>> add these annotations to code they generate when overriding classes and
>>>>>>>> that's very annoying. Users should not be exposed to internal tools like
>>>>>>>> nullability checking.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>>>>> would still be release blocking, so the same annotations would still be
>>>>>>>> required.
>>>>>>>> >
>>>>>>>> > I'm not sure which annotations you are concerned about. There are
>>>>>>>> two annotations involved with nullness checking, @SuppressWarnings and
>>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>>>>> also provides useful information about our APIs to users. The user should
>>>>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>>>>> for better or worse, these annotations are the standard way of expressing
>>>>>>>> that in Java.
>>>>>>>>
>>>>>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

Seems it is an FAQ with no solution:
https://checkerframework.org/manual/#faq-classfile-annotations

On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <ke...@apache.org> wrote:

> Adding -PskipCheckerframework when releasing will solve it for users, but
> not for dev@.
>
> Making it off by default and a separate CI check that turns it on would
> solve it overall but has the downsides mentioned before.
>
> It would be very nice to simply have a flag to not insert default
> annotations.
>
> Kenn
>
> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> I believe it is not a problem of Idea. At least I didn't notice behavior
>> like that with Guava, although Guava uses the framework as well. Maybe
>> there is a way to tune which annotations should be generated? Or Guava
>> handles that somehow differently?
>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>
>> I've also been annoyed at IntelliJ autogenenerating all these
>> annotations. I believe Kenn said that this was not the intention - maybe
>> there's an IntelliJ setting that would stop this from happening?
>>
>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> I don't know the details of the checkerframework, but there seems a
>>> contradiction between what code is (currently) generated and what we
>>> therefore release and what actually the checkerframework states [1]:
>>>
>>> @UnknownKeyFor:
>>>
>>> Used internally by the type system; should never be written by a
>>> programmer.
>>>
>>> If this annotation is generated for overwritten methods, then I'd say,
>>> that it means we place a great burden to our users - either not using
>>> autogenerated methods, or erase all the generated annotations afterwards.
>>> Either way, that is not "polite" from Beam.
>>>
>>> What we should judge is not only a formal purity of code, but what
>>> stands on the other side is how usable APIs we provide. We should not head
>>> for 100% pure code and sacrificing use comfort. Every check that leaks to
>>> user code is at a price and we should not ignore that. No free lunch.
>>>
>>> From my point of view:
>>>
>>>  a) if a check does not modify the bytecode, it is fine and we can use
>>> it - we are absolutely free to use any tooling we agree on, if our users
>>> cannot be affected anyhow
>>>
>>>  b) if a tool needs to be leaked to user, it should be as small leakage
>>> as possible
>>>
>>>  c) if a check significantly affects compile performance, it should be
>>> possible to opt-out
>>>
>>> I think that our current setup violates all these three points.
>>>
>>> Moving the check to different CI is a possibility (a)), it would then
>>> require opt-in flag to be able to run the check locally. It would also stop
>>> the leakage (if we would release code without this check).
>>>
>>> If we want to keep some annotations for user's benefit (which might be
>>> fine), it should be really limited to the bare minimum (e.g. only @Nullable
>>> for method arguments and return values, possibly more, I don't know if and
>>> how we can configure that). Definitely not @UnknownKeyFor, that is simply
>>> internal to the checker. We should then have opt-out flag for local
>>> development before committing changes.
>>>
>>>  Jan
>>>
>>> [1]
>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>
>>>
>>>
>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> I will be blunt about my opinions about the general issue:
>>>>>
>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>> earlier)
>>>>>
>>>>
>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>>>> citations!
>>>>
>>>
>>> Umm, I was confusing names. Fahndrich is actually a former coworker at
>>> Google :)
>>>
>>>
>>>>
>>>>
>>>>>    * Checkerframework is a _pluggable_ system where that nullness type
>>>>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>>    * Our users should know this and judge us accordingly.
>>>>>
>>>>> - Checkerframework should be thought of and described as type
>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>> - If your code was unclear about whether something could be null, it
>>>>> was probably unclear to a person reading it also, and very likely to have
>>>>> actual bugs.
>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>> speaking, bad APIs. They are hard to use correctly, less readable, and very
>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>> accidental complexity you left behind.
>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>> changes to clearify nullness make your code better, more readable, easier
>>>>> for users or editors.
>>>>> - It is true that there is a learning curve to programming in this way.
>>>>>
>>>>
>>>> I agree with the above in a closed system. However as mentioned, some
>>>> of the APIs we use suffer from this.
>>>>
>>>> In a previous life, I saw up close an effort to add such analysis to a
>>>> large codebase. Two separate tools called Prefix and Prefast were used (the
>>>> difference between the two is actually quite interesting, but not relevant
>>>> here). However in order to make this analysis useful, there was a massive
>>>> effort to properly annotate the entire codebase, including all libraries
>>>> used. This isn't a perfect example - this was a C++ codebase which is much
>>>> harder to analyze, and these tools identified far more than simply nullness
>>>> errors (resource leaks, array indices, proper string null termination,
>>>> exception behavior, etc.). However the closer we can get to properly
>>>> annotating the transitive closure of our dependencies, the better this
>>>> framework will work.
>>>>
>>>>
>>>>
>>>>> - There are certainly common patterns in Java that do not work very
>>>>> well, and suppression is sometimes the best option.
>>>>>    * Example: JUnit's @Setup and @Test conventions do not work very
>>>>> well and it is not worth the effort to make them work. This is actually
>>>>> because if it were "normal code" it would be bad code. There are complex
>>>>> inter-method dependencies enforced only by convention. This matters:
>>>>> sometimes a JUnit test class is called from another class, used as "normal
>>>>> code". This does go wrong in practice. Plain old JUnit test cases
>>>>> frequently go wrong as well.
>>>>>
>>>>> And here is my opinion when it comes to Beam:
>>>>>
>>>>> - "Community over code" is not an excuse for negligent practices that
>>>>> cause easily avoidable risk to our users. I will be very disappointed if we
>>>>> choose that.
>>>>> - I think having tooling that helps newcomers write better code by
>>>>> default is better for the community too. Just like having automatic
>>>>> formatting is better. Less to haggle about in review, etc.
>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>> - So far in almost every module I have fixed I discovered actual new
>>>>> null errors. Many examples at [5].
>>>>> - It is extremely easy to suppress the type checking. Almost all of
>>>>> our classes have it suppressed already (I did this work, to allow existing
>>>>> errors while protecting new code).
>>>>> - Including the annotations in the shipped jars is an important
>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>    * Reuven highlighted this: when methods are not annotated, we have
>>>>> to use/implement workarounds. Actually Guava does use checkerframework
>>>>> annotations [6] and the problem is that it requires its *input* to already
>>>>> be non-null so actually you cannot even use it to convert nullable values
>>>>> to non-nullable values.
>>>>>    * Beam has its own [7] that is annotated, actually for yet another
>>>>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>>>>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>>>>> validation!
>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in
>>>>> user code. I wonder if there is something we can do about that. At the Java
>>>>> level, if they are not on the classpath they should be ignored and not
>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>>>> area :-)  [8].
>>>>>
>>>>> I understand the pain of longer compile times slowing people down.
>>>>> That is actually a problem to be solved which does not require lowering our
>>>>> standards of quality. How about we try moving it to a separate CI job and
>>>>> see how it goes?
>>>>>
>>>>>
>>>>
>>>>> In my experience stories like Reuven's are much more frustrating in a
>>>>> separate CI job because you find out quite late that your code has flaws.
>>>>> Like when spotless fails, but much more work to fix, and would have been
>>>>> prevented long ago if it were integrated into the compile.
>>>>>
>>>>
>>>> I agree with this. I prefer to be able to detect on my computer that
>>>> there are failures, and not have to wait for submission. The complaint was
>>>> that some people are experiencing trouble on their local machine however,
>>>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>>>> out to opt in).
>>>>
>>>>
>>>>>
>>>>> Kenn
>>>>>
>>>>> [1]
>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>> [2]
>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>> [3]
>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>> [4]
>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>> https://github.com/apache/beam/pull/12162 and
>>>>> [6]
>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>> [7]
>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>
>>>>>
>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> I have some deeper concerns with the null checks. The fact that many
>>>>>> libraries we use (including guava) don't always annotate their methods
>>>>>> forces a lot of workarounds. As a very simple example, the return value
>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>> nullability checks don't know this. This and other similar cases require
>>>>>> constantly adding extra unnecessary null checks in the code just to make
>>>>>> the checker happy. There have been other cases where I haven't been able to
>>>>>> figure out a way to make the checker happy (often these seem to involve
>>>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>>>> disabled the check.
>>>>>>
>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>>>> think this ignores a sociological aspect of software development. I have a
>>>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>> checker happy (even though often it forced me to write much more awkward
>>>>>> code). However even I have often found myself giving up and just disabling
>>>>>> the check. Many others will have less tolerance than me, and will simply
>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>>>> static checker with too many false positives often ends up being worse than
>>>>>> having no checker.
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Even if I like the strictness for Null checking, I also think that
>>>>>>> this is adding too much extra time for builds (that I noticed locally
>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>> really an undesired side effect. For reference when you try to auto
>>>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>>>>> and others even if the user isn't using them which is not desirable.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Big +1 for moving this to separate CI job. I really don't like
>>>>>>> what annotations are currently added to the code we ship. Tools like Idea
>>>>>>> add these annotations to code they generate when overriding classes and
>>>>>>> that's very annoying. Users should not be exposed to internal tools like
>>>>>>> nullability checking.
>>>>>>> >
>>>>>>> >
>>>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>>>> would still be release blocking, so the same annotations would still be
>>>>>>> required.
>>>>>>> >
>>>>>>> > I'm not sure which annotations you are concerned about. There are
>>>>>>> two annotations involved with nullness checking, @SuppressWarnings and
>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>>>> also provides useful information about our APIs to users. The user should
>>>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>>>> for better or worse, these annotations are the standard way of expressing
>>>>>>> that in Java.
>>>>>>>
>>>>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

Adding -PskipCheckerframework when releasing will solve it for users, but
not for dev@.

Making it off by default and a separate CI check that turns it on would
solve it overall but has the downsides mentioned before.

It would be very nice to simply have a flag to not insert default
annotations.

Kenn

On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:

> I believe it is not a problem of Idea. At least I didn't notice behavior
> like that with Guava, although Guava uses the framework as well. Maybe
> there is a way to tune which annotations should be generated? Or Guava
> handles that somehow differently?
> On 3/16/21 5:22 PM, Reuven Lax wrote:
>
> I've also been annoyed at IntelliJ autogenenerating all these annotations.
> I believe Kenn said that this was not the intention - maybe there's an
> IntelliJ setting that would stop this from happening?
>
> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> I don't know the details of the checkerframework, but there seems a
>> contradiction between what code is (currently) generated and what we
>> therefore release and what actually the checkerframework states [1]:
>>
>> @UnknownKeyFor:
>>
>> Used internally by the type system; should never be written by a
>> programmer.
>>
>> If this annotation is generated for overwritten methods, then I'd say,
>> that it means we place a great burden to our users - either not using
>> autogenerated methods, or erase all the generated annotations afterwards.
>> Either way, that is not "polite" from Beam.
>>
>> What we should judge is not only a formal purity of code, but what stands
>> on the other side is how usable APIs we provide. We should not head for
>> 100% pure code and sacrificing use comfort. Every check that leaks to user
>> code is at a price and we should not ignore that. No free lunch.
>>
>> From my point of view:
>>
>>  a) if a check does not modify the bytecode, it is fine and we can use it
>> - we are absolutely free to use any tooling we agree on, if our users
>> cannot be affected anyhow
>>
>>  b) if a tool needs to be leaked to user, it should be as small leakage
>> as possible
>>
>>  c) if a check significantly affects compile performance, it should be
>> possible to opt-out
>>
>> I think that our current setup violates all these three points.
>>
>> Moving the check to different CI is a possibility (a)), it would then
>> require opt-in flag to be able to run the check locally. It would also stop
>> the leakage (if we would release code without this check).
>>
>> If we want to keep some annotations for user's benefit (which might be
>> fine), it should be really limited to the bare minimum (e.g. only @Nullable
>> for method arguments and return values, possibly more, I don't know if and
>> how we can configure that). Definitely not @UnknownKeyFor, that is simply
>> internal to the checker. We should then have opt-out flag for local
>> development before committing changes.
>>
>>  Jan
>>
>> [1]
>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>
>>
>>
>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>>
>>>
>>>
>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>> I will be blunt about my opinions about the general issue:
>>>>
>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>> earlier)
>>>>
>>>
>>> Huh - Fahndrich tried to hire me once to work on a project called
>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>>> citations!
>>>
>>
>> Umm, I was confusing names. Fahndrich is actually a former coworker at
>> Google :)
>>
>>
>>>
>>>
>>>>    * Checkerframework is a _pluggable_ system where that nullness type
>>>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>    * Our users should know this and judge us accordingly.
>>>>
>>>> - Checkerframework should be thought of and described as type checking,
>>>> because it is. It is not heuristic nor approximate.
>>>> - If your code was unclear about whether something could be null, it
>>>> was probably unclear to a person reading it also, and very likely to have
>>>> actual bugs.
>>>> - APIs that accept a lot of nullable parameters are, generally
>>>> speaking, bad APIs. They are hard to use correctly, less readable, and very
>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>> accidental complexity you left behind.
>>>>   * Corollary to the above two points: Almost all the time, the changes
>>>> to clearify nullness make your code better, more readable, easier for users
>>>> or editors.
>>>> - It is true that there is a learning curve to programming in this way.
>>>>
>>>
>>> I agree with the above in a closed system. However as mentioned, some of
>>> the APIs we use suffer from this.
>>>
>>> In a previous life, I saw up close an effort to add such analysis to a
>>> large codebase. Two separate tools called Prefix and Prefast were used (the
>>> difference between the two is actually quite interesting, but not relevant
>>> here). However in order to make this analysis useful, there was a massive
>>> effort to properly annotate the entire codebase, including all libraries
>>> used. This isn't a perfect example - this was a C++ codebase which is much
>>> harder to analyze, and these tools identified far more than simply nullness
>>> errors (resource leaks, array indices, proper string null termination,
>>> exception behavior, etc.). However the closer we can get to properly
>>> annotating the transitive closure of our dependencies, the better this
>>> framework will work.
>>>
>>>
>>>
>>>> - There are certainly common patterns in Java that do not work very
>>>> well, and suppression is sometimes the best option.
>>>>    * Example: JUnit's @Setup and @Test conventions do not work very
>>>> well and it is not worth the effort to make them work. This is actually
>>>> because if it were "normal code" it would be bad code. There are complex
>>>> inter-method dependencies enforced only by convention. This matters:
>>>> sometimes a JUnit test class is called from another class, used as "normal
>>>> code". This does go wrong in practice. Plain old JUnit test cases
>>>> frequently go wrong as well.
>>>>
>>>> And here is my opinion when it comes to Beam:
>>>>
>>>> - "Community over code" is not an excuse for negligent practices that
>>>> cause easily avoidable risk to our users. I will be very disappointed if we
>>>> choose that.
>>>> - I think having tooling that helps newcomers write better code by
>>>> default is better for the community too. Just like having automatic
>>>> formatting is better. Less to haggle about in review, etc.
>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>> - So far in almost every module I have fixed I discovered actual new
>>>> null errors. Many examples at [5].
>>>> - It is extremely easy to suppress the type checking. Almost all of our
>>>> classes have it suppressed already (I did this work, to allow existing
>>>> errors while protecting new code).
>>>> - Including the annotations in the shipped jars is an important
>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>    * Reuven highlighted this: when methods are not annotated, we have
>>>> to use/implement workarounds. Actually Guava does use checkerframework
>>>> annotations [6] and the problem is that it requires its *input* to already
>>>> be non-null so actually you cannot even use it to convert nullable values
>>>> to non-nullable values.
>>>>    * Beam has its own [7] that is annotated, actually for yet another
>>>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>>>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>>>> validation!
>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in
>>>> user code. I wonder if there is something we can do about that. At the Java
>>>> level, if they are not on the classpath they should be ignored and not
>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>>> area :-)  [8].
>>>>
>>>> I understand the pain of longer compile times slowing people down. That
>>>> is actually a problem to be solved which does not require lowering our
>>>> standards of quality. How about we try moving it to a separate CI job and
>>>> see how it goes?
>>>>
>>>>
>>>
>>>> In my experience stories like Reuven's are much more frustrating in a
>>>> separate CI job because you find out quite late that your code has flaws.
>>>> Like when spotless fails, but much more work to fix, and would have been
>>>> prevented long ago if it were integrated into the compile.
>>>>
>>>
>>> I agree with this. I prefer to be able to detect on my computer that
>>> there are failures, and not have to wait for submission. The complaint was
>>> that some people are experiencing trouble on their local machine however,
>>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>>> out to opt in).
>>>
>>>
>>>>
>>>> Kenn
>>>>
>>>> [1]
>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>> [2]
>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>> [3]
>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>> [4]
>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>> https://github.com/apache/beam/pull/12162 and
>>>> [6]
>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>> [7]
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>
>>>>
>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> I have some deeper concerns with the null checks. The fact that many
>>>>> libraries we use (including guava) don't always annotate their methods
>>>>> forces a lot of workarounds. As a very simple example, the return value
>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>> nullability checks don't know this. This and other similar cases require
>>>>> constantly adding extra unnecessary null checks in the code just to make
>>>>> the checker happy. There have been other cases where I haven't been able to
>>>>> figure out a way to make the checker happy (often these seem to involve
>>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>>> disabled the check.
>>>>>
>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>>> think this ignores a sociological aspect of software development. I have a
>>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>> checker happy (even though often it forced me to write much more awkward
>>>>> code). However even I have often found myself giving up and just disabling
>>>>> the check. Many others will have less tolerance than me, and will simply
>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>>> static checker with too many false positives often ends up being worse than
>>>>> having no checker.
>>>>>
>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Even if I like the strictness for Null checking, I also think that
>>>>>> this is adding too much extra time for builds (that I noticed locally
>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>> really an undesired side effect. For reference when you try to auto
>>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>>>> and others even if the user isn't using them which is not desirable.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Big +1 for moving this to separate CI job. I really don't like
>>>>>> what annotations are currently added to the code we ship. Tools like Idea
>>>>>> add these annotations to code they generate when overriding classes and
>>>>>> that's very annoying. Users should not be exposed to internal tools like
>>>>>> nullability checking.
>>>>>> >
>>>>>> >
>>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>>> would still be release blocking, so the same annotations would still be
>>>>>> required.
>>>>>> >
>>>>>> > I'm not sure which annotations you are concerned about. There are
>>>>>> two annotations involved with nullness checking, @SuppressWarnings and
>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>>> also provides useful information about our APIs to users. The user should
>>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>>> for better or worse, these annotations are the standard way of expressing
>>>>>> that in Java.
>>>>>>
>>>>>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

I believe it is not a problem of Idea. At least I didn't notice behavior 
like that with Guava, although Guava uses the framework as well. Maybe 
there is a way to tune which annotations should be generated? Or Guava 
handles that somehow differently?

On 3/16/21 5:22 PM, Reuven Lax wrote:
> I've also been annoyed at IntelliJ autogenenerating all these 
> annotations. I believe Kenn said that this was not the intention - 
> maybe there's an IntelliJ setting that would stop this from happening?
>
> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     I don't know the details of the checkerframework, but there seems
>     a contradiction between what code is (currently) generated and
>     what we therefore release and what actually the checkerframework
>     states [1]:
>
>     @UnknownKeyFor:
>
>     Used internally by the type system; should never be written by a
>     programmer.
>
>     If this annotation is generated for overwritten methods, then I'd
>     say, that it means we place a great burden to our users - either
>     not using autogenerated methods, or erase all the generated
>     annotations afterwards. Either way, that is not "polite" from Beam.
>
>     What we should judge is not only a formal purity of code, but what
>     stands on the other side is how usable APIs we provide. We should
>     not head for 100% pure code and sacrificing use comfort. Every
>     check that leaks to user code is at a price and we should not
>     ignore that. No free lunch.
>
>     From my point of view:
>
>      a) if a check does not modify the bytecode, it is fine and we can
>     use it - we are absolutely free to use any tooling we agree on, if
>     our users cannot be affected anyhow
>
>      b) if a tool needs to be leaked to user, it should be as small
>     leakage as possible
>
>      c) if a check significantly affects compile performance, it
>     should be possible to opt-out
>
>     I think that our current setup violates all these three points.
>
>     Moving the check to different CI is a possibility (a)), it would
>     then require opt-in flag to be able to run the check locally. It
>     would also stop the leakage (if we would release code without this
>     check).
>
>     If we want to keep some annotations for user's benefit (which
>     might be fine), it should be really limited to the bare minimum
>     (e.g. only @Nullable for method arguments and return values,
>     possibly more, I don't know if and how we can configure that).
>     Definitely not @UnknownKeyFor, that is simply internal to the
>     checker. We should then have opt-out flag for local development
>     before committing changes.
>
>      Jan
>
>     [1]
>     https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>     <https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html>
>
>     On 3/16/21 8:33 AM, Reuven Lax wrote:
>>
>>
>>     On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <relax@google.com
>>     <ma...@google.com>> wrote:
>>
>>
>>
>>         On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles
>>         <kenn@apache.org <ma...@apache.org>> wrote:
>>
>>             I will be blunt about my opinions about the general issue:
>>
>>             - NullPointerExceptions (and similar) are a solved problem.
>>                * They have been since 2003 at the latest [1] (this is
>>             when the types were hacked into Java - the foundation
>>             dates back to the 70s or earlier)
>>
>>
>>         Huh - Fahndrich tried to hire me once to work on a project
>>         called Singularity. Small world. Also note that Sanjay
>>         Ghemawat is listed in the citations!
>>
>>
>>     Umm, I was confusing names. Fahndrich is actually a former
>>     coworker at Google :)
>>
>>                * Checkerframework is a _pluggable_ system where that
>>             nullness type system is a "hello, world" level demo,
>>             since 2008 at the latest [2].
>>                * Our users should know this and judge us accordingly.
>>
>>             - Checkerframework should be thought of and described as
>>             type checking, because it is. It is not heuristic nor
>>             approximate.
>>             - If your code was unclear about whether something could
>>             be null, it was probably unclear to a person reading it
>>             also, and very likely to have actual bugs.
>>             - APIs that accept a lot of nullable parameters are,
>>             generally speaking, bad APIs. They are hard to use
>>             correctly, less readable, and very likely to cause actual
>>             bugs. You are forcing your users to deal with accidental
>>             complexity you left behind.
>>               * Corollary to the above two points: Almost all the
>>             time, the changes to clearify nullness make your code
>>             better, more readable, easier for users or editors.
>>             - It is true that there is a learning curve to
>>             programming in this way.
>>
>>
>>         I agree with the above in a closed system. However as
>>         mentioned, some of the APIs we use suffer from this.
>>
>>         In a previous life, I saw up close an effort to add such
>>         analysis to a large codebase. Two separate tools called
>>         Prefix and Prefast were used (the difference between the two
>>         is actually quite interesting, but not relevant here).
>>         However in order to make this analysis useful, there was a
>>         massive effort to properly annotate the entire codebase,
>>         including all libraries used. This isn't a perfect example -
>>         this was a C++ codebase which is much harder to analyze, and
>>         these tools identified far more than simply nullness errors
>>         (resource leaks, array indices, proper string null
>>         termination, exception behavior, etc.). However the closer we
>>         can get to properly annotating the transitive closure of our
>>         dependencies, the better this framework will work.
>>
>>             - There are certainly common patterns in Java that do not
>>             work very well, and suppression is sometimes the best option.
>>                * Example: JUnit's @Setup and @Test conventions do not
>>             work very well and it is not worth the effort to make
>>             them work. This is actually because if it were "normal
>>             code" it would be bad code. There are complex
>>             inter-method dependencies enforced only by convention.
>>             This matters: sometimes a JUnit test class is called from
>>             another class, used as "normal code". This does go wrong
>>             in practice. Plain old JUnit test cases frequently go
>>             wrong as well.
>>
>>             And here is my opinion when it comes to Beam:
>>
>>             - "Community over code" is not an excuse for negligent
>>             practices that cause easily avoidable risk to our users.
>>             I will be very disappointed if we choose that.
>>             - I think having tooling that helps newcomers write
>>             better code by default is better for the community too.
>>             Just like having automatic formatting is better. Less to
>>             haggle about in review, etc.
>>             - A simple search reveals about 170 bugs that we know of [4].
>>             - So far in almost every module I have fixed I discovered
>>             actual new null errors. Many examples at [5].
>>             - It is extremely easy to suppress the type checking.
>>             Almost all of our classes have it suppressed already (I
>>             did this work, to allow existing errors while protecting
>>             new code).
>>             - Including the annotations in the shipped jars is an
>>             important feature. Without this, users cannot write
>>             null-safe code themselves.
>>                * Reuven highlighted this: when methods are not
>>             annotated, we have to use/implement workarounds. Actually
>>             Guava does use checkerframework annotations [6] and the
>>             problem is that it requires its *input* to already be
>>             non-null so actually you cannot even use it to convert
>>             nullable values to non-nullable values.
>>                * Beam has its own [7] that is annotated, actually for
>>             yet another reason: when Guava's checkNotNull fails, it
>>             throws NPE when it should throw IllegalArgumentException.
>>             Guava's checkNotNull should not be used for input validation!
>>             - It is unfortunate that IntelliJ inserts a bunch of
>>             annotations in user code. I wonder if there is something
>>             we can do about that. At the Java level, if they are not
>>             on the classpath they should be ignored and not affect
>>             users. Coincidentally, the JDK has had
>>             NullPointerExceptions in this area :-)  [8].
>>
>>             I understand the pain of longer compile times slowing
>>             people down. That is actually a problem to be solved
>>             which does not require lowering our standards of quality.
>>             How about we try moving it to a separate CI job and see
>>             how it goes?
>>
>>             In my experience stories like Reuven's are much more
>>             frustrating in a separate CI job because you find out
>>             quite late that your code has flaws. Like when spotless
>>             fails, but much more work to fix, and would have been
>>             prevented long ago if it were integrated into the compile.
>>
>>
>>         I agree with this. I prefer to be able to detect on my
>>         computer that there are failures, and not have to wait for
>>         submission. The complaint was that some people are
>>         experiencing trouble on their local machine however, so it
>>         seems reasonable to add an opt-out flag (though I would
>>         prefer opt out to opt in).
>>
>>
>>             Kenn
>>
>>             [1]
>>             https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>             <https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
>>             [2]
>>             https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>             <https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
>>             [3]
>>             https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>             <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
>>             [4]
>>             https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>             <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
>>             [5] https://github.com/apache/beam/pull/12284
>>             <https://github.com/apache/beam/pull/12284> and
>>             https://github.com/apache/beam/pull/12162
>>             <https://github.com/apache/beam/pull/12162> and
>>             [6]
>>             https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>             <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
>>             [7]
>>             https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>             <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
>>             [8]
>>             https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>             <https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>
>>
>>
>>             On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax
>>             <relax@google.com <ma...@google.com>> wrote:
>>
>>                 I have some deeper concerns with the null checks. The
>>                 fact that many libraries we use (including guava)
>>                 don't always annotate their methods forces a lot of
>>                 workarounds. As a very simple example, the return
>>                 value from Preconditions.checkNotNull clearly can
>>                 never be null, yet the nullability checks don't know
>>                 this. This and other similar cases require constantly
>>                 adding extra unnecessary null checks in the code just
>>                 to make the checker happy. There have been other
>>                 cases where I haven't been able to figure out a way
>>                 to make the checker happy (often these seem to
>>                 involve using lambdas), and after 10-15 minutes of
>>                 investigation have given up and disabled the check.
>>
>>                 Now you might say that it's worth the extra pain and
>>                 ugliness of writing "useless" code to ensure that we
>>                 have null-safe code. However I think this ignores a
>>                 sociological aspect of software development. I have a
>>                 higher tolerance than many for this sort of pain, and
>>                 I'm willing to spend some time figuring out how to
>>                 rewrite my code such that it makes the checker happy
>>                 (even though often it forced me to write much more
>>                 awkward code). However even I have often found myself
>>                 giving up and just disabling the check. Many others
>>                 will have less tolerance than me, and will simply
>>                 disable the checks. At that point we'll have a
>>                 codebase littered with @SuppressWarnings("nullness"),
>>                 which doesn't really get us where we want to be. I've
>>                 seen similar struggles in other codebases, and
>>                 generally having a static checker with too many false
>>                 positives often ends up being worse than having no
>>                 checker.
>>
>>                 On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía
>>                 <iemejia@gmail.com <ma...@gmail.com>> wrote:
>>
>>                     +1
>>
>>                     Even if I like the strictness for Null checking,
>>                     I also think that
>>                     this is adding too much extra time for builds
>>                     (that I noticed locally
>>                     when enabled) and also I agree with Jan that the
>>                     annotations are
>>                     really an undesired side effect. For reference
>>                     when you try to auto
>>                     complete some method signatures on IntelliJ on
>>                     downstream projects
>>                     with C-A-v it generates some extra Checkers
>>                     annotations like @NonNull
>>                     and others even if the user isn't using them
>>                     which is not desirable.
>>
>>
>>
>>                     On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver
>>                     <kcweaver@google.com
>>                     <ma...@google.com>> wrote:
>>                     >>
>>                     >> Big +1 for moving this to separate CI job. I
>>                     really don't like what annotations are currently
>>                     added to the code we ship. Tools like Idea add
>>                     these annotations to code they generate when
>>                     overriding classes and that's very annoying.
>>                     Users should not be exposed to internal tools
>>                     like nullability checking.
>>                     >
>>                     >
>>                     > I was only planning on moving this to a
>>                     separate CI job. The job would still be release
>>                     blocking, so the same annotations would still be
>>                     required.
>>                     >
>>                     > I'm not sure which annotations you are
>>                     concerned about. There are two annotations
>>                     involved with nullness checking,
>>                     @SuppressWarnings and @Nullable.
>>                     @SuppressWarnings has retention policy SOURCE, so
>>                     it shouldn't be exposed to users at all.
>>                     @Nullable is not just for internal tooling, it
>>                     also provides useful information about our APIs
>>                     to users. The user should not have to guess
>>                     whether a method argument etc. can be null or
>>                     not, and for better or worse, these annotations
>>                     are the standard way of expressing that in Java.
>>

Re: Null checking in Beam

Posted by Reuven Lax <re...@google.com>.

I've also been annoyed at IntelliJ autogenenerating all these annotations.
I believe Kenn said that this was not the intention - maybe there's an
IntelliJ setting that would stop this from happening?

On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:

> I don't know the details of the checkerframework, but there seems a
> contradiction between what code is (currently) generated and what we
> therefore release and what actually the checkerframework states [1]:
>
> @UnknownKeyFor:
>
> Used internally by the type system; should never be written by a
> programmer.
>
> If this annotation is generated for overwritten methods, then I'd say,
> that it means we place a great burden to our users - either not using
> autogenerated methods, or erase all the generated annotations afterwards.
> Either way, that is not "polite" from Beam.
>
> What we should judge is not only a formal purity of code, but what stands
> on the other side is how usable APIs we provide. We should not head for
> 100% pure code and sacrificing use comfort. Every check that leaks to user
> code is at a price and we should not ignore that. No free lunch.
>
> From my point of view:
>
>  a) if a check does not modify the bytecode, it is fine and we can use it
> - we are absolutely free to use any tooling we agree on, if our users
> cannot be affected anyhow
>
>  b) if a tool needs to be leaked to user, it should be as small leakage as
> possible
>
>  c) if a check significantly affects compile performance, it should be
> possible to opt-out
>
> I think that our current setup violates all these three points.
>
> Moving the check to different CI is a possibility (a)), it would then
> require opt-in flag to be able to run the check locally. It would also stop
> the leakage (if we would release code without this check).
>
> If we want to keep some annotations for user's benefit (which might be
> fine), it should be really limited to the bare minimum (e.g. only @Nullable
> for method arguments and return values, possibly more, I don't know if and
> how we can configure that). Definitely not @UnknownKeyFor, that is simply
> internal to the checker. We should then have opt-out flag for local
> development before committing changes.
>
>  Jan
>
> [1]
> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
> On 3/16/21 8:33 AM, Reuven Lax wrote:
>
>
>
> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>
>>
>>
>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I will be blunt about my opinions about the general issue:
>>>
>>> - NullPointerExceptions (and similar) are a solved problem.
>>>    * They have been since 2003 at the latest [1] (this is when the types
>>> were hacked into Java - the foundation dates back to the 70s or earlier)
>>>
>>
>> Huh - Fahndrich tried to hire me once to work on a project called
>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>> citations!
>>
>
> Umm, I was confusing names. Fahndrich is actually a former coworker at
> Google :)
>
>
>>
>>
>>>    * Checkerframework is a _pluggable_ system where that nullness type
>>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>>    * Our users should know this and judge us accordingly.
>>>
>>> - Checkerframework should be thought of and described as type checking,
>>> because it is. It is not heuristic nor approximate.
>>> - If your code was unclear about whether something could be null, it was
>>> probably unclear to a person reading it also, and very likely to have
>>> actual bugs.
>>> - APIs that accept a lot of nullable parameters are, generally speaking,
>>> bad APIs. They are hard to use correctly, less readable, and very likely to
>>> cause actual bugs. You are forcing your users to deal with accidental
>>> complexity you left behind.
>>>   * Corollary to the above two points: Almost all the time, the changes
>>> to clearify nullness make your code better, more readable, easier for users
>>> or editors.
>>> - It is true that there is a learning curve to programming in this way.
>>>
>>
>> I agree with the above in a closed system. However as mentioned, some of
>> the APIs we use suffer from this.
>>
>> In a previous life, I saw up close an effort to add such analysis to a
>> large codebase. Two separate tools called Prefix and Prefast were used (the
>> difference between the two is actually quite interesting, but not relevant
>> here). However in order to make this analysis useful, there was a massive
>> effort to properly annotate the entire codebase, including all libraries
>> used. This isn't a perfect example - this was a C++ codebase which is much
>> harder to analyze, and these tools identified far more than simply nullness
>> errors (resource leaks, array indices, proper string null termination,
>> exception behavior, etc.). However the closer we can get to properly
>> annotating the transitive closure of our dependencies, the better this
>> framework will work.
>>
>>
>>
>>> - There are certainly common patterns in Java that do not work very
>>> well, and suppression is sometimes the best option.
>>>    * Example: JUnit's @Setup and @Test conventions do not work very well
>>> and it is not worth the effort to make them work. This is actually because
>>> if it were "normal code" it would be bad code. There are complex
>>> inter-method dependencies enforced only by convention. This matters:
>>> sometimes a JUnit test class is called from another class, used as "normal
>>> code". This does go wrong in practice. Plain old JUnit test cases
>>> frequently go wrong as well.
>>>
>>> And here is my opinion when it comes to Beam:
>>>
>>> - "Community over code" is not an excuse for negligent practices that
>>> cause easily avoidable risk to our users. I will be very disappointed if we
>>> choose that.
>>> - I think having tooling that helps newcomers write better code by
>>> default is better for the community too. Just like having automatic
>>> formatting is better. Less to haggle about in review, etc.
>>> - A simple search reveals about 170 bugs that we know of [4].
>>> - So far in almost every module I have fixed I discovered actual new
>>> null errors. Many examples at [5].
>>> - It is extremely easy to suppress the type checking. Almost all of our
>>> classes have it suppressed already (I did this work, to allow existing
>>> errors while protecting new code).
>>> - Including the annotations in the shipped jars is an important feature.
>>> Without this, users cannot write null-safe code themselves.
>>>    * Reuven highlighted this: when methods are not annotated, we have to
>>> use/implement workarounds. Actually Guava does use checkerframework
>>> annotations [6] and the problem is that it requires its *input* to already
>>> be non-null so actually you cannot even use it to convert nullable values
>>> to non-nullable values.
>>>    * Beam has its own [7] that is annotated, actually for yet another
>>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>>> validation!
>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user
>>> code. I wonder if there is something we can do about that. At the Java
>>> level, if they are not on the classpath they should be ignored and not
>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>> area :-)  [8].
>>>
>>> I understand the pain of longer compile times slowing people down. That
>>> is actually a problem to be solved which does not require lowering our
>>> standards of quality. How about we try moving it to a separate CI job and
>>> see how it goes?
>>>
>>>
>>
>>> In my experience stories like Reuven's are much more frustrating in a
>>> separate CI job because you find out quite late that your code has flaws.
>>> Like when spotless fails, but much more work to fix, and would have been
>>> prevented long ago if it were integrated into the compile.
>>>
>>
>> I agree with this. I prefer to be able to detect on my computer that
>> there are failures, and not have to wait for submission. The complaint was
>> that some people are experiencing trouble on their local machine however,
>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>> out to opt in).
>>
>>
>>>
>>> Kenn
>>>
>>> [1]
>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>> [2]
>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>> [3]
>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>> [4]
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>> [5] https://github.com/apache/beam/pull/12284 and
>>> https://github.com/apache/beam/pull/12162 and
>>> [6]
>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>> [7]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>
>>>
>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I have some deeper concerns with the null checks. The fact that many
>>>> libraries we use (including guava) don't always annotate their methods
>>>> forces a lot of workarounds. As a very simple example, the return value
>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>> nullability checks don't know this. This and other similar cases require
>>>> constantly adding extra unnecessary null checks in the code just to make
>>>> the checker happy. There have been other cases where I haven't been able to
>>>> figure out a way to make the checker happy (often these seem to involve
>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>> disabled the check.
>>>>
>>>> Now you might say that it's worth the extra pain and ugliness of
>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>> think this ignores a sociological aspect of software development. I have a
>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>> some time figuring out how to rewrite my code such that it makes the
>>>> checker happy (even though often it forced me to write much more awkward
>>>> code). However even I have often found myself giving up and just disabling
>>>> the check. Many others will have less tolerance than me, and will simply
>>>> disable the checks. At that point we'll have a codebase littered with
>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>> static checker with too many false positives often ends up being worse than
>>>> having no checker.
>>>>
>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Even if I like the strictness for Null checking, I also think that
>>>>> this is adding too much extra time for builds (that I noticed locally
>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>> really an undesired side effect. For reference when you try to auto
>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>>> and others even if the user isn't using them which is not desirable.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Big +1 for moving this to separate CI job. I really don't like what
>>>>> annotations are currently added to the code we ship. Tools like Idea add
>>>>> these annotations to code they generate when overriding classes and that's
>>>>> very annoying. Users should not be exposed to internal tools like
>>>>> nullability checking.
>>>>> >
>>>>> >
>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>> would still be release blocking, so the same annotations would still be
>>>>> required.
>>>>> >
>>>>> > I'm not sure which annotations you are concerned about. There are
>>>>> two annotations involved with nullness checking, @SuppressWarnings and
>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>> also provides useful information about our APIs to users. The user should
>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>> for better or worse, these annotations are the standard way of expressing
>>>>> that in Java.
>>>>>
>>>>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

I don't know the details of the checkerframework, but there seems a 
contradiction between what code is (currently) generated and what we 
therefore release and what actually the checkerframework states [1]:

@UnknownKeyFor:

Used internally by the type system; should never be written by a 
programmer.

If this annotation is generated for overwritten methods, then I'd say, 
that it means we place a great burden to our users - either not using 
autogenerated methods, or erase all the generated annotations 
afterwards. Either way, that is not "polite" from Beam.

What we should judge is not only a formal purity of code, but what 
stands on the other side is how usable APIs we provide. We should not 
head for 100% pure code and sacrificing use comfort. Every check that 
leaks to user code is at a price and we should not ignore that. No free 
lunch.

 From my point of view:

  a) if a check does not modify the bytecode, it is fine and we can use 
it - we are absolutely free to use any tooling we agree on, if our users 
cannot be affected anyhow

  b) if a tool needs to be leaked to user, it should be as small leakage 
as possible

  c) if a check significantly affects compile performance, it should be 
possible to opt-out

I think that our current setup violates all these three points.

Moving the check to different CI is a possibility (a)), it would then 
require opt-in flag to be able to run the check locally. It would also 
stop the leakage (if we would release code without this check).

If we want to keep some annotations for user's benefit (which might be 
fine), it should be really limited to the bare minimum (e.g. only 
@Nullable for method arguments and return values, possibly more, I don't 
know if and how we can configure that). Definitely not @UnknownKeyFor, 
that is simply internal to the checker. We should then have opt-out flag 
for local development before committing changes.

  Jan

[1] 
https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html

On 3/16/21 8:33 AM, Reuven Lax wrote:
>
>
> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <relax@google.com 
> <ma...@google.com>> wrote:
>
>
>
>     On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <kenn@apache.org
>     <ma...@apache.org>> wrote:
>
>         I will be blunt about my opinions about the general issue:
>
>         - NullPointerExceptions (and similar) are a solved problem.
>            * They have been since 2003 at the latest [1] (this is when
>         the types were hacked into Java - the foundation dates back to
>         the 70s or earlier)
>
>
>     Huh - Fahndrich tried to hire me once to work on a project called
>     Singularity. Small world. Also note that Sanjay Ghemawat is listed
>     in the citations!
>
>
> Umm, I was confusing names. Fahndrich is actually a former coworker at 
> Google :)
>
>            * Checkerframework is a _pluggable_ system where that
>         nullness type system is a "hello, world" level demo, since
>         2008 at the latest [2].
>            * Our users should know this and judge us accordingly.
>
>         - Checkerframework should be thought of and described as type
>         checking, because it is. It is not heuristic nor approximate.
>         - If your code was unclear about whether something could be
>         null, it was probably unclear to a person reading it also, and
>         very likely to have actual bugs.
>         - APIs that accept a lot of nullable parameters are, generally
>         speaking, bad APIs. They are hard to use correctly, less
>         readable, and very likely to cause actual bugs. You are
>         forcing your users to deal with accidental complexity you left
>         behind.
>           * Corollary to the above two points: Almost all the time,
>         the changes to clearify nullness make your code better, more
>         readable, easier for users or editors.
>         - It is true that there is a learning curve to programming in
>         this way.
>
>
>     I agree with the above in a closed system. However as mentioned,
>     some of the APIs we use suffer from this.
>
>     In a previous life, I saw up close an effort to add such analysis
>     to a large codebase. Two separate tools called Prefix and Prefast
>     were used (the difference between the two is actually quite
>     interesting, but not relevant here). However in order to make this
>     analysis useful, there was a massive effort to properly annotate
>     the entire codebase, including all libraries used. This isn't a
>     perfect example - this was a C++ codebase which is much harder to
>     analyze, and these tools identified far more than simply nullness
>     errors (resource leaks, array indices, proper string null
>     termination, exception behavior, etc.). However the closer we can
>     get to properly annotating the transitive closure of our
>     dependencies, the better this framework will work.
>
>         - There are certainly common patterns in Java that do not work
>         very well, and suppression is sometimes the best option.
>            * Example: JUnit's @Setup and @Test conventions do not work
>         very well and it is not worth the effort to make them work.
>         This is actually because if it were "normal code" it would be
>         bad code. There are complex inter-method dependencies enforced
>         only by convention. This matters: sometimes a JUnit test class
>         is called from another class, used as "normal code". This does
>         go wrong in practice. Plain old JUnit test cases frequently go
>         wrong as well.
>
>         And here is my opinion when it comes to Beam:
>
>         - "Community over code" is not an excuse for negligent
>         practices that cause easily avoidable risk to our users. I
>         will be very disappointed if we choose that.
>         - I think having tooling that helps newcomers write better
>         code by default is better for the community too. Just like
>         having automatic formatting is better. Less to haggle about in
>         review, etc.
>         - A simple search reveals about 170 bugs that we know of [4].
>         - So far in almost every module I have fixed I discovered
>         actual new null errors. Many examples at [5].
>         - It is extremely easy to suppress the type checking. Almost
>         all of our classes have it suppressed already (I did this
>         work, to allow existing errors while protecting new code).
>         - Including the annotations in the shipped jars is an
>         important feature. Without this, users cannot write null-safe
>         code themselves.
>            * Reuven highlighted this: when methods are not annotated,
>         we have to use/implement workarounds. Actually Guava does use
>         checkerframework annotations [6] and the problem is that it
>         requires its *input* to already be non-null so actually you
>         cannot even use it to convert nullable values to non-nullable
>         values.
>            * Beam has its own [7] that is annotated, actually for yet
>         another reason: when Guava's checkNotNull fails, it throws NPE
>         when it should throw IllegalArgumentException. Guava's
>         checkNotNull should not be used for input validation!
>         - It is unfortunate that IntelliJ inserts a bunch of
>         annotations in user code. I wonder if there is something we
>         can do about that. At the Java level, if they are not on the
>         classpath they should be ignored and not affect users.
>         Coincidentally, the JDK has had NullPointerExceptions in this
>         area :-) [8].
>
>         I understand the pain of longer compile times slowing people
>         down. That is actually a problem to be solved which does not
>         require lowering our standards of quality. How about we try
>         moving it to a separate CI job and see how it goes?
>
>         In my experience stories like Reuven's are much more
>         frustrating in a separate CI job because you find out quite
>         late that your code has flaws. Like when spotless fails, but
>         much more work to fix, and would have been prevented long ago
>         if it were integrated into the compile.
>
>
>     I agree with this. I prefer to be able to detect on my computer
>     that there are failures, and not have to wait for submission. The
>     complaint was that some people are experiencing trouble on their
>     local machine however, so it seems reasonable to add an opt-out
>     flag (though I would prefer opt out to opt in).
>
>
>         Kenn
>
>         [1]
>         https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>         <https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
>         [2]
>         https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>         <https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
>         [3]
>         https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>         <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
>         [4]
>         https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
>         [5] https://github.com/apache/beam/pull/12284
>         <https://github.com/apache/beam/pull/12284> and
>         https://github.com/apache/beam/pull/12162
>         <https://github.com/apache/beam/pull/12162> and
>         [6]
>         https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>         <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
>         [7]
>         https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>         <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
>         [8]
>         https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>         <https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>
>
>
>         On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <relax@google.com
>         <ma...@google.com>> wrote:
>
>             I have some deeper concerns with the null checks. The fact
>             that many libraries we use (including guava) don't always
>             annotate their methods forces a lot of workarounds. As a
>             very simple example, the return value from
>             Preconditions.checkNotNull clearly can never be null, yet
>             the nullability checks don't know this. This and other
>             similar cases require constantly adding extra unnecessary
>             null checks in the code just to make the checker happy.
>             There have been other cases where I haven't been able to
>             figure out a way to make the checker happy (often these
>             seem to involve using lambdas), and after 10-15 minutes of
>             investigation have given up and disabled the check.
>
>             Now you might say that it's worth the extra pain and
>             ugliness of writing "useless" code to ensure that we have
>             null-safe code. However I think this ignores a
>             sociological aspect of software development. I have a
>             higher tolerance than many for this sort of pain, and I'm
>             willing to spend some time figuring out how to rewrite my
>             code such that it makes the checker happy (even though
>             often it forced me to write much more awkward code).
>             However even I have often found myself giving up and just
>             disabling the check. Many others will have less tolerance
>             than me, and will simply disable the checks. At that point
>             we'll have a codebase littered with
>             @SuppressWarnings("nullness"), which doesn't really get us
>             where we want to be. I've seen similar struggles in other
>             codebases, and generally having a static checker with too
>             many false positives often ends up being worse than having
>             no checker.
>
>             On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía
>             <iemejia@gmail.com <ma...@gmail.com>> wrote:
>
>                 +1
>
>                 Even if I like the strictness for Null checking, I
>                 also think that
>                 this is adding too much extra time for builds (that I
>                 noticed locally
>                 when enabled) and also I agree with Jan that the
>                 annotations are
>                 really an undesired side effect. For reference when
>                 you try to auto
>                 complete some method signatures on IntelliJ on
>                 downstream projects
>                 with C-A-v it generates some extra Checkers
>                 annotations like @NonNull
>                 and others even if the user isn't using them which is
>                 not desirable.
>
>
>
>                 On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver
>                 <kcweaver@google.com <ma...@google.com>> wrote:
>                 >>
>                 >> Big +1 for moving this to separate CI job. I really
>                 don't like what annotations are currently added to the
>                 code we ship. Tools like Idea add these annotations to
>                 code they generate when overriding classes and that's
>                 very annoying. Users should not be exposed to internal
>                 tools like nullability checking.
>                 >
>                 >
>                 > I was only planning on moving this to a separate CI
>                 job. The job would still be release blocking, so the
>                 same annotations would still be required.
>                 >
>                 > I'm not sure which annotations you are concerned
>                 about. There are two annotations involved with
>                 nullness checking, @SuppressWarnings and @Nullable.
>                 @SuppressWarnings has retention policy SOURCE, so it
>                 shouldn't be exposed to users at all. @Nullable is not
>                 just for internal tooling, it also provides useful
>                 information about our APIs to users. The user should
>                 not have to guess whether a method argument etc. can
>                 be null or not, and for better or worse, these
>                 annotations are the standard way of expressing that in
>                 Java.
>

Re: Null checking in Beam

Posted by Reuven Lax <re...@google.com>.

On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:

>
>
> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I will be blunt about my opinions about the general issue:
>>
>> - NullPointerExceptions (and similar) are a solved problem.
>>    * They have been since 2003 at the latest [1] (this is when the types
>> were hacked into Java - the foundation dates back to the 70s or earlier)
>>
>
> Huh - Fahndrich tried to hire me once to work on a project called
> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
> citations!
>

Umm, I was confusing names. Fahndrich is actually a former coworker at
Google :)


>
>
>>    * Checkerframework is a _pluggable_ system where that nullness type
>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>    * Our users should know this and judge us accordingly.
>>
>> - Checkerframework should be thought of and described as type checking,
>> because it is. It is not heuristic nor approximate.
>> - If your code was unclear about whether something could be null, it was
>> probably unclear to a person reading it also, and very likely to have
>> actual bugs.
>> - APIs that accept a lot of nullable parameters are, generally speaking,
>> bad APIs. They are hard to use correctly, less readable, and very likely to
>> cause actual bugs. You are forcing your users to deal with accidental
>> complexity you left behind.
>>   * Corollary to the above two points: Almost all the time, the changes
>> to clearify nullness make your code better, more readable, easier for users
>> or editors.
>> - It is true that there is a learning curve to programming in this way.
>>
>
> I agree with the above in a closed system. However as mentioned, some of
> the APIs we use suffer from this.
>
> In a previous life, I saw up close an effort to add such analysis to a
> large codebase. Two separate tools called Prefix and Prefast were used (the
> difference between the two is actually quite interesting, but not relevant
> here). However in order to make this analysis useful, there was a massive
> effort to properly annotate the entire codebase, including all libraries
> used. This isn't a perfect example - this was a C++ codebase which is much
> harder to analyze, and these tools identified far more than simply nullness
> errors (resource leaks, array indices, proper string null termination,
> exception behavior, etc.). However the closer we can get to properly
> annotating the transitive closure of our dependencies, the better this
> framework will work.
>
>
>
>> - There are certainly common patterns in Java that do not work very well,
>> and suppression is sometimes the best option.
>>    * Example: JUnit's @Setup and @Test conventions do not work very well
>> and it is not worth the effort to make them work. This is actually because
>> if it were "normal code" it would be bad code. There are complex
>> inter-method dependencies enforced only by convention. This matters:
>> sometimes a JUnit test class is called from another class, used as "normal
>> code". This does go wrong in practice. Plain old JUnit test cases
>> frequently go wrong as well.
>>
>> And here is my opinion when it comes to Beam:
>>
>> - "Community over code" is not an excuse for negligent practices that
>> cause easily avoidable risk to our users. I will be very disappointed if we
>> choose that.
>> - I think having tooling that helps newcomers write better code by
>> default is better for the community too. Just like having automatic
>> formatting is better. Less to haggle about in review, etc.
>> - A simple search reveals about 170 bugs that we know of [4].
>> - So far in almost every module I have fixed I discovered actual new null
>> errors. Many examples at [5].
>> - It is extremely easy to suppress the type checking. Almost all of our
>> classes have it suppressed already (I did this work, to allow existing
>> errors while protecting new code).
>> - Including the annotations in the shipped jars is an important feature.
>> Without this, users cannot write null-safe code themselves.
>>    * Reuven highlighted this: when methods are not annotated, we have to
>> use/implement workarounds. Actually Guava does use checkerframework
>> annotations [6] and the problem is that it requires its *input* to already
>> be non-null so actually you cannot even use it to convert nullable values
>> to non-nullable values.
>>    * Beam has its own [7] that is annotated, actually for yet another
>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>> validation!
>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user
>> code. I wonder if there is something we can do about that. At the Java
>> level, if they are not on the classpath they should be ignored and not
>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>> area :-)  [8].
>>
>> I understand the pain of longer compile times slowing people down. That
>> is actually a problem to be solved which does not require lowering our
>> standards of quality. How about we try moving it to a separate CI job and
>> see how it goes?
>>
>>
>
>> In my experience stories like Reuven's are much more frustrating in a
>> separate CI job because you find out quite late that your code has flaws.
>> Like when spotless fails, but much more work to fix, and would have been
>> prevented long ago if it were integrated into the compile.
>>
>
> I agree with this. I prefer to be able to detect on my computer that there
> are failures, and not have to wait for submission. The complaint was that
> some people are experiencing trouble on their local machine however, so it
> seems reasonable to add an opt-out flag (though I would prefer opt out to
> opt in).
>
>
>>
>> Kenn
>>
>> [1]
>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>> [2]
>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>> [3]
>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>> [4]
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>> [5] https://github.com/apache/beam/pull/12284 and
>> https://github.com/apache/beam/pull/12162 and
>> [6]
>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>> [7]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>
>>
>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>
>>> I have some deeper concerns with the null checks. The fact that many
>>> libraries we use (including guava) don't always annotate their methods
>>> forces a lot of workarounds. As a very simple example, the return value
>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>> nullability checks don't know this. This and other similar cases require
>>> constantly adding extra unnecessary null checks in the code just to make
>>> the checker happy. There have been other cases where I haven't been able to
>>> figure out a way to make the checker happy (often these seem to involve
>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>> disabled the check.
>>>
>>> Now you might say that it's worth the extra pain and ugliness of writing
>>> "useless" code to ensure that we have null-safe code. However I think this
>>> ignores a sociological aspect of software development. I have a
>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>> some time figuring out how to rewrite my code such that it makes the
>>> checker happy (even though often it forced me to write much more awkward
>>> code). However even I have often found myself giving up and just disabling
>>> the check. Many others will have less tolerance than me, and will simply
>>> disable the checks. At that point we'll have a codebase littered with
>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>> be. I've seen similar struggles in other codebases, and generally having a
>>> static checker with too many false positives often ends up being worse than
>>> having no checker.
>>>
>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> Even if I like the strictness for Null checking, I also think that
>>>> this is adding too much extra time for builds (that I noticed locally
>>>> when enabled) and also I agree with Jan that the annotations are
>>>> really an undesired side effect. For reference when you try to auto
>>>> complete some method signatures on IntelliJ on downstream projects
>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>> and others even if the user isn't using them which is not desirable.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com>
>>>> wrote:
>>>> >>
>>>> >> Big +1 for moving this to separate CI job. I really don't like what
>>>> annotations are currently added to the code we ship. Tools like Idea add
>>>> these annotations to code they generate when overriding classes and that's
>>>> very annoying. Users should not be exposed to internal tools like
>>>> nullability checking.
>>>> >
>>>> >
>>>> > I was only planning on moving this to a separate CI job. The job
>>>> would still be release blocking, so the same annotations would still be
>>>> required.
>>>> >
>>>> > I'm not sure which annotations you are concerned about. There are two
>>>> annotations involved with nullness checking, @SuppressWarnings and
>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>> also provides useful information about our APIs to users. The user should
>>>> not have to guess whether a method argument etc. can be null or not, and
>>>> for better or worse, these annotations are the standard way of expressing
>>>> that in Java.
>>>>
>>>

Re: Null checking in Beam

Posted by Reuven Lax <re...@google.com>.

On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <ke...@apache.org> wrote:

> I will be blunt about my opinions about the general issue:
>
> - NullPointerExceptions (and similar) are a solved problem.
>    * They have been since 2003 at the latest [1] (this is when the types
> were hacked into Java - the foundation dates back to the 70s or earlier)
>

Huh - Fahndrich tried to hire me once to work on a project called
Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
citations!


>    * Checkerframework is a _pluggable_ system where that nullness type
> system is a "hello, world" level demo, since 2008 at the latest [2].
>    * Our users should know this and judge us accordingly.
>
> - Checkerframework should be thought of and described as type checking,
> because it is. It is not heuristic nor approximate.
> - If your code was unclear about whether something could be null, it was
> probably unclear to a person reading it also, and very likely to have
> actual bugs.
> - APIs that accept a lot of nullable parameters are, generally speaking,
> bad APIs. They are hard to use correctly, less readable, and very likely to
> cause actual bugs. You are forcing your users to deal with accidental
> complexity you left behind.
>   * Corollary to the above two points: Almost all the time, the changes to
> clearify nullness make your code better, more readable, easier for users or
> editors.
> - It is true that there is a learning curve to programming in this way.
>

I agree with the above in a closed system. However as mentioned, some of
the APIs we use suffer from this.

In a previous life, I saw up close an effort to add such analysis to a
large codebase. Two separate tools called Prefix and Prefast were used (the
difference between the two is actually quite interesting, but not relevant
here). However in order to make this analysis useful, there was a massive
effort to properly annotate the entire codebase, including all libraries
used. This isn't a perfect example - this was a C++ codebase which is much
harder to analyze, and these tools identified far more than simply nullness
errors (resource leaks, array indices, proper string null termination,
exception behavior, etc.). However the closer we can get to properly
annotating the transitive closure of our dependencies, the better this
framework will work.



> - There are certainly common patterns in Java that do not work very well,
> and suppression is sometimes the best option.
>    * Example: JUnit's @Setup and @Test conventions do not work very well
> and it is not worth the effort to make them work. This is actually because
> if it were "normal code" it would be bad code. There are complex
> inter-method dependencies enforced only by convention. This matters:
> sometimes a JUnit test class is called from another class, used as "normal
> code". This does go wrong in practice. Plain old JUnit test cases
> frequently go wrong as well.
>
> And here is my opinion when it comes to Beam:
>
> - "Community over code" is not an excuse for negligent practices that
> cause easily avoidable risk to our users. I will be very disappointed if we
> choose that.
> - I think having tooling that helps newcomers write better code by default
> is better for the community too. Just like having automatic formatting is
> better. Less to haggle about in review, etc.
> - A simple search reveals about 170 bugs that we know of [4].
> - So far in almost every module I have fixed I discovered actual new null
> errors. Many examples at [5].
> - It is extremely easy to suppress the type checking. Almost all of our
> classes have it suppressed already (I did this work, to allow existing
> errors while protecting new code).
> - Including the annotations in the shipped jars is an important feature.
> Without this, users cannot write null-safe code themselves.
>    * Reuven highlighted this: when methods are not annotated, we have to
> use/implement workarounds. Actually Guava does use checkerframework
> annotations [6] and the problem is that it requires its *input* to already
> be non-null so actually you cannot even use it to convert nullable values
> to non-nullable values.
>    * Beam has its own [7] that is annotated, actually for yet another
> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
> IllegalArgumentException. Guava's checkNotNull should not be used for input
> validation!
> - It is unfortunate that IntelliJ inserts a bunch of annotations in user
> code. I wonder if there is something we can do about that. At the Java
> level, if they are not on the classpath they should be ignored and not
> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
> area :-)  [8].
>
> I understand the pain of longer compile times slowing people down. That is
> actually a problem to be solved which does not require lowering our
> standards of quality. How about we try moving it to a separate CI job and
> see how it goes?
>
>

> In my experience stories like Reuven's are much more frustrating in a
> separate CI job because you find out quite late that your code has flaws.
> Like when spotless fails, but much more work to fix, and would have been
> prevented long ago if it were integrated into the compile.
>

I agree with this. I prefer to be able to detect on my computer that there
are failures, and not have to wait for submission. The complaint was that
some people are experiencing trouble on their local machine however, so it
seems reasonable to add an opt-out flag (though I would prefer opt out to
opt in).


>
> Kenn
>
> [1]
> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
> [2]
> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
> [3]
> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
> [4]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
> [5] https://github.com/apache/beam/pull/12284 and
> https://github.com/apache/beam/pull/12162 and
> [6]
> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
> [7]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>
>
> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>
>> I have some deeper concerns with the null checks. The fact that many
>> libraries we use (including guava) don't always annotate their methods
>> forces a lot of workarounds. As a very simple example, the return value
>> from Preconditions.checkNotNull clearly can never be null, yet the
>> nullability checks don't know this. This and other similar cases require
>> constantly adding extra unnecessary null checks in the code just to make
>> the checker happy. There have been other cases where I haven't been able to
>> figure out a way to make the checker happy (often these seem to involve
>> using lambdas), and after 10-15 minutes of investigation have given up and
>> disabled the check.
>>
>> Now you might say that it's worth the extra pain and ugliness of writing
>> "useless" code to ensure that we have null-safe code. However I think this
>> ignores a sociological aspect of software development. I have a
>> higher tolerance than many for this sort of pain, and I'm willing to spend
>> some time figuring out how to rewrite my code such that it makes the
>> checker happy (even though often it forced me to write much more awkward
>> code). However even I have often found myself giving up and just disabling
>> the check. Many others will have less tolerance than me, and will simply
>> disable the checks. At that point we'll have a codebase littered with
>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>> be. I've seen similar struggles in other codebases, and generally having a
>> static checker with too many false positives often ends up being worse than
>> having no checker.
>>
>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>>> +1
>>>
>>> Even if I like the strictness for Null checking, I also think that
>>> this is adding too much extra time for builds (that I noticed locally
>>> when enabled) and also I agree with Jan that the annotations are
>>> really an undesired side effect. For reference when you try to auto
>>> complete some method signatures on IntelliJ on downstream projects
>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>> and others even if the user isn't using them which is not desirable.
>>>
>>>
>>>
>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> wrote:
>>> >>
>>> >> Big +1 for moving this to separate CI job. I really don't like what
>>> annotations are currently added to the code we ship. Tools like Idea add
>>> these annotations to code they generate when overriding classes and that's
>>> very annoying. Users should not be exposed to internal tools like
>>> nullability checking.
>>> >
>>> >
>>> > I was only planning on moving this to a separate CI job. The job would
>>> still be release blocking, so the same annotations would still be required.
>>> >
>>> > I'm not sure which annotations you are concerned about. There are two
>>> annotations involved with nullness checking, @SuppressWarnings and
>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>> also provides useful information about our APIs to users. The user should
>>> not have to guess whether a method argument etc. can be null or not, and
>>> for better or worse, these annotations are the standard way of expressing
>>> that in Java.
>>>
>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

I will be blunt about my opinions about the general issue:

- NullPointerExceptions (and similar) are a solved problem.
   * They have been since 2003 at the latest [1] (this is when the types
were hacked into Java - the foundation dates back to the 70s or earlier)
   * Checkerframework is a _pluggable_ system where that nullness type
system is a "hello, world" level demo, since 2008 at the latest [2].
   * Our users should know this and judge us accordingly.

- Checkerframework should be thought of and described as type checking,
because it is. It is not heuristic nor approximate.
- If your code was unclear about whether something could be null, it was
probably unclear to a person reading it also, and very likely to have
actual bugs.
- APIs that accept a lot of nullable parameters are, generally speaking,
bad APIs. They are hard to use correctly, less readable, and very likely to
cause actual bugs. You are forcing your users to deal with accidental
complexity you left behind.
  * Corollary to the above two points: Almost all the time, the changes to
clearify nullness make your code better, more readable, easier for users or
editors.
- It is true that there is a learning curve to programming in this way.
- There are certainly common patterns in Java that do not work very well,
and suppression is sometimes the best option.
   * Example: JUnit's @Setup and @Test conventions do not work very well
and it is not worth the effort to make them work. This is actually because
if it were "normal code" it would be bad code. There are complex
inter-method dependencies enforced only by convention. This matters:
sometimes a JUnit test class is called from another class, used as "normal
code". This does go wrong in practice. Plain old JUnit test cases
frequently go wrong as well.

And here is my opinion when it comes to Beam:

- "Community over code" is not an excuse for negligent practices that cause
easily avoidable risk to our users. I will be very disappointed if we
choose that.
- I think having tooling that helps newcomers write better code by default
is better for the community too. Just like having automatic formatting is
better. Less to haggle about in review, etc.
- A simple search reveals about 170 bugs that we know of [4].
- So far in almost every module I have fixed I discovered actual new null
errors. Many examples at [5].
- It is extremely easy to suppress the type checking. Almost all of our
classes have it suppressed already (I did this work, to allow existing
errors while protecting new code).
- Including the annotations in the shipped jars is an important feature.
Without this, users cannot write null-safe code themselves.
   * Reuven highlighted this: when methods are not annotated, we have to
use/implement workarounds. Actually Guava does use checkerframework
annotations [6] and the problem is that it requires its *input* to already
be non-null so actually you cannot even use it to convert nullable values
to non-nullable values.
   * Beam has its own [7] that is annotated, actually for yet another
reason: when Guava's checkNotNull fails, it throws NPE when it should throw
IllegalArgumentException. Guava's checkNotNull should not be used for input
validation!
- It is unfortunate that IntelliJ inserts a bunch of annotations in user
code. I wonder if there is something we can do about that. At the Java
level, if they are not on the classpath they should be ignored and not
affect users. Coincidentally, the JDK has had NullPointerExceptions in this
area :-)  [8].

I understand the pain of longer compile times slowing people down. That is
actually a problem to be solved which does not require lowering our
standards of quality. How about we try moving it to a separate CI job and
see how it goes?

In my experience stories like Reuven's are much more frustrating in a
separate CI job because you find out quite late that your code has flaws.
Like when spotless fails, but much more work to fix, and would have been
prevented long ago if it were integrated into the compile.

Kenn

[1]
https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
[2]
https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
[3]
https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
[4]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
[5] https://github.com/apache/beam/pull/12284 and
https://github.com/apache/beam/pull/12162 and
[6]
https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
[7]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
[8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174


On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:

> I have some deeper concerns with the null checks. The fact that many
> libraries we use (including guava) don't always annotate their methods
> forces a lot of workarounds. As a very simple example, the return value
> from Preconditions.checkNotNull clearly can never be null, yet the
> nullability checks don't know this. This and other similar cases require
> constantly adding extra unnecessary null checks in the code just to make
> the checker happy. There have been other cases where I haven't been able to
> figure out a way to make the checker happy (often these seem to involve
> using lambdas), and after 10-15 minutes of investigation have given up and
> disabled the check.
>
> Now you might say that it's worth the extra pain and ugliness of writing
> "useless" code to ensure that we have null-safe code. However I think this
> ignores a sociological aspect of software development. I have a
> higher tolerance than many for this sort of pain, and I'm willing to spend
> some time figuring out how to rewrite my code such that it makes the
> checker happy (even though often it forced me to write much more awkward
> code). However even I have often found myself giving up and just disabling
> the check. Many others will have less tolerance than me, and will simply
> disable the checks. At that point we'll have a codebase littered with
> @SuppressWarnings("nullness"), which doesn't really get us where we want to
> be. I've seen similar struggles in other codebases, and generally having a
> static checker with too many false positives often ends up being worse than
> having no checker.
>
> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> +1
>>
>> Even if I like the strictness for Null checking, I also think that
>> this is adding too much extra time for builds (that I noticed locally
>> when enabled) and also I agree with Jan that the annotations are
>> really an undesired side effect. For reference when you try to auto
>> complete some method signatures on IntelliJ on downstream projects
>> with C-A-v it generates some extra Checkers annotations like @NonNull
>> and others even if the user isn't using them which is not desirable.
>>
>>
>>
>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> wrote:
>> >>
>> >> Big +1 for moving this to separate CI job. I really don't like what
>> annotations are currently added to the code we ship. Tools like Idea add
>> these annotations to code they generate when overriding classes and that's
>> very annoying. Users should not be exposed to internal tools like
>> nullability checking.
>> >
>> >
>> > I was only planning on moving this to a separate CI job. The job would
>> still be release blocking, so the same annotations would still be required.
>> >
>> > I'm not sure which annotations you are concerned about. There are two
>> annotations involved with nullness checking, @SuppressWarnings and
>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>> be exposed to users at all. @Nullable is not just for internal tooling, it
>> also provides useful information about our APIs to users. The user should
>> not have to guess whether a method argument etc. can be null or not, and
>> for better or worse, these annotations are the standard way of expressing
>> that in Java.
>>
>

Re: Null checking in Beam

Posted by Reuven Lax <re...@google.com>.

I have some deeper concerns with the null checks. The fact that many
libraries we use (including guava) don't always annotate their methods
forces a lot of workarounds. As a very simple example, the return value
from Preconditions.checkNotNull clearly can never be null, yet the
nullability checks don't know this. This and other similar cases require
constantly adding extra unnecessary null checks in the code just to make
the checker happy. There have been other cases where I haven't been able to
figure out a way to make the checker happy (often these seem to involve
using lambdas), and after 10-15 minutes of investigation have given up and
disabled the check.

Now you might say that it's worth the extra pain and ugliness of writing
"useless" code to ensure that we have null-safe code. However I think this
ignores a sociological aspect of software development. I have a
higher tolerance than many for this sort of pain, and I'm willing to spend
some time figuring out how to rewrite my code such that it makes the
checker happy (even though often it forced me to write much more awkward
code). However even I have often found myself giving up and just disabling
the check. Many others will have less tolerance than me, and will simply
disable the checks. At that point we'll have a codebase littered with
@SuppressWarnings("nullness"), which doesn't really get us where we want to
be. I've seen similar struggles in other codebases, and generally having a
static checker with too many false positives often ends up being worse than
having no checker.

On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ie...@gmail.com> wrote:

> +1
>
> Even if I like the strictness for Null checking, I also think that
> this is adding too much extra time for builds (that I noticed locally
> when enabled) and also I agree with Jan that the annotations are
> really an undesired side effect. For reference when you try to auto
> complete some method signatures on IntelliJ on downstream projects
> with C-A-v it generates some extra Checkers annotations like @NonNull
> and others even if the user isn't using them which is not desirable.
>
>
>
> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> wrote:
> >>
> >> Big +1 for moving this to separate CI job. I really don't like what
> annotations are currently added to the code we ship. Tools like Idea add
> these annotations to code they generate when overriding classes and that's
> very annoying. Users should not be exposed to internal tools like
> nullability checking.
> >
> >
> > I was only planning on moving this to a separate CI job. The job would
> still be release blocking, so the same annotations would still be required.
> >
> > I'm not sure which annotations you are concerned about. There are two
> annotations involved with nullness checking, @SuppressWarnings and
> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
> be exposed to users at all. @Nullable is not just for internal tooling, it
> also provides useful information about our APIs to users. The user should
> not have to guess whether a method argument etc. can be null or not, and
> for better or worse, these annotations are the standard way of expressing
> that in Java.
>

Re: Null checking in Beam

Posted by Ismaël Mejía <ie...@gmail.com>.

+1

Even if I like the strictness for Null checking, I also think that
this is adding too much extra time for builds (that I noticed locally
when enabled) and also I agree with Jan that the annotations are
really an undesired side effect. For reference when you try to auto
complete some method signatures on IntelliJ on downstream projects
with C-A-v it generates some extra Checkers annotations like @NonNull
and others even if the user isn't using them which is not desirable.

On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kc...@google.com> wrote:
>>
>> Big +1 for moving this to separate CI job. I really don't like what annotations are currently added to the code we ship. Tools like Idea add these annotations to code they generate when overriding classes and that's very annoying. Users should not be exposed to internal tools like nullability checking.
>
>
> I was only planning on moving this to a separate CI job. The job would still be release blocking, so the same annotations would still be required.
>
> I'm not sure which annotations you are concerned about. There are two annotations involved with nullness checking, @SuppressWarnings and @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't be exposed to users at all. @Nullable is not just for internal tooling, it also provides useful information about our APIs to users. The user should not have to guess whether a method argument etc. can be null or not, and for better or worse, these annotations are the standard way of expressing that in Java.

Re: Null checking in Beam

Posted by Kyle Weaver <kc...@google.com>.

>
> Big +1 for moving this to separate CI job. I really don't like what
> annotations are currently added to the code we ship. Tools like Idea add
> these annotations to code they generate when overriding classes and that's
> very annoying. Users should not be exposed to internal tools like
> nullability checking.


I was only planning on moving this to a separate CI job. The job would
still be release blocking, so the same annotations would still be required.

I'm not sure which annotations you are concerned about. There are two
annotations involved with nullness checking, @SuppressWarnings
and @Nullable. @SuppressWarnings has retention policy SOURCE, so it
shouldn't be exposed to users at all. @Nullable is not just for internal
tooling, it also provides useful information about our APIs to users. The
user should not have to guess whether a method argument etc. can be null or
not, and for better or worse, these annotations are the standard way of
expressing that in Java.

>

Re: Null checking in Beam

Posted by Jan Lukavský <je...@seznam.cz>.

Big +1 for moving this to separate CI job. I really don't like what 
annotations are currently added to the code we ship. Tools like Idea add 
these annotations to code they generate when overriding classes and 
that's very annoying. Users should not be exposed to internal tools like 
nullability checking.

Could this change be done for upcoming 2.29.0 release, so that is does 
not contain those annotations?

  Jan

On 3/13/21 1:11 AM, Kyle Weaver wrote:
> > Makes sense. We have had good experiences with spotless and spotbugs 
> so I think it can work as a separate CI job.
>
> Sounds good. I can set that up next week then.
>
> > You might also benefit from having a cache (not just up-to-date 
> checking). Actually the project as a whole would benefit from a shared 
> distributed cache I'm guessing.
>
> Is there a JIRA or something for that? I feel like this has been 
> discussed before, but I don't remember the details.
>
> On Fri, Mar 12, 2021 at 3:57 PM Kenneth Knowles <kenn@apache.org 
> <ma...@apache.org>> wrote:
>
>
>
>     On Fri, Mar 12, 2021 at 3:05 PM Kyle Weaver <kcweaver@google.com
>     <ma...@google.com>> wrote:
>
>         > On the other hand, modifying core is a less common developer
>         use case so passing a couple flags to skip it seems manageable
>         for those people who are touching the core.
>
>         Every time I check out a new git branch that has a modified
>         core compared to the previous branch (i.e. almost always) it
>         needs to be rebuilt, and runs the null checker again. I
>         usually use Intellij to run tests, and I haven't figured out
>         how to make it pass skipCheckerFramework by default, so I've
>         resorted to just setting "skipCheckerFramework=true" in my
>         gradle.properties.
>
>
>     Makes sense. We have had good experiences with spotless and
>     spotbugs so I think it can work as a separate CI job.
>
>     You might also benefit from having a cache (not just up-to-date
>     checking). Actually the project as a whole would benefit from a
>     shared distributed cache I'm guessing.
>
>     Kenn
>
>
>         On Fri, Mar 12, 2021 at 2:53 PM Kenneth Knowles
>         <kenn@apache.org <ma...@apache.org>> wrote:
>
>             I'm OK with this as long as they are treated as strictly
>             merge blocking.
>
>             On the other hand, modifying core is a less common
>             developer use case so passing a couple flags to skip it
>             seems manageable for those people who are touching the core.
>
>             Kenn
>
>             On Fri, Mar 12, 2021 at 1:19 PM Pablo Estrada
>             <pabloem@google.com <ma...@google.com>> wrote:
>
>                 Does it make sense to add a Jenkins precommit suite
>                 that runs null checking exclusively?
>
>                 On Fri, Mar 12, 2021 at 11:57 AM Kyle Weaver
>                 <kcweaver@google.com <ma...@google.com>> wrote:
>
>                     I don't mind fixing my code or suppressing
>                     nullness errors, but the cost of the null check
>                     itself is hurting my productivity. In the best
>                     case, null checks add about ten minutes to the
>                     build time for large modules like :sdks:java:core.
>                     In the worst case, they cause my build to fail
>                     altogether, because the framework logs a warning
>                     that "Memory constraints are impeding
>                     performance," which is interpreted by -Wall as an
>                     error. It might not be a problem on big machines
>                     with a lot of memory, but on my Macbook, and on
>                     the Github Actions executors it is a real problem.
>                     https://issues.apache.org/jira/browse/BEAM-11837
>                     <https://issues.apache.org/jira/browse/BEAM-11837>
>
>                     Since nullness checks seem to work fine for now on
>                     Jenkins, I propose making them opt-in rather than
>                     opt-out, and only run them on Jenkins by default.
>
>                     On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver
>                     <kcweaver@google.com <ma...@google.com>>
>                     wrote:
>
>                         Can you try adding the generated classes to
>                         generatedClassPatterns in the
>                         JavaNatureConfiguration?
>
>                         https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>                         <https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92>
>
>
>                         On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax
>                         <relax@google.com <ma...@google.com>>
>                         wrote:
>
>                             I'm running into a problem with this
>                             check. I added a protocol-buffer file to a
>                             module (google-cloud-platform) that
>                             previously did have any protobuf files in
>                             it. The generated files contain lines that
>                             violate this null checker, so they won't
>                             compile. I can't annotate the files,
>                             because they are codegen files. I tried
>                             adding the package to spotbugs-filter.xml,
>                             but that didn't help.
>
>                             Any suggestions?
>
>                             Reuven
>
>                             On Fri, Jan 22, 2021 at 10:38 AM Brian
>                             Hulette <bhulette@google.com
>                             <ma...@google.com>> wrote:
>
>
>
>                                 On Fri, Jan 22, 2021 at 1:18 AM Jan
>                                 Lukavský <je.ik@seznam.cz
>                                 <ma...@seznam.cz>> wrote:
>
>                                     Hi,
>
>                                     I'll give my two cents here.
>
>                                     I'm not 100% sure that the 1-5% of
>                                     bugs are as severe as other types
>                                     of bugs. Yes, throwing NPEs at
>                                     user is not very polite. On the
>                                     other hand, many of these actually
>                                     boil down to user errors. Then we
>                                     might ask what a correct solution
>                                     would be. If we manage to figure
>                                     out what the actual problem is and
>                                     tell user what specifically is
>                                     missing or going wrong, that would
>                                     be just awesome. On the other
>                                     hand, if a tool used for avoiding
>                                     "unexpected" NPEs forces us to code
>
>                                        Object value =
>                                     Objects.requireNonNull(myNullableObject);
>                                     // or similar using Preconditions
>                                        value.doStuff();
>
>                                     instead of just
>
>                                     myNullableObject.doStuff()
>
>                                     what we actually did, is a) made a
>                                     framework happy, and b) changed a
>                                     line at which NPE is thrown by 1
>                                     (and yes, internally prevented JVM
>                                     from thrown SIGSEGV at itself, but
>                                     that is deeply technical thing).
>                                     Nothing changed semantically, from
>                                     user perspective.
>
>                                 I'd argue there's value in asking Beam
>                                 developers to make that change. It
>                                 makes us acknowledge that there's a
>                                 possibility myNullableObject is null.
>                                 If myNullableObject being null is
>                                 something relevant to the user we
>                                 would likely want to wrap it in some
>                                 other exception or provide a more
>                                 useful message than just NPE(!!).
>
>
>                                     Now, given that the framework
>                                     significantly rises compile time
>                                     (due to all the checks), causes
>                                     many "bugs" being reported by
>                                     static code analysis tools
>                                     (because the framework adds
>                                     @Nonnull default annotations
>                                     everywhere, even when Beam's code
>                                     actually counts with nullability
>                                     of a field), and given how much we
>                                     currently suppress these checks ($
>                                     git grep BEAM-10402 | wc -l ->
>                                     1981), I'd say this deserves a
>                                     deeper discussion.
>
>                                 The reason there are so many
>                                 suppressions is that fixing all the
>                                 errors is a monumental task. Rather
>                                 than addressing them all, Kenn
>                                 programmatically added suppressions
>                                 for classes that failed the checks
>                                 (https://github.com/apache/beam/pull/13261
>                                 <https://github.com/apache/beam/pull/13261>).
>                                 This allowed us to start running the
>                                 checker on the code that passes it
>                                 while the errors are fixed.
>
>                                      Jan
>
>
>                                     On 1/20/21 10:48 PM, Kenneth
>                                     Knowles wrote:
>>                                     Yes, completely sound nullability
>>                                     checking has been added to the
>>                                     project via checkerframework,
>>                                     based on a large number of NPE
>>                                     bugs (1-5% depending on how you
>>                                     search, but many other bugs
>>                                     likely attributable to
>>                                     nullness-based design errors)
>>                                     which are extra embarrassing
>>                                     because NPEs have were
>>                                     essentially solved, even in
>>                                     practice for Java, well before
>>                                     Beam existed.
>>
>>                                     Checker framework is a pluggable
>>                                     type system analysis with some
>>                                     amount of control flow
>>                                     sensitivity. Every value has a
>>                                     type that is either nullable or
>>                                     not, and certain control
>>                                     structures (like checking for
>>                                     null) can alter the type inside a
>>                                     scope. The best way to think
>>                                     about it is to consider every
>>                                     value in the program as either
>>                                     nullable or not, much like you
>>                                     think of every value as either a
>>                                     string or not, and to view method
>>                                     calls as inherently
>>                                     stateful/nondetermistic. This can
>>                                     be challenging in esoteric cases,
>>                                     but usually makes the overall
>>                                     code health better anyhow.
>>
>>                                     Your example illustrates how
>>                                     problematic the design of the
>>                                     Java language is: the analysis
>>                                     cannot assume that
>>                                     `getDescription` is a pure
>>                                     function, and neither should you.
>>                                     Even if it is aware of
>>                                     boolean-short-circuit it would
>>                                     not be sound to accept this code.
>>                                     There is an annotation for this
>>                                     in the cases where it is true
>>                                     (like proto-generate getters):
>>                                     https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>                                     <https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html>
>>
>>                                     The refactor for cases like this
>>                                     is trivial so there isn't a lot
>>                                     of value to thinking too hard
>>                                     about it.
>>
>>                                     if
>>                                     (statusCode.equals(Code.INVALID_ARGUMENT)
>>                                       @Nullable String desc =
>>                                     statusCode.toStatus().getDescription();
>>                                       if (desc != null &&
>>                                     desc.contains("finalized")) {
>>                                         return false;
>>                                       }
>>                                     }
>>
>>                                     To a casual eye, this may look
>>                                     like a noop change. To the
>>                                     analysis it makes all the
>>                                     difference. And IMO this
>>                                     difference is real. Humans may
>>                                     assume it is a noop and humans
>>                                     would be wrong. So many times
>>                                     when you think/expect/hope that
>>                                     `getXYZ()` is a trivial getter
>>                                     method you later learn that it
>>                                     was tweaked for some odd reason.
>>                                     I believe this code change makes
>>                                     the code better. Suppressing
>>                                     these errors should be
>>                                     exceptionally rare, and never in
>>                                     normal code. It is far better to
>>                                     improve your code than to
>>                                     suppress the issue.
>>
>>                                     It would be very cool for the
>>                                     proto compiler to annotate enough
>>                                     that new-and-improved type
>>                                     checkers could make things more
>>                                     convenient.
>>
>>                                     Kenn
>>
>>                                     On Mon, Jan 11, 2021 at 8:53 PM
>>                                     Reuven Lax <relax@google.com
>>                                     <ma...@google.com>> wrote:
>>
>>                                         I can use that trick. However
>>                                         I'm surprised that the check
>>                                         appears to be so simplistic.
>>
>>                                         For example, the following
>>                                         code triggers the check on
>>                                         getDescription().contains(),
>>                                         since getDescription returns
>>                                         a Nullable string. However
>>                                         even a simplistic analysis
>>                                         should realize that
>>                                         getDescription() was checked
>>                                         for null first! I have a
>>                                         dozen or so cases like this,
>>                                         and I question how useful
>>                                         such a simplistic check it
>>                                         will be.
>>
>>                                         if (statusCode.equals(Code.INVALID_ARGUMENT)
>>                                         &&statusCode.toStatus().getDescription() !=null &&statusCode.toStatus().getDescription().contains("finalized")) {return false;
>>                                         }
>>
>>
>>                                         On Mon, Jan 11, 2021 at 8:32
>>                                         PM Boyuan Zhang
>>                                         <boyuanz@google.com
>>                                         <ma...@google.com>>
>>                                         wrote:
>>
>>                                             Yeah it seems like the
>>                                             checker is enabled:
>>                                             https://issues.apache.org/jira/browse/BEAM-10402
>>                                             <https://issues.apache.org/jira/browse/BEAM-10402>.
>>                                             I used
>>                                             @SuppressWarnings({"nullness"
>>                                             )}) to suppress the error
>>                                             when I think it's not
>>                                             really a concern.
>>
>>                                             On Mon, Jan 11, 2021 at
>>                                             8:28 PM Reuven Lax
>>                                             <relax@google.com
>>                                             <ma...@google.com>>
>>                                             wrote:
>>
>>                                                 Has extra Nullable
>>                                                 checking been enabled
>>                                                 in the Beam project?
>>                                                 I have a PR that was
>>                                                 on hold for several
>>                                                 months, and I'm
>>                                                 struggling now with
>>                                                 compile failing to
>>                                                 complaints about
>>                                                 assigning something
>>                                                 that is nullable to
>>                                                 something that is not
>>                                                 nullable. Even when
>>                                                 the immediate control
>>                                                 flow makes it
>>                                                 absolutely impossible
>>                                                 for the variable to
>>                                                 be null.
>>
>>                                                 Has something changed
>>                                                 here?
>>
>>                                                 Reuven
>>

Re: Null checking in Beam

Posted by Kyle Weaver <kc...@google.com>.

> Makes sense. We have had good experiences with spotless and spotbugs so I
think it can work as a separate CI job.

Sounds good. I can set that up next week then.

> You might also benefit from having a cache (not just up-to-date
checking). Actually the project as a whole would benefit from a shared
distributed cache I'm guessing.

Is there a JIRA or something for that? I feel like this has been discussed
before, but I don't remember the details.

On Fri, Mar 12, 2021 at 3:57 PM Kenneth Knowles <ke...@apache.org> wrote:

>
>
> On Fri, Mar 12, 2021 at 3:05 PM Kyle Weaver <kc...@google.com> wrote:
>
>> > On the other hand, modifying core is a less common developer use case
>> so passing a couple flags to skip it seems manageable for those people who
>> are touching the core.
>>
>> Every time I check out a new git branch that has a modified core compared
>> to the previous branch (i.e. almost always) it needs to be rebuilt, and
>> runs the null checker again. I usually use Intellij to run tests, and I
>> haven't figured out how to make it pass skipCheckerFramework by default, so
>> I've resorted to just setting "skipCheckerFramework=true" in my
>> gradle.properties.
>>
>
> Makes sense. We have had good experiences with spotless and spotbugs so I
> think it can work as a separate CI job.
>
> You might also benefit from having a cache (not just up-to-date checking).
> Actually the project as a whole would benefit from a shared distributed
> cache I'm guessing.
>
> Kenn
>
>
>>
>> On Fri, Mar 12, 2021 at 2:53 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I'm OK with this as long as they are treated as strictly merge blocking.
>>>
>>> On the other hand, modifying core is a less common developer use case so
>>> passing a couple flags to skip it seems manageable for those people who are
>>> touching the core.
>>>
>>> Kenn
>>>
>>> On Fri, Mar 12, 2021 at 1:19 PM Pablo Estrada <pa...@google.com>
>>> wrote:
>>>
>>>> Does it make sense to add a Jenkins precommit suite that runs null
>>>> checking exclusively?
>>>>
>>>> On Fri, Mar 12, 2021 at 11:57 AM Kyle Weaver <kc...@google.com>
>>>> wrote:
>>>>
>>>>> I don't mind fixing my code or suppressing nullness errors, but the
>>>>> cost of the null check itself is hurting my productivity. In the best case,
>>>>> null checks add about ten minutes to the build time for large modules
>>>>> like :sdks:java:core. In the worst case, they cause my build to fail
>>>>> altogether, because the framework logs a warning that "Memory constraints
>>>>> are impeding performance," which is interpreted by -Wall as an error. It
>>>>> might not be a problem on big machines with a lot of memory, but on my
>>>>> Macbook, and on the Github Actions executors it is a real problem.
>>>>> https://issues.apache.org/jira/browse/BEAM-11837
>>>>>
>>>>> Since nullness checks seem to work fine for now on Jenkins, I propose
>>>>> making them opt-in rather than opt-out, and only run them on Jenkins by
>>>>> default.
>>>>>
>>>>> On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver <kc...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Can you try adding the generated classes to generatedClassPatterns in
>>>>>> the JavaNatureConfiguration?
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> I'm running into a problem with this check. I added a
>>>>>>> protocol-buffer file to a module (google-cloud-platform) that previously
>>>>>>> did have any protobuf files in it. The generated files contain lines that
>>>>>>> violate this null checker, so they won't compile. I can't annotate the
>>>>>>> files, because they are codegen files. I tried adding the package to
>>>>>>> spotbugs-filter.xml, but that didn't help.
>>>>>>>
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'll give my two cents here.
>>>>>>>>>
>>>>>>>>> I'm not 100% sure that the 1-5% of bugs are as severe as other
>>>>>>>>> types of bugs. Yes, throwing NPEs at user is not very polite. On the other
>>>>>>>>> hand, many of these actually boil down to user errors. Then we might ask
>>>>>>>>> what a correct solution would be. If we manage to figure out what the
>>>>>>>>> actual problem is and tell user what specifically is missing or going
>>>>>>>>> wrong, that would be just awesome. On the other hand, if a tool used for
>>>>>>>>> avoiding "unexpected" NPEs forces us to code
>>>>>>>>>
>>>>>>>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>>>>>>>> similar using Preconditions
>>>>>>>>>    value.doStuff();
>>>>>>>>>
>>>>>>>>> instead of just
>>>>>>>>>
>>>>>>>>>   myNullableObject.doStuff()
>>>>>>>>>
>>>>>>>>> what we actually did, is a) made a framework happy, and b) changed
>>>>>>>>> a line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>>>>>>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>>>>>>>> changed semantically, from user perspective.
>>>>>>>>>
>>>>>>>> I'd argue there's value in asking Beam developers to make that
>>>>>>>> change. It makes us acknowledge that there's a possibility myNullableObject
>>>>>>>> is null. If myNullableObject being null is something relevant to the user
>>>>>>>> we would likely want to wrap it in some other exception or provide a more
>>>>>>>> useful message than just NPE(!!).
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now, given that the framework significantly rises compile time
>>>>>>>>> (due to all the checks), causes many "bugs" being reported by static code
>>>>>>>>> analysis tools (because the framework adds @Nonnull default annotations
>>>>>>>>> everywhere, even when Beam's code actually counts with nullability of a
>>>>>>>>> field), and given how much we currently suppress these checks ($ git grep
>>>>>>>>> BEAM-10402 | wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>>>>>>>
>>>>>>>> The reason there are so many suppressions is that fixing all the
>>>>>>>> errors is a monumental task. Rather than addressing them all, Kenn
>>>>>>>> programmatically added suppressions for classes that failed the checks (
>>>>>>>> https://github.com/apache/beam/pull/13261). This allowed us to
>>>>>>>> start running the checker on the code that passes it while the errors are
>>>>>>>> fixed.
>>>>>>>>
>>>>>>>>>  Jan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>>>>>>>
>>>>>>>>> Yes, completely sound nullability checking has been added to the
>>>>>>>>> project via checkerframework, based on a large number of NPE bugs (1-5%
>>>>>>>>> depending on how you search, but many other bugs likely attributable to
>>>>>>>>> nullness-based design errors) which are extra embarrassing because NPEs
>>>>>>>>> have were essentially solved, even in practice for Java, well before Beam
>>>>>>>>> existed.
>>>>>>>>>
>>>>>>>>> Checker framework is a pluggable type system analysis with some
>>>>>>>>> amount of control flow sensitivity. Every value has a type that is either
>>>>>>>>> nullable or not, and certain control structures (like checking for null)
>>>>>>>>> can alter the type inside a scope. The best way to think about it is to
>>>>>>>>> consider every value in the program as either nullable or not, much like
>>>>>>>>> you think of every value as either a string or not, and to view method
>>>>>>>>> calls as inherently stateful/nondetermistic. This can be challenging
>>>>>>>>> in esoteric cases, but usually makes the overall code health better anyhow.
>>>>>>>>>
>>>>>>>>> Your example illustrates how problematic the design of the Java
>>>>>>>>> language is: the analysis cannot assume that `getDescription` is a pure
>>>>>>>>> function, and neither should you. Even if it is aware of
>>>>>>>>> boolean-short-circuit it would not be sound to accept this code. There is
>>>>>>>>> an annotation for this in the cases where it is true (like proto-generate
>>>>>>>>> getters):
>>>>>>>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>>>>>>>
>>>>>>>>> The refactor for cases like this is trivial so there isn't a lot
>>>>>>>>> of value to thinking too hard about it.
>>>>>>>>>
>>>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>>>>>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>>>>>>>   if (desc != null && desc.contains("finalized")) {
>>>>>>>>>     return false;
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> To a casual eye, this may look like a noop change. To the analysis
>>>>>>>>> it makes all the difference. And IMO this difference is real. Humans may
>>>>>>>>> assume it is a noop and humans would be wrong. So many times when you
>>>>>>>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>>>>>>>> learn that it was tweaked for some odd reason. I believe this code change
>>>>>>>>> makes the code better. Suppressing these errors should be exceptionally
>>>>>>>>> rare, and never in normal code. It is far better to improve your code than
>>>>>>>>> to suppress the issue.
>>>>>>>>>
>>>>>>>>> It would be very cool for the proto compiler to annotate enough
>>>>>>>>> that new-and-improved type checkers could make things more convenient.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I can use that trick. However I'm surprised that the check
>>>>>>>>>> appears to be so simplistic.
>>>>>>>>>>
>>>>>>>>>> For example, the following code triggers the check on
>>>>>>>>>> getDescription().contains(), since getDescription returns a Nullable
>>>>>>>>>> string. However even a simplistic analysis should realize that
>>>>>>>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>>>>>>>> like this, and I question how useful such a simplistic check it will be.
>>>>>>>>>>
>>>>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yeah it seems like the checker is enabled:
>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>>>>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>>>>>>>> not really a concern.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Has extra Nullable checking been enabled in the Beam project? I
>>>>>>>>>>>> have a PR that was on hold for several months, and I'm struggling now with
>>>>>>>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>>>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>>>>>>>> it absolutely impossible for the variable to be null.
>>>>>>>>>>>>
>>>>>>>>>>>> Has something changed here?
>>>>>>>>>>>>
>>>>>>>>>>>> Reuven
>>>>>>>>>>>>
>>>>>>>>>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

On Fri, Mar 12, 2021 at 3:05 PM Kyle Weaver <kc...@google.com> wrote:

> > On the other hand, modifying core is a less common developer use case so
> passing a couple flags to skip it seems manageable for those people who are
> touching the core.
>
> Every time I check out a new git branch that has a modified core compared
> to the previous branch (i.e. almost always) it needs to be rebuilt, and
> runs the null checker again. I usually use Intellij to run tests, and I
> haven't figured out how to make it pass skipCheckerFramework by default, so
> I've resorted to just setting "skipCheckerFramework=true" in my
> gradle.properties.
>

Makes sense. We have had good experiences with spotless and spotbugs so I
think it can work as a separate CI job.

You might also benefit from having a cache (not just up-to-date checking).
Actually the project as a whole would benefit from a shared distributed
cache I'm guessing.

Kenn


>
> On Fri, Mar 12, 2021 at 2:53 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I'm OK with this as long as they are treated as strictly merge blocking.
>>
>> On the other hand, modifying core is a less common developer use case so
>> passing a couple flags to skip it seems manageable for those people who are
>> touching the core.
>>
>> Kenn
>>
>> On Fri, Mar 12, 2021 at 1:19 PM Pablo Estrada <pa...@google.com> wrote:
>>
>>> Does it make sense to add a Jenkins precommit suite that runs null
>>> checking exclusively?
>>>
>>> On Fri, Mar 12, 2021 at 11:57 AM Kyle Weaver <kc...@google.com>
>>> wrote:
>>>
>>>> I don't mind fixing my code or suppressing nullness errors, but the
>>>> cost of the null check itself is hurting my productivity. In the best case,
>>>> null checks add about ten minutes to the build time for large modules
>>>> like :sdks:java:core. In the worst case, they cause my build to fail
>>>> altogether, because the framework logs a warning that "Memory constraints
>>>> are impeding performance," which is interpreted by -Wall as an error. It
>>>> might not be a problem on big machines with a lot of memory, but on my
>>>> Macbook, and on the Github Actions executors it is a real problem.
>>>> https://issues.apache.org/jira/browse/BEAM-11837
>>>>
>>>> Since nullness checks seem to work fine for now on Jenkins, I propose
>>>> making them opt-in rather than opt-out, and only run them on Jenkins by
>>>> default.
>>>>
>>>> On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver <kc...@google.com>
>>>> wrote:
>>>>
>>>>> Can you try adding the generated classes to generatedClassPatterns in
>>>>> the JavaNatureConfiguration?
>>>>>
>>>>>
>>>>> https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>>>>>
>>>>>
>>>>> On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> I'm running into a problem with this check. I added a protocol-buffer
>>>>>> file to a module (google-cloud-platform) that previously did have any
>>>>>> protobuf files in it. The generated files contain lines that violate this
>>>>>> null checker, so they won't compile. I can't annotate the files, because
>>>>>> they are codegen files. I tried adding the package to spotbugs-filter.xml,
>>>>>> but that didn't help.
>>>>>>
>>>>>> Any suggestions?
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'll give my two cents here.
>>>>>>>>
>>>>>>>> I'm not 100% sure that the 1-5% of bugs are as severe as other
>>>>>>>> types of bugs. Yes, throwing NPEs at user is not very polite. On the other
>>>>>>>> hand, many of these actually boil down to user errors. Then we might ask
>>>>>>>> what a correct solution would be. If we manage to figure out what the
>>>>>>>> actual problem is and tell user what specifically is missing or going
>>>>>>>> wrong, that would be just awesome. On the other hand, if a tool used for
>>>>>>>> avoiding "unexpected" NPEs forces us to code
>>>>>>>>
>>>>>>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>>>>>>> similar using Preconditions
>>>>>>>>    value.doStuff();
>>>>>>>>
>>>>>>>> instead of just
>>>>>>>>
>>>>>>>>   myNullableObject.doStuff()
>>>>>>>>
>>>>>>>> what we actually did, is a) made a framework happy, and b) changed
>>>>>>>> a line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>>>>>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>>>>>>> changed semantically, from user perspective.
>>>>>>>>
>>>>>>> I'd argue there's value in asking Beam developers to make that
>>>>>>> change. It makes us acknowledge that there's a possibility myNullableObject
>>>>>>> is null. If myNullableObject being null is something relevant to the user
>>>>>>> we would likely want to wrap it in some other exception or provide a more
>>>>>>> useful message than just NPE(!!).
>>>>>>>
>>>>>>>>
>>>>>>>> Now, given that the framework significantly rises compile time (due
>>>>>>>> to all the checks), causes many "bugs" being reported by static code
>>>>>>>> analysis tools (because the framework adds @Nonnull default annotations
>>>>>>>> everywhere, even when Beam's code actually counts with nullability of a
>>>>>>>> field), and given how much we currently suppress these checks ($ git grep
>>>>>>>> BEAM-10402 | wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>>>>>>
>>>>>>> The reason there are so many suppressions is that fixing all the
>>>>>>> errors is a monumental task. Rather than addressing them all, Kenn
>>>>>>> programmatically added suppressions for classes that failed the checks (
>>>>>>> https://github.com/apache/beam/pull/13261). This allowed us to
>>>>>>> start running the checker on the code that passes it while the errors are
>>>>>>> fixed.
>>>>>>>
>>>>>>>>  Jan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>>>>>>
>>>>>>>> Yes, completely sound nullability checking has been added to the
>>>>>>>> project via checkerframework, based on a large number of NPE bugs (1-5%
>>>>>>>> depending on how you search, but many other bugs likely attributable to
>>>>>>>> nullness-based design errors) which are extra embarrassing because NPEs
>>>>>>>> have were essentially solved, even in practice for Java, well before Beam
>>>>>>>> existed.
>>>>>>>>
>>>>>>>> Checker framework is a pluggable type system analysis with some
>>>>>>>> amount of control flow sensitivity. Every value has a type that is either
>>>>>>>> nullable or not, and certain control structures (like checking for null)
>>>>>>>> can alter the type inside a scope. The best way to think about it is to
>>>>>>>> consider every value in the program as either nullable or not, much like
>>>>>>>> you think of every value as either a string or not, and to view method
>>>>>>>> calls as inherently stateful/nondetermistic. This can be challenging
>>>>>>>> in esoteric cases, but usually makes the overall code health better anyhow.
>>>>>>>>
>>>>>>>> Your example illustrates how problematic the design of the Java
>>>>>>>> language is: the analysis cannot assume that `getDescription` is a pure
>>>>>>>> function, and neither should you. Even if it is aware of
>>>>>>>> boolean-short-circuit it would not be sound to accept this code. There is
>>>>>>>> an annotation for this in the cases where it is true (like proto-generate
>>>>>>>> getters):
>>>>>>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>>>>>>
>>>>>>>> The refactor for cases like this is trivial so there isn't a lot of
>>>>>>>> value to thinking too hard about it.
>>>>>>>>
>>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>>>>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>>>>>>   if (desc != null && desc.contains("finalized")) {
>>>>>>>>     return false;
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> To a casual eye, this may look like a noop change. To the analysis
>>>>>>>> it makes all the difference. And IMO this difference is real. Humans may
>>>>>>>> assume it is a noop and humans would be wrong. So many times when you
>>>>>>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>>>>>>> learn that it was tweaked for some odd reason. I believe this code change
>>>>>>>> makes the code better. Suppressing these errors should be exceptionally
>>>>>>>> rare, and never in normal code. It is far better to improve your code than
>>>>>>>> to suppress the issue.
>>>>>>>>
>>>>>>>> It would be very cool for the proto compiler to annotate enough
>>>>>>>> that new-and-improved type checkers could make things more convenient.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I can use that trick. However I'm surprised that the check appears
>>>>>>>>> to be so simplistic.
>>>>>>>>>
>>>>>>>>> For example, the following code triggers the check on
>>>>>>>>> getDescription().contains(), since getDescription returns a Nullable
>>>>>>>>> string. However even a simplistic analysis should realize that
>>>>>>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>>>>>>> like this, and I question how useful such a simplistic check it will be.
>>>>>>>>>
>>>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah it seems like the checker is enabled:
>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>>>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>>>>>>> not really a concern.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Has extra Nullable checking been enabled in the Beam project? I
>>>>>>>>>>> have a PR that was on hold for several months, and I'm struggling now with
>>>>>>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>>>>>>> it absolutely impossible for the variable to be null.
>>>>>>>>>>>
>>>>>>>>>>> Has something changed here?
>>>>>>>>>>>
>>>>>>>>>>> Reuven
>>>>>>>>>>>
>>>>>>>>>>

Re: Null checking in Beam

Posted by Kyle Weaver <kc...@google.com>.

> On the other hand, modifying core is a less common developer use case so
passing a couple flags to skip it seems manageable for those people who are
touching the core.

Every time I check out a new git branch that has a modified core compared
to the previous branch (i.e. almost always) it needs to be rebuilt, and
runs the null checker again. I usually use Intellij to run tests, and I
haven't figured out how to make it pass skipCheckerFramework by default, so
I've resorted to just setting "skipCheckerFramework=true" in my
gradle.properties.

On Fri, Mar 12, 2021 at 2:53 PM Kenneth Knowles <ke...@apache.org> wrote:

> I'm OK with this as long as they are treated as strictly merge blocking.
>
> On the other hand, modifying core is a less common developer use case so
> passing a couple flags to skip it seems manageable for those people who are
> touching the core.
>
> Kenn
>
> On Fri, Mar 12, 2021 at 1:19 PM Pablo Estrada <pa...@google.com> wrote:
>
>> Does it make sense to add a Jenkins precommit suite that runs null
>> checking exclusively?
>>
>> On Fri, Mar 12, 2021 at 11:57 AM Kyle Weaver <kc...@google.com> wrote:
>>
>>> I don't mind fixing my code or suppressing nullness errors, but the cost
>>> of the null check itself is hurting my productivity. In the best case, null
>>> checks add about ten minutes to the build time for large modules
>>> like :sdks:java:core. In the worst case, they cause my build to fail
>>> altogether, because the framework logs a warning that "Memory constraints
>>> are impeding performance," which is interpreted by -Wall as an error. It
>>> might not be a problem on big machines with a lot of memory, but on my
>>> Macbook, and on the Github Actions executors it is a real problem.
>>> https://issues.apache.org/jira/browse/BEAM-11837
>>>
>>> Since nullness checks seem to work fine for now on Jenkins, I propose
>>> making them opt-in rather than opt-out, and only run them on Jenkins by
>>> default.
>>>
>>> On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver <kc...@google.com> wrote:
>>>
>>>> Can you try adding the generated classes to generatedClassPatterns in
>>>> the JavaNatureConfiguration?
>>>>
>>>>
>>>> https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>>>>
>>>>
>>>> On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> I'm running into a problem with this check. I added a protocol-buffer
>>>>> file to a module (google-cloud-platform) that previously did have any
>>>>> protobuf files in it. The generated files contain lines that violate this
>>>>> null checker, so they won't compile. I can't annotate the files, because
>>>>> they are codegen files. I tried adding the package to spotbugs-filter.xml,
>>>>> but that didn't help.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'll give my two cents here.
>>>>>>>
>>>>>>> I'm not 100% sure that the 1-5% of bugs are as severe as other types
>>>>>>> of bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
>>>>>>> many of these actually boil down to user errors. Then we might ask what a
>>>>>>> correct solution would be. If we manage to figure out what the actual
>>>>>>> problem is and tell user what specifically is missing or going wrong, that
>>>>>>> would be just awesome. On the other hand, if a tool used for avoiding
>>>>>>> "unexpected" NPEs forces us to code
>>>>>>>
>>>>>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>>>>>> similar using Preconditions
>>>>>>>    value.doStuff();
>>>>>>>
>>>>>>> instead of just
>>>>>>>
>>>>>>>   myNullableObject.doStuff()
>>>>>>>
>>>>>>> what we actually did, is a) made a framework happy, and b) changed a
>>>>>>> line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>>>>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>>>>>> changed semantically, from user perspective.
>>>>>>>
>>>>>> I'd argue there's value in asking Beam developers to make that
>>>>>> change. It makes us acknowledge that there's a possibility myNullableObject
>>>>>> is null. If myNullableObject being null is something relevant to the user
>>>>>> we would likely want to wrap it in some other exception or provide a more
>>>>>> useful message than just NPE(!!).
>>>>>>
>>>>>>>
>>>>>>> Now, given that the framework significantly rises compile time (due
>>>>>>> to all the checks), causes many "bugs" being reported by static code
>>>>>>> analysis tools (because the framework adds @Nonnull default annotations
>>>>>>> everywhere, even when Beam's code actually counts with nullability of a
>>>>>>> field), and given how much we currently suppress these checks ($ git grep
>>>>>>> BEAM-10402 | wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>>>>>
>>>>>> The reason there are so many suppressions is that fixing all the
>>>>>> errors is a monumental task. Rather than addressing them all, Kenn
>>>>>> programmatically added suppressions for classes that failed the checks (
>>>>>> https://github.com/apache/beam/pull/13261). This allowed us to start
>>>>>> running the checker on the code that passes it while the errors are fixed.
>>>>>>
>>>>>>>  Jan
>>>>>>>
>>>>>>>
>>>>>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>>>>>
>>>>>>> Yes, completely sound nullability checking has been added to the
>>>>>>> project via checkerframework, based on a large number of NPE bugs (1-5%
>>>>>>> depending on how you search, but many other bugs likely attributable to
>>>>>>> nullness-based design errors) which are extra embarrassing because NPEs
>>>>>>> have were essentially solved, even in practice for Java, well before Beam
>>>>>>> existed.
>>>>>>>
>>>>>>> Checker framework is a pluggable type system analysis with some
>>>>>>> amount of control flow sensitivity. Every value has a type that is either
>>>>>>> nullable or not, and certain control structures (like checking for null)
>>>>>>> can alter the type inside a scope. The best way to think about it is to
>>>>>>> consider every value in the program as either nullable or not, much like
>>>>>>> you think of every value as either a string or not, and to view method
>>>>>>> calls as inherently stateful/nondetermistic. This can be challenging
>>>>>>> in esoteric cases, but usually makes the overall code health better anyhow.
>>>>>>>
>>>>>>> Your example illustrates how problematic the design of the Java
>>>>>>> language is: the analysis cannot assume that `getDescription` is a pure
>>>>>>> function, and neither should you. Even if it is aware of
>>>>>>> boolean-short-circuit it would not be sound to accept this code. There is
>>>>>>> an annotation for this in the cases where it is true (like proto-generate
>>>>>>> getters):
>>>>>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>>>>>
>>>>>>> The refactor for cases like this is trivial so there isn't a lot of
>>>>>>> value to thinking too hard about it.
>>>>>>>
>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>>>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>>>>>   if (desc != null && desc.contains("finalized")) {
>>>>>>>     return false;
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> To a casual eye, this may look like a noop change. To the analysis
>>>>>>> it makes all the difference. And IMO this difference is real. Humans may
>>>>>>> assume it is a noop and humans would be wrong. So many times when you
>>>>>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>>>>>> learn that it was tweaked for some odd reason. I believe this code change
>>>>>>> makes the code better. Suppressing these errors should be exceptionally
>>>>>>> rare, and never in normal code. It is far better to improve your code than
>>>>>>> to suppress the issue.
>>>>>>>
>>>>>>> It would be very cool for the proto compiler to annotate enough that
>>>>>>> new-and-improved type checkers could make things more convenient.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> I can use that trick. However I'm surprised that the check appears
>>>>>>>> to be so simplistic.
>>>>>>>>
>>>>>>>> For example, the following code triggers the check on
>>>>>>>> getDescription().contains(), since getDescription returns a Nullable
>>>>>>>> string. However even a simplistic analysis should realize that
>>>>>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>>>>>> like this, and I question how useful such a simplistic check it will be.
>>>>>>>>
>>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yeah it seems like the checker is enabled:
>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>>>>>> not really a concern.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Has extra Nullable checking been enabled in the Beam project? I
>>>>>>>>>> have a PR that was on hold for several months, and I'm struggling now with
>>>>>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>>>>>> it absolutely impossible for the variable to be null.
>>>>>>>>>>
>>>>>>>>>> Has something changed here?
>>>>>>>>>>
>>>>>>>>>> Reuven
>>>>>>>>>>
>>>>>>>>>

Re: Null checking in Beam

Posted by Kenneth Knowles <ke...@apache.org>.

I'm OK with this as long as they are treated as strictly merge blocking.

On the other hand, modifying core is a less common developer use case so
passing a couple flags to skip it seems manageable for those people who are
touching the core.

Kenn

On Fri, Mar 12, 2021 at 1:19 PM Pablo Estrada <pa...@google.com> wrote:

> Does it make sense to add a Jenkins precommit suite that runs null
> checking exclusively?
>
> On Fri, Mar 12, 2021 at 11:57 AM Kyle Weaver <kc...@google.com> wrote:
>
>> I don't mind fixing my code or suppressing nullness errors, but the cost
>> of the null check itself is hurting my productivity. In the best case, null
>> checks add about ten minutes to the build time for large modules
>> like :sdks:java:core. In the worst case, they cause my build to fail
>> altogether, because the framework logs a warning that "Memory constraints
>> are impeding performance," which is interpreted by -Wall as an error. It
>> might not be a problem on big machines with a lot of memory, but on my
>> Macbook, and on the Github Actions executors it is a real problem.
>> https://issues.apache.org/jira/browse/BEAM-11837
>>
>> Since nullness checks seem to work fine for now on Jenkins, I propose
>> making them opt-in rather than opt-out, and only run them on Jenkins by
>> default.
>>
>> On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver <kc...@google.com> wrote:
>>
>>> Can you try adding the generated classes to generatedClassPatterns in
>>> the JavaNatureConfiguration?
>>>
>>>
>>> https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>>>
>>>
>>> On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I'm running into a problem with this check. I added a protocol-buffer
>>>> file to a module (google-cloud-platform) that previously did have any
>>>> protobuf files in it. The generated files contain lines that violate this
>>>> null checker, so they won't compile. I can't annotate the files, because
>>>> they are codegen files. I tried adding the package to spotbugs-filter.xml,
>>>> but that didn't help.
>>>>
>>>> Any suggestions?
>>>>
>>>> Reuven
>>>>
>>>> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'll give my two cents here.
>>>>>>
>>>>>> I'm not 100% sure that the 1-5% of bugs are as severe as other types
>>>>>> of bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
>>>>>> many of these actually boil down to user errors. Then we might ask what a
>>>>>> correct solution would be. If we manage to figure out what the actual
>>>>>> problem is and tell user what specifically is missing or going wrong, that
>>>>>> would be just awesome. On the other hand, if a tool used for avoiding
>>>>>> "unexpected" NPEs forces us to code
>>>>>>
>>>>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>>>>> similar using Preconditions
>>>>>>    value.doStuff();
>>>>>>
>>>>>> instead of just
>>>>>>
>>>>>>   myNullableObject.doStuff()
>>>>>>
>>>>>> what we actually did, is a) made a framework happy, and b) changed a
>>>>>> line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>>>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>>>>> changed semantically, from user perspective.
>>>>>>
>>>>> I'd argue there's value in asking Beam developers to make that change.
>>>>> It makes us acknowledge that there's a possibility myNullableObject is
>>>>> null. If myNullableObject being null is something relevant to the user we
>>>>> would likely want to wrap it in some other exception or provide a more
>>>>> useful message than just NPE(!!).
>>>>>
>>>>>>
>>>>>> Now, given that the framework significantly rises compile time (due
>>>>>> to all the checks), causes many "bugs" being reported by static code
>>>>>> analysis tools (because the framework adds @Nonnull default annotations
>>>>>> everywhere, even when Beam's code actually counts with nullability of a
>>>>>> field), and given how much we currently suppress these checks ($ git grep
>>>>>> BEAM-10402 | wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>>>>
>>>>> The reason there are so many suppressions is that fixing all the
>>>>> errors is a monumental task. Rather than addressing them all, Kenn
>>>>> programmatically added suppressions for classes that failed the checks (
>>>>> https://github.com/apache/beam/pull/13261). This allowed us to start
>>>>> running the checker on the code that passes it while the errors are fixed.
>>>>>
>>>>>>  Jan
>>>>>>
>>>>>>
>>>>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>>>>
>>>>>> Yes, completely sound nullability checking has been added to the
>>>>>> project via checkerframework, based on a large number of NPE bugs (1-5%
>>>>>> depending on how you search, but many other bugs likely attributable to
>>>>>> nullness-based design errors) which are extra embarrassing because NPEs
>>>>>> have were essentially solved, even in practice for Java, well before Beam
>>>>>> existed.
>>>>>>
>>>>>> Checker framework is a pluggable type system analysis with some
>>>>>> amount of control flow sensitivity. Every value has a type that is either
>>>>>> nullable or not, and certain control structures (like checking for null)
>>>>>> can alter the type inside a scope. The best way to think about it is to
>>>>>> consider every value in the program as either nullable or not, much like
>>>>>> you think of every value as either a string or not, and to view method
>>>>>> calls as inherently stateful/nondetermistic. This can be challenging
>>>>>> in esoteric cases, but usually makes the overall code health better anyhow.
>>>>>>
>>>>>> Your example illustrates how problematic the design of the Java
>>>>>> language is: the analysis cannot assume that `getDescription` is a pure
>>>>>> function, and neither should you. Even if it is aware of
>>>>>> boolean-short-circuit it would not be sound to accept this code. There is
>>>>>> an annotation for this in the cases where it is true (like proto-generate
>>>>>> getters):
>>>>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>>>>
>>>>>> The refactor for cases like this is trivial so there isn't a lot of
>>>>>> value to thinking too hard about it.
>>>>>>
>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>>>>   if (desc != null && desc.contains("finalized")) {
>>>>>>     return false;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> To a casual eye, this may look like a noop change. To the analysis it
>>>>>> makes all the difference. And IMO this difference is real. Humans may
>>>>>> assume it is a noop and humans would be wrong. So many times when you
>>>>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>>>>> learn that it was tweaked for some odd reason. I believe this code change
>>>>>> makes the code better. Suppressing these errors should be exceptionally
>>>>>> rare, and never in normal code. It is far better to improve your code than
>>>>>> to suppress the issue.
>>>>>>
>>>>>> It would be very cool for the proto compiler to annotate enough that
>>>>>> new-and-improved type checkers could make things more convenient.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> I can use that trick. However I'm surprised that the check appears
>>>>>>> to be so simplistic.
>>>>>>>
>>>>>>> For example, the following code triggers the check on
>>>>>>> getDescription().contains(), since getDescription returns a Nullable
>>>>>>> string. However even a simplistic analysis should realize that
>>>>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>>>>> like this, and I question how useful such a simplistic check it will be.
>>>>>>>
>>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yeah it seems like the checker is enabled:
>>>>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>>>>> not really a concern.
>>>>>>>>
>>>>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Has extra Nullable checking been enabled in the Beam project? I
>>>>>>>>> have a PR that was on hold for several months, and I'm struggling now with
>>>>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>>>>> it absolutely impossible for the variable to be null.
>>>>>>>>>
>>>>>>>>> Has something changed here?
>>>>>>>>>
>>>>>>>>> Reuven
>>>>>>>>>
>>>>>>>>

Re: Null checking in Beam

Posted by Pablo Estrada <pa...@google.com>.

Does it make sense to add a Jenkins precommit suite that runs null checking
exclusively?

On Fri, Mar 12, 2021 at 11:57 AM Kyle Weaver <kc...@google.com> wrote:

> I don't mind fixing my code or suppressing nullness errors, but the cost
> of the null check itself is hurting my productivity. In the best case, null
> checks add about ten minutes to the build time for large modules
> like :sdks:java:core. In the worst case, they cause my build to fail
> altogether, because the framework logs a warning that "Memory constraints
> are impeding performance," which is interpreted by -Wall as an error. It
> might not be a problem on big machines with a lot of memory, but on my
> Macbook, and on the Github Actions executors it is a real problem.
> https://issues.apache.org/jira/browse/BEAM-11837
>
> Since nullness checks seem to work fine for now on Jenkins, I propose
> making them opt-in rather than opt-out, and only run them on Jenkins by
> default.
>
> On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver <kc...@google.com> wrote:
>
>> Can you try adding the generated classes to generatedClassPatterns in the
>> JavaNatureConfiguration?
>>
>>
>> https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>>
>>
>> On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:
>>
>>> I'm running into a problem with this check. I added a protocol-buffer
>>> file to a module (google-cloud-platform) that previously did have any
>>> protobuf files in it. The generated files contain lines that violate this
>>> null checker, so they won't compile. I can't annotate the files, because
>>> they are codegen files. I tried adding the package to spotbugs-filter.xml,
>>> but that didn't help.
>>>
>>> Any suggestions?
>>>
>>> Reuven
>>>
>>> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'll give my two cents here.
>>>>>
>>>>> I'm not 100% sure that the 1-5% of bugs are as severe as other types
>>>>> of bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
>>>>> many of these actually boil down to user errors. Then we might ask what a
>>>>> correct solution would be. If we manage to figure out what the actual
>>>>> problem is and tell user what specifically is missing or going wrong, that
>>>>> would be just awesome. On the other hand, if a tool used for avoiding
>>>>> "unexpected" NPEs forces us to code
>>>>>
>>>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>>>> similar using Preconditions
>>>>>    value.doStuff();
>>>>>
>>>>> instead of just
>>>>>
>>>>>   myNullableObject.doStuff()
>>>>>
>>>>> what we actually did, is a) made a framework happy, and b) changed a
>>>>> line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>>>> changed semantically, from user perspective.
>>>>>
>>>> I'd argue there's value in asking Beam developers to make that change.
>>>> It makes us acknowledge that there's a possibility myNullableObject is
>>>> null. If myNullableObject being null is something relevant to the user we
>>>> would likely want to wrap it in some other exception or provide a more
>>>> useful message than just NPE(!!).
>>>>
>>>>>
>>>>> Now, given that the framework significantly rises compile time (due to
>>>>> all the checks), causes many "bugs" being reported by static code analysis
>>>>> tools (because the framework adds @Nonnull default annotations everywhere,
>>>>> even when Beam's code actually counts with nullability of a field), and
>>>>> given how much we currently suppress these checks ($ git grep BEAM-10402 |
>>>>> wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>>>
>>>> The reason there are so many suppressions is that fixing all the errors
>>>> is a monumental task. Rather than addressing them all, Kenn
>>>> programmatically added suppressions for classes that failed the checks (
>>>> https://github.com/apache/beam/pull/13261). This allowed us to start
>>>> running the checker on the code that passes it while the errors are fixed.
>>>>
>>>>>  Jan
>>>>>
>>>>>
>>>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>>>
>>>>> Yes, completely sound nullability checking has been added to the
>>>>> project via checkerframework, based on a large number of NPE bugs (1-5%
>>>>> depending on how you search, but many other bugs likely attributable to
>>>>> nullness-based design errors) which are extra embarrassing because NPEs
>>>>> have were essentially solved, even in practice for Java, well before Beam
>>>>> existed.
>>>>>
>>>>> Checker framework is a pluggable type system analysis with some amount
>>>>> of control flow sensitivity. Every value has a type that is either nullable
>>>>> or not, and certain control structures (like checking for null) can alter
>>>>> the type inside a scope. The best way to think about it is to consider
>>>>> every value in the program as either nullable or not, much like you think
>>>>> of every value as either a string or not, and to view method calls as
>>>>> inherently stateful/nondetermistic. This can be challenging in esoteric
>>>>> cases, but usually makes the overall code health better anyhow.
>>>>>
>>>>> Your example illustrates how problematic the design of the Java
>>>>> language is: the analysis cannot assume that `getDescription` is a pure
>>>>> function, and neither should you. Even if it is aware of
>>>>> boolean-short-circuit it would not be sound to accept this code. There is
>>>>> an annotation for this in the cases where it is true (like proto-generate
>>>>> getters):
>>>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>>>
>>>>> The refactor for cases like this is trivial so there isn't a lot of
>>>>> value to thinking too hard about it.
>>>>>
>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>>>   if (desc != null && desc.contains("finalized")) {
>>>>>     return false;
>>>>>   }
>>>>> }
>>>>>
>>>>> To a casual eye, this may look like a noop change. To the analysis it
>>>>> makes all the difference. And IMO this difference is real. Humans may
>>>>> assume it is a noop and humans would be wrong. So many times when you
>>>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>>>> learn that it was tweaked for some odd reason. I believe this code change
>>>>> makes the code better. Suppressing these errors should be exceptionally
>>>>> rare, and never in normal code. It is far better to improve your code than
>>>>> to suppress the issue.
>>>>>
>>>>> It would be very cool for the proto compiler to annotate enough that
>>>>> new-and-improved type checkers could make things more convenient.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> I can use that trick. However I'm surprised that the check appears to
>>>>>> be so simplistic.
>>>>>>
>>>>>> For example, the following code triggers the check on
>>>>>> getDescription().contains(), since getDescription returns a Nullable
>>>>>> string. However even a simplistic analysis should realize that
>>>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>>>> like this, and I question how useful such a simplistic check it will be.
>>>>>>
>>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yeah it seems like the checker is enabled:
>>>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>>>> not really a concern.
>>>>>>>
>>>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> Has extra Nullable checking been enabled in the Beam project? I
>>>>>>>> have a PR that was on hold for several months, and I'm struggling now with
>>>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>>>> it absolutely impossible for the variable to be null.
>>>>>>>>
>>>>>>>> Has something changed here?
>>>>>>>>
>>>>>>>> Reuven
>>>>>>>>
>>>>>>>

Re: Null checking in Beam

Posted by Kyle Weaver <kc...@google.com>.

I don't mind fixing my code or suppressing nullness errors, but the cost of
the null check itself is hurting my productivity. In the best case, null
checks add about ten minutes to the build time for large modules
like :sdks:java:core. In the worst case, they cause my build to fail
altogether, because the framework logs a warning that "Memory constraints
are impeding performance," which is interpreted by -Wall as an error. It
might not be a problem on big machines with a lot of memory, but on my
Macbook, and on the Github Actions executors it is a real problem.
https://issues.apache.org/jira/browse/BEAM-11837

Since nullness checks seem to work fine for now on Jenkins, I propose
making them opt-in rather than opt-out, and only run them on Jenkins by
default.

On Tue, Mar 2, 2021 at 12:13 PM Kyle Weaver <kc...@google.com> wrote:

> Can you try adding the generated classes to generatedClassPatterns in the
> JavaNatureConfiguration?
>
>
> https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92
>
>
> On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:
>
>> I'm running into a problem with this check. I added a protocol-buffer
>> file to a module (google-cloud-platform) that previously did have any
>> protobuf files in it. The generated files contain lines that violate this
>> null checker, so they won't compile. I can't annotate the files, because
>> they are codegen files. I tried adding the package to spotbugs-filter.xml,
>> but that didn't help.
>>
>> Any suggestions?
>>
>> Reuven
>>
>> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'll give my two cents here.
>>>>
>>>> I'm not 100% sure that the 1-5% of bugs are as severe as other types of
>>>> bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
>>>> many of these actually boil down to user errors. Then we might ask what a
>>>> correct solution would be. If we manage to figure out what the actual
>>>> problem is and tell user what specifically is missing or going wrong, that
>>>> would be just awesome. On the other hand, if a tool used for avoiding
>>>> "unexpected" NPEs forces us to code
>>>>
>>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>>> similar using Preconditions
>>>>    value.doStuff();
>>>>
>>>> instead of just
>>>>
>>>>   myNullableObject.doStuff()
>>>>
>>>> what we actually did, is a) made a framework happy, and b) changed a
>>>> line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>>> changed semantically, from user perspective.
>>>>
>>> I'd argue there's value in asking Beam developers to make that change.
>>> It makes us acknowledge that there's a possibility myNullableObject is
>>> null. If myNullableObject being null is something relevant to the user we
>>> would likely want to wrap it in some other exception or provide a more
>>> useful message than just NPE(!!).
>>>
>>>>
>>>> Now, given that the framework significantly rises compile time (due to
>>>> all the checks), causes many "bugs" being reported by static code analysis
>>>> tools (because the framework adds @Nonnull default annotations everywhere,
>>>> even when Beam's code actually counts with nullability of a field), and
>>>> given how much we currently suppress these checks ($ git grep BEAM-10402 |
>>>> wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>>
>>> The reason there are so many suppressions is that fixing all the errors
>>> is a monumental task. Rather than addressing them all, Kenn
>>> programmatically added suppressions for classes that failed the checks (
>>> https://github.com/apache/beam/pull/13261). This allowed us to start
>>> running the checker on the code that passes it while the errors are fixed.
>>>
>>>>  Jan
>>>>
>>>>
>>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>>
>>>> Yes, completely sound nullability checking has been added to the
>>>> project via checkerframework, based on a large number of NPE bugs (1-5%
>>>> depending on how you search, but many other bugs likely attributable to
>>>> nullness-based design errors) which are extra embarrassing because NPEs
>>>> have were essentially solved, even in practice for Java, well before Beam
>>>> existed.
>>>>
>>>> Checker framework is a pluggable type system analysis with some amount
>>>> of control flow sensitivity. Every value has a type that is either nullable
>>>> or not, and certain control structures (like checking for null) can alter
>>>> the type inside a scope. The best way to think about it is to consider
>>>> every value in the program as either nullable or not, much like you think
>>>> of every value as either a string or not, and to view method calls as
>>>> inherently stateful/nondetermistic. This can be challenging in esoteric
>>>> cases, but usually makes the overall code health better anyhow.
>>>>
>>>> Your example illustrates how problematic the design of the Java
>>>> language is: the analysis cannot assume that `getDescription` is a pure
>>>> function, and neither should you. Even if it is aware of
>>>> boolean-short-circuit it would not be sound to accept this code. There is
>>>> an annotation for this in the cases where it is true (like proto-generate
>>>> getters):
>>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>>
>>>> The refactor for cases like this is trivial so there isn't a lot of
>>>> value to thinking too hard about it.
>>>>
>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>>   if (desc != null && desc.contains("finalized")) {
>>>>     return false;
>>>>   }
>>>> }
>>>>
>>>> To a casual eye, this may look like a noop change. To the analysis it
>>>> makes all the difference. And IMO this difference is real. Humans may
>>>> assume it is a noop and humans would be wrong. So many times when you
>>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>>> learn that it was tweaked for some odd reason. I believe this code change
>>>> makes the code better. Suppressing these errors should be exceptionally
>>>> rare, and never in normal code. It is far better to improve your code than
>>>> to suppress the issue.
>>>>
>>>> It would be very cool for the proto compiler to annotate enough that
>>>> new-and-improved type checkers could make things more convenient.
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> I can use that trick. However I'm surprised that the check appears to
>>>>> be so simplistic.
>>>>>
>>>>> For example, the following code triggers the check on
>>>>> getDescription().contains(), since getDescription returns a Nullable
>>>>> string. However even a simplistic analysis should realize that
>>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>>> like this, and I question how useful such a simplistic check it will be.
>>>>>
>>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>>> }
>>>>>
>>>>>
>>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Yeah it seems like the checker is enabled:
>>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>>> not really a concern.
>>>>>>
>>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> Has extra Nullable checking been enabled in the Beam project? I have
>>>>>>> a PR that was on hold for several months, and I'm struggling now with
>>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>>> it absolutely impossible for the variable to be null.
>>>>>>>
>>>>>>> Has something changed here?
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>

Re: Null checking in Beam

Posted by Kyle Weaver <kc...@google.com>.

Can you try adding the generated classes to generatedClassPatterns in the
JavaNatureConfiguration?

https://github.com/apache/beam/blob/03b883b415d27244ddabb17a0fb5bab147b86f89/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L92


On Tue, Mar 2, 2021 at 12:05 AM Reuven Lax <re...@google.com> wrote:

> I'm running into a problem with this check. I added a protocol-buffer file
> to a module (google-cloud-platform) that previously did have any protobuf
> files in it. The generated files contain lines that violate this null
> checker, so they won't compile. I can't annotate the files, because they
> are codegen files. I tried adding the package to spotbugs-filter.xml, but
> that didn't help.
>
> Any suggestions?
>
> Reuven
>
> On Fri, Jan 22, 2021 at 10:38 AM Brian Hulette <bh...@google.com>
> wrote:
>
>>
>>
>> On Fri, Jan 22, 2021 at 1:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi,
>>>
>>> I'll give my two cents here.
>>>
>>> I'm not 100% sure that the 1-5% of bugs are as severe as other types of
>>> bugs. Yes, throwing NPEs at user is not very polite. On the other hand,
>>> many of these actually boil down to user errors. Then we might ask what a
>>> correct solution would be. If we manage to figure out what the actual
>>> problem is and tell user what specifically is missing or going wrong, that
>>> would be just awesome. On the other hand, if a tool used for avoiding
>>> "unexpected" NPEs forces us to code
>>>
>>>    Object value = Objects.requireNonNull(myNullableObject); // or
>>> similar using Preconditions
>>>    value.doStuff();
>>>
>>> instead of just
>>>
>>>   myNullableObject.doStuff()
>>>
>>> what we actually did, is a) made a framework happy, and b) changed a
>>> line at which NPE is thrown by 1 (and yes, internally prevented JVM from
>>> thrown SIGSEGV at itself, but that is deeply technical thing). Nothing
>>> changed semantically, from user perspective.
>>>
>> I'd argue there's value in asking Beam developers to make that change. It
>> makes us acknowledge that there's a possibility myNullableObject is null.
>> If myNullableObject being null is something relevant to the user we would
>> likely want to wrap it in some other exception or provide a more useful
>> message than just NPE(!!).
>>
>>>
>>> Now, given that the framework significantly rises compile time (due to
>>> all the checks), causes many "bugs" being reported by static code analysis
>>> tools (because the framework adds @Nonnull default annotations everywhere,
>>> even when Beam's code actually counts with nullability of a field), and
>>> given how much we currently suppress these checks ($ git grep BEAM-10402 |
>>> wc -l -> 1981), I'd say this deserves a deeper discussion.
>>>
>> The reason there are so many suppressions is that fixing all the errors
>> is a monumental task. Rather than addressing them all, Kenn
>> programmatically added suppressions for classes that failed the checks (
>> https://github.com/apache/beam/pull/13261). This allowed us to start
>> running the checker on the code that passes it while the errors are fixed.
>>
>>>  Jan
>>>
>>>
>>> On 1/20/21 10:48 PM, Kenneth Knowles wrote:
>>>
>>> Yes, completely sound nullability checking has been added to the project
>>> via checkerframework, based on a large number of NPE bugs (1-5% depending
>>> on how you search, but many other bugs likely attributable to
>>> nullness-based design errors) which are extra embarrassing because NPEs
>>> have were essentially solved, even in practice for Java, well before Beam
>>> existed.
>>>
>>> Checker framework is a pluggable type system analysis with some amount
>>> of control flow sensitivity. Every value has a type that is either nullable
>>> or not, and certain control structures (like checking for null) can alter
>>> the type inside a scope. The best way to think about it is to consider
>>> every value in the program as either nullable or not, much like you think
>>> of every value as either a string or not, and to view method calls as
>>> inherently stateful/nondetermistic. This can be challenging in esoteric
>>> cases, but usually makes the overall code health better anyhow.
>>>
>>> Your example illustrates how problematic the design of the Java language
>>> is: the analysis cannot assume that `getDescription` is a pure function,
>>> and neither should you. Even if it is aware of boolean-short-circuit it
>>> would not be sound to accept this code. There is an annotation for this in
>>> the cases where it is true (like proto-generate getters):
>>> https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html
>>>
>>> The refactor for cases like this is trivial so there isn't a lot of
>>> value to thinking too hard about it.
>>>
>>> if (statusCode.equals(Code.INVALID_ARGUMENT)
>>>   @Nullable String desc = statusCode.toStatus().getDescription();
>>>   if (desc != null && desc.contains("finalized")) {
>>>     return false;
>>>   }
>>> }
>>>
>>> To a casual eye, this may look like a noop change. To the analysis it
>>> makes all the difference. And IMO this difference is real. Humans may
>>> assume it is a noop and humans would be wrong. So many times when you
>>> think/expect/hope that `getXYZ()` is a trivial getter method you later
>>> learn that it was tweaked for some odd reason. I believe this code change
>>> makes the code better. Suppressing these errors should be exceptionally
>>> rare, and never in normal code. It is far better to improve your code than
>>> to suppress the issue.
>>>
>>> It would be very cool for the proto compiler to annotate enough that
>>> new-and-improved type checkers could make things more convenient.
>>>
>>> Kenn
>>>
>>> On Mon, Jan 11, 2021 at 8:53 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I can use that trick. However I'm surprised that the check appears to
>>>> be so simplistic.
>>>>
>>>> For example, the following code triggers the check on
>>>> getDescription().contains(), since getDescription returns a Nullable
>>>> string. However even a simplistic analysis should realize that
>>>> getDescription() was checked for null first! I have a dozen or so cases
>>>> like this, and I question how useful such a simplistic check it will be.
>>>>
>>>> if (statusCode.equals(Code.INVALID_ARGUMENT)    && statusCode.toStatus().getDescription() != null    && statusCode.toStatus().getDescription().contains("finalized")) {  return false;
>>>> }
>>>>
>>>>
>>>> On Mon, Jan 11, 2021 at 8:32 PM Boyuan Zhang <bo...@google.com>
>>>> wrote:
>>>>
>>>>> Yeah it seems like the checker is enabled:
>>>>> https://issues.apache.org/jira/browse/BEAM-10402. I used
>>>>> @SuppressWarnings({"nullness" )}) to suppress the error when I think it's
>>>>> not really a concern.
>>>>>
>>>>> On Mon, Jan 11, 2021 at 8:28 PM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> Has extra Nullable checking been enabled in the Beam project? I have
>>>>>> a PR that was on hold for several months, and I'm struggling now with
>>>>>> compile failing to complaints about assigning something that is nullable to
>>>>>> something that is not nullable. Even when the immediate control flow makes
>>>>>> it absolutely impossible for the variable to be null.
>>>>>>
>>>>>> Has something changed here?
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>