You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/03 21:54:11 UTC

[GitHub] [beam] jlentin opened a new issue, #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

jlentin opened a new issue, #24801:
URL: https://github.com/apache/beam/issues/24801

   ### What happened?
   
   We are trying to upgrade version of Beam from version `2.31.0` to `2.43.0`
   
   While upgrading, we noticed that `@Nullable` annotation has been removed from `org.apache.beam.sdk.state.ReadableState.read()`. This is causing our code quality checks to fail when we check null on the return values of the `read` method.
   
   I traced it back to this PR: https://github.com/apache/beam/pull/16721 which removed the `@Nullable` annotation
   
   Here is an example code:
   
   ```java
   /**
    * Keeps a state of current total. Keeps accumulating the new value to the total, and outputs the
    * total value.
    */
   class AccumulateDoFn extends DoFn<KV<String, Integer>, KV<String, Integer>> {
   
     private static final Logger LOGGER = LoggerFactory.getLogger(AccumulateDoFn.class);
   
     @StateId("PREV_SUM_STATE")
     private static final StateSpec<ValueState<Integer>> PREV_SUM_STATE =
         StateSpecs.value(BigEndianIntegerCoder.of());
   
     @ProcessElement
     public void process(
         @Element final KV<String, Integer> element,
         @Timestamp final Instant time,
         @StateId("PREV_SUM_STATE") final ValueState<Integer> prevSumState,
         final OutputReceiver<KV<String, Integer>> outputReceiver) {
   
       final String key = element.getKey();
       final Integer currentValue = element.getValue();
   
       Integer previousSum = prevSumState.read();
       if (previousSum == null) {
         // This line is executed. So `ReadableState.read()` can return null.
         LOGGER.info("Current state is null. Will initialize with 0");
         previousSum = 0;
       }
   
       final Integer newSum = previousSum + currentValue;
       LOGGER.info("For {}, after {}, new sum is {}", key, currentValue, newSum);
       prevSumState.write(newSum);
   
       outputReceiver.outputWithTimestamp(KV.of(key, newSum), time);
     }
   }
   
   
   /** Tests for {@link AccumulateDoFn}. */
   class AccumulateDoFnTest {
   
     @Test
     void testSum() {
   
       final TestPipeline pipeline = TestPipeline.create().enableAbandonedNodeEnforcement(false);
   
       final List<TimestampedValue<KV<String, Integer>>> values =
           List.of(
               TimestampedValue.of(KV.of("key1", 2), Instant.EPOCH),
               TimestampedValue.of(KV.of("key2", 3), Instant.EPOCH.plus(1L)),
               TimestampedValue.of(KV.of("key1", 5), Instant.EPOCH.plus(2L)));
   
       final PCollection<KV<String, Integer>> result =
           pipeline
               .apply(Create.timestamped(values))
               .apply(ParDo.of(new AccumulateDoFn()))
               .apply(Latest.perKey());
   
       PAssert.thatMap(result).isEqualTo(Map.of("key1", 7, "key2", 3));
       pipeline.run().waitUntilFinish();
     }
   }
   
   ```
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles closed issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
kennknowles closed issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized
URL: https://github.com/apache/beam/issues/24801


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jlentin commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
jlentin commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370480799

   Great, thank you for the fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
lukecwik commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1371196171

   > This is a good catch. It is actually a bit of a poor design. It would be preferable for an empty valuestate to throw no such element if it hasn't been written yet, since now we have the state "unwritten" and the state "written explicitly to null" that cannot be distinguished.
   
   Being able to differentiate between these two is important sometimes but I feel as though it is uncommon and users can always use bagstate to differentiate between unset and null value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370447895

   This is a good catch. It is actually a bit of a poor design. It would be preferable for an empty valuestate to throw no such element if it hasn't been written yet, since now we have the state "unwritten" and the state "written explicitly to null" that cannot be distinguished.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
reuvenlax commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370346449

   @kennknowles ValueState (and related objects such as CombiningState) return null if the state has not been set yet, and this is how users check for that case today (related: adding a readWithDefault might simplify usage). This is a property of ValueState, i.e. it is true for every single type T in ValueState<T>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370446007

   Ah, yes. Luke's change is right.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
lukecwik commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370241600

   In general `ReadableState.read()` should not be `@Nullable` but we should allow for the overrides like `ValueState` to specify that `T` can be `@Nullable` while others like `ListState` we should have `List<@Nullable T>`.
   
   There was some discussion on the mailing list about this:
   https://lists.apache.org/thread/71q9xtb9vgj08d0qnbjbp1p10gxnnn1x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370258242

   yea, the `T` can be instantiated to `@Nullable Foo`. In most cases, generics should not be marked nullable because that is a property of the actual type parameter. The generic parameter can have lower/upper bounds that force it to always be a supertype or subtype of nullable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik closed issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
lukecwik closed issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized
URL: https://github.com/apache/beam/issues/24801


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on issue #24801: [Bug]: ReadableState.read() marked not nullable, but it does return null when not initialized

Posted by GitBox <gi...@apache.org>.
lukecwik commented on issue #24801:
URL: https://github.com/apache/beam/issues/24801#issuecomment-1370252865

   CC: @kennknowles 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org