You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/08/12 11:21:04 UTC

[GitHub] [flink-training] NicoK commented on a change in pull request #13: [FLINK-18630] Redo the LongRideAlerts solutions to cover a corner case

NicoK commented on a change in pull request #13:
URL: https://github.com/apache/flink-training/pull/13#discussion_r469179715



##########
File path: long-ride-alerts/DISCUSSION.md
##########
@@ -21,7 +21,23 @@ under the License.
 
 (Discussion of [Lab: `ProcessFunction` and Timers (Long Ride Alerts)](./))
 
+It would be interesting to test that the solution does not leak state.
 
+A good way to write unit tests for a `KeyedProcessFunction` that check for state retention, etc., is to
+use the test harnesses described in the
+[documentation on testing](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#unit-testing-stateful-or-timely-udfs--custom-operators). 
+
+In fact, the reference solutions will leak state in the case where a START event is missing. They also
+leak in the case where the alert is generated, but then the END event does eventually arrive (after `onTimer()`
+has cleared the matching START event).
+
+This could be addressed either by using [state TTL](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl),
+or by using another timer that eventually
+clears any remaining state. There is a tradeoff here, however: once that state has been removed,
+then if the matching events are't actually missing, but are instead very, very late, they will cause erroneous alerts.

Review comment:
       ```suggestion
   then if the matching events are not actually missing, but are instead very, very late, they will cause erroneous alerts.
   ```

##########
File path: long-ride-alerts/DISCUSSION.md
##########
@@ -21,7 +21,23 @@ under the License.
 
 (Discussion of [Lab: `ProcessFunction` and Timers (Long Ride Alerts)](./))
 
+It would be interesting to test that the solution does not leak state.
 
+A good way to write unit tests for a `KeyedProcessFunction` that check for state retention, etc., is to

Review comment:
       ```suggestion
   A good way to write unit tests for a `KeyedProcessFunction` to check for state retention, etc., is to
   ```

##########
File path: long-ride-alerts/README.md
##########
@@ -62,13 +62,11 @@ The resulting stream should be printed to standard out.
 <details>
 <summary><strong>Overall approach</strong></summary>
 
-This exercise revolves around using a `ProcessFunction` to manage some keyed state and event time timers, and doing so in a way that works even when the END event for a given `rideId` arrives before the START (which will happen). The challenge is figuring out what state to keep, and when to set and clear that state.
-</details>
-
-<details>
-<summary><strong>Timers and State</strong></summary>
-
-You will want to use event time timers that fire two hours after the incoming events, and in the `onTimer()` method, collect START events to the output only if a matching END event hasn't yet arrived. As for what state to keep, it is enough to remember the "last" event for each `rideId`, where "last" is based on event time and ride type (START vs END &mdash; yes, there are rides where the START and END have the same timestamp), rather than the order in which the events are processed. The `TaxiRide` class implements `Comparable`; feel free to take advantage of that, and be sure to eventually clear any state you create.

Review comment:
       Why did you remove the hints on what state to keep?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org