You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/09/30 18:59:12 UTC

[GitHub] [beam] lukecwik commented on a change in pull request #15540: [BEAM-12931] Allow for DoFn#getAllowedTimestampSkew() when checking the output timestamp

lukecwik commented on a change in pull request #15540:
URL: https://github.com/apache/beam/pull/15540#discussion_r719664915



##########
File path: runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java
##########
@@ -386,6 +399,180 @@ public void testInfiniteSkew() {
             BoundedWindow.TIMESTAMP_MAX_VALUE));
   }
 
+  @Test
+  @Category({ValidatesRunner.class})
+  public void testRunnerNoSkew() {
+    PCollection<Duration> input =
+        p.apply("create", Create.timestamped(Arrays.asList(new Duration(0L), new Duration(1L)), Arrays.asList(0L, 1L)));
+    input.apply(ParDo.of(new SkewingDoFn(Duration.ZERO)));
+    // The errors differ between runners but at least check that the output timestamp is printed.
+    thrown.expectMessage(String.format("%s", new Instant(0L)));

Review comment:
       Note that you can solve this problem by having the DoFn catch the exception from within the processElement method outputting `success` or `exception stack trace/message` messages and performing a PAssert on the output PCollection.

##########
File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##########
@@ -1604,6 +1606,11 @@ public ProcessContinuation processElement(
         }
       }
 
+      @Override
+      public Duration getAllowedTimestampSkew() {

Review comment:
       Why this change?

##########
File path: runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
##########
@@ -1190,13 +1190,20 @@ public Timer withOutputTimestamp(Instant outputTimestamp) {
      * </ul>
      */
     private void setAndVerifyOutputTimestamp() {
-
       if (outputTimestamp != null) {
-        checkArgument(
-            !outputTimestamp.isBefore(elementInputTimestamp),
-            "output timestamp %s should be after input message timestamp or output timestamp of firing timers %s",
-            outputTimestamp,
-            elementInputTimestamp);
+        // The documentation of getAllowedTimestampSkew explicitly permits Long.MAX_VALUE to be used
+        // for infinite skew. Defend against underflow in that case for timestamps before the epoch.
+        if (fn.getAllowedTimestampSkew().getMillis() != Long.MAX_VALUE

Review comment:
       You can handle the proper bounds via:
   ```
   Instant lowerBound;
   try {
      lowerBound = elementInputTimestamp.minus(fn.getAllowedTimestampSkew());
   catch (ArithmeticException e) {
      lowerBound = BoundedWindow.TIMESTAMP_MIN_VALUE;
   }
   if (outputTimestamp.isBefore(lowerBound)) {
     ...
   }
   ```
   
   Finally it would make sense to check the upper bound as well of `BoundedWindow.TIMESTAMP_MAX_VALUE`

##########
File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##########
@@ -164,7 +166,7 @@
 @SuppressWarnings({
   "rawtypes", // TODO(https://issues.apache.org/jira/browse/BEAM-10556)
 })
-public class FnApiDoFnRunnerTest implements Serializable {

Review comment:
       Why drop serializable?

##########
File path: runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java
##########
@@ -386,6 +399,180 @@ public void testInfiniteSkew() {
             BoundedWindow.TIMESTAMP_MAX_VALUE));
   }
 
+  @Test
+  @Category({ValidatesRunner.class})

Review comment:
       Need to tag with `UsesTimersInParDo.class` and/or `UsesTimerMap.class`.

##########
File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##########
@@ -3878,6 +3885,186 @@ public void testProcessElementForWindowedTruncateAndSizeRestriction() throws Exc
     }
   }
 
+  @RunWith(JUnit4.class)
+  public static class ExceptionThrowingExecutionTest {
+    @Rule public final ExpectedException thrown = ExpectedException.none();
+
+    public static final String TEST_TRANSFORM_ID = "pTransformId";
+
+    /**
+     * A {@link DoFn} that outputs elements with timestamp equal to the input timestamp minus the
+     * input element.
+     */
+    private static class SkewingDoFn extends DoFn<String, String> {
+      private final Duration allowedSkew;
+
+      private SkewingDoFn(Duration allowedSkew) {
+        this.allowedSkew = allowedSkew;
+      }
+
+      @ProcessElement
+      public void processElement(ProcessContext context) {
+        Duration duration = new Duration(Long.valueOf(context.element()));
+        context.outputWithTimestamp(context.element(), context.timestamp().minus(duration));
+      }
+
+      @Override
+      public Duration getAllowedTimestampSkew() {
+        return allowedSkew;
+      }
+    }
+
+    @Test
+    public void testDoFnSkewNotAllowed() throws Exception {
+      Pipeline p = Pipeline.create();
+      PCollection<String> valuePCollection = p.apply(Create.of("0", "1"));
+      PCollection<String> outputPCollection =
+          valuePCollection.apply(TEST_TRANSFORM_ID, ParDo.of(new SkewingDoFn(Duration.ZERO)));
+
+      SdkComponents sdkComponents = SdkComponents.create(p.getOptions());
+      RunnerApi.Pipeline pProto = PipelineTranslation.toProto(p, sdkComponents);
+      String inputPCollectionId = sdkComponents.registerPCollection(valuePCollection);
+      String outputPCollectionId = sdkComponents.registerPCollection(outputPCollection);
+      RunnerApi.PTransform pTransform =
+          pProto
+              .getComponents()
+              .getTransformsOrThrow(
+                  pProto
+                      .getComponents()
+                      .getTransformsOrThrow(TEST_TRANSFORM_ID)
+                      .getSubtransforms(0));
+
+      List<WindowedValue<String>> mainOutputValues = new ArrayList<>();
+      MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap();
+      PCollectionConsumerRegistry consumers =
+          new PCollectionConsumerRegistry(
+              metricsContainerRegistry, mock(ExecutionStateTracker.class));
+
+      consumers.register(
+          outputPCollectionId,
+          TEST_TRANSFORM_ID,
+          (FnDataReceiver) (FnDataReceiver<WindowedValue<String>>) mainOutputValues::add,
+          StringUtf8Coder.of());
+      PTransformFunctionRegistry startFunctionRegistry =
+          new PTransformFunctionRegistry(
+              mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start");
+      PTransformFunctionRegistry finishFunctionRegistry =
+          new PTransformFunctionRegistry(
+              mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish");
+      List<ThrowingRunnable> teardownFunctions = new ArrayList<>();
+
+      new FnApiDoFnRunner.Factory<>()
+          .createRunnerForPTransform(
+              PipelineOptionsFactory.create(),
+              null /* beamFnDataClient */,
+              null /**/,
+              null /* beamFnTimerClient */,
+              TEST_TRANSFORM_ID,
+              pTransform,
+              Suppliers.ofInstance("57L")::get,
+              pProto.getComponents().getPcollectionsMap(),
+              pProto.getComponents().getCodersMap(),
+              pProto.getComponents().getWindowingStrategiesMap(),
+              consumers,
+              startFunctionRegistry,
+              finishFunctionRegistry,
+              null /* addResetFunction */,
+              teardownFunctions::add,
+              null /* addProgressRequestCallback */,
+              null /* splitListener */,
+              null /* bundleFinalizer */);
+
+      thrown.expect(UserCodeException.class);
+      thrown.expectMessage(
+          String.format("timestamp %s", new Instant(0).minus(new Duration(1L))));
+      thrown.expectMessage(
+          String.format(
+              "allowed skew (%s)",
+              PeriodFormat.getDefault().print(Duration.ZERO.toPeriod())));
+
+      Iterables.getOnlyElement(startFunctionRegistry.getFunctions()).run();
+      mainOutputValues.clear();
+
+      FnDataReceiver<WindowedValue<?>> mainInput =
+          consumers.getMultiplexingConsumer(inputPCollectionId);
+      mainInput.accept(valueInGlobalWindow("0"));
+      mainInput.accept(
+          timestampedValueInGlobalWindow("1", new Instant(0L)));
+    }
+
+    @Test
+    public void testDoFnSkewAllowed() throws Exception {
+      Pipeline p = Pipeline.create();
+      PCollection<String> valuePCollection = p.apply(Create.of("0", "3"));
+      PCollection<String> outputPCollection =
+          valuePCollection.apply(TEST_TRANSFORM_ID, ParDo.of(new SkewingDoFn(new Duration(5L))));
+
+      SdkComponents sdkComponents = SdkComponents.create(p.getOptions());
+      RunnerApi.Pipeline pProto = PipelineTranslation.toProto(p, sdkComponents);
+      String inputPCollectionId = sdkComponents.registerPCollection(valuePCollection);
+      String outputPCollectionId = sdkComponents.registerPCollection(outputPCollection);
+      RunnerApi.PTransform pTransform =
+          pProto
+              .getComponents()
+              .getTransformsOrThrow(
+                  pProto
+                      .getComponents()
+                      .getTransformsOrThrow(TEST_TRANSFORM_ID)
+                      .getSubtransforms(0));
+
+      List<WindowedValue<String>> mainOutputValues = new ArrayList<>();
+      MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap();
+      PCollectionConsumerRegistry consumers =
+          new PCollectionConsumerRegistry(
+              metricsContainerRegistry, mock(ExecutionStateTracker.class));
+
+      consumers.register(
+          outputPCollectionId,
+          TEST_TRANSFORM_ID,
+          (FnDataReceiver) (FnDataReceiver<WindowedValue<String>>) mainOutputValues::add,
+          StringUtf8Coder.of());
+      PTransformFunctionRegistry startFunctionRegistry =
+          new PTransformFunctionRegistry(
+              mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start");
+      PTransformFunctionRegistry finishFunctionRegistry =
+          new PTransformFunctionRegistry(
+              mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish");
+      List<ThrowingRunnable> teardownFunctions = new ArrayList<>();
+
+      new FnApiDoFnRunner.Factory<>()
+          .createRunnerForPTransform(
+              PipelineOptionsFactory.create(),
+              null /* beamFnDataClient */,
+              null /**/,

Review comment:
       ```suggestion
                 null /* beamFnStateClient */,
   ```

##########
File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##########
@@ -3878,6 +3885,186 @@ public void testProcessElementForWindowedTruncateAndSizeRestriction() throws Exc
     }
   }
 
+  @RunWith(JUnit4.class)
+  public static class ExceptionThrowingExecutionTest {
+    @Rule public final ExpectedException thrown = ExpectedException.none();
+
+    public static final String TEST_TRANSFORM_ID = "pTransformId";
+
+    /**
+     * A {@link DoFn} that outputs elements with timestamp equal to the input timestamp minus the
+     * input element.
+     */
+    private static class SkewingDoFn extends DoFn<String, String> {
+      private final Duration allowedSkew;
+
+      private SkewingDoFn(Duration allowedSkew) {
+        this.allowedSkew = allowedSkew;
+      }
+
+      @ProcessElement
+      public void processElement(ProcessContext context) {
+        Duration duration = new Duration(Long.valueOf(context.element()));
+        context.outputWithTimestamp(context.element(), context.timestamp().minus(duration));
+      }
+
+      @Override
+      public Duration getAllowedTimestampSkew() {
+        return allowedSkew;
+      }
+    }
+
+    @Test
+    public void testDoFnSkewNotAllowed() throws Exception {
+      Pipeline p = Pipeline.create();
+      PCollection<String> valuePCollection = p.apply(Create.of("0", "1"));
+      PCollection<String> outputPCollection =
+          valuePCollection.apply(TEST_TRANSFORM_ID, ParDo.of(new SkewingDoFn(Duration.ZERO)));
+
+      SdkComponents sdkComponents = SdkComponents.create(p.getOptions());
+      RunnerApi.Pipeline pProto = PipelineTranslation.toProto(p, sdkComponents);
+      String inputPCollectionId = sdkComponents.registerPCollection(valuePCollection);
+      String outputPCollectionId = sdkComponents.registerPCollection(outputPCollection);
+      RunnerApi.PTransform pTransform =
+          pProto
+              .getComponents()
+              .getTransformsOrThrow(
+                  pProto
+                      .getComponents()
+                      .getTransformsOrThrow(TEST_TRANSFORM_ID)
+                      .getSubtransforms(0));
+
+      List<WindowedValue<String>> mainOutputValues = new ArrayList<>();
+      MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap();
+      PCollectionConsumerRegistry consumers =
+          new PCollectionConsumerRegistry(
+              metricsContainerRegistry, mock(ExecutionStateTracker.class));
+
+      consumers.register(
+          outputPCollectionId,
+          TEST_TRANSFORM_ID,
+          (FnDataReceiver) (FnDataReceiver<WindowedValue<String>>) mainOutputValues::add,
+          StringUtf8Coder.of());
+      PTransformFunctionRegistry startFunctionRegistry =
+          new PTransformFunctionRegistry(
+              mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start");
+      PTransformFunctionRegistry finishFunctionRegistry =
+          new PTransformFunctionRegistry(
+              mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish");
+      List<ThrowingRunnable> teardownFunctions = new ArrayList<>();
+
+      new FnApiDoFnRunner.Factory<>()
+          .createRunnerForPTransform(
+              PipelineOptionsFactory.create(),
+              null /* beamFnDataClient */,
+              null /**/,

Review comment:
       ```suggestion
                 null /* beamFnStateClient */,
   ```

##########
File path: runners/core-java/src/test/java/org/apache/beam/runners/core/SimpleDoFnRunnerTest.java
##########
@@ -472,6 +659,40 @@ public Duration getAllowedTimestampSkew() {
     }
   }
 
+  /**
+   * A {@link DoFn} that creates/sets a timer with an output timestamp equal to the input timestamp
+   * minus the input element's value. Keys are ignored but required for timers.
+   */
+  private static class TimerSkewingDoFn extends DoFn<KV<String, Duration>, Duration> {
+    static final String TIMER_ID = "testTimerFamily";
+    private final Duration allowedSkew;
+
+    @TimerFamily(TIMER_ID)
+    private final TimerSpec timers = TimerSpecs.timerMap(TimeDomain.PROCESSING_TIME);

Review comment:
       Can you not use `TimerMap` as it limits the number of runners this can run on and use individual timers?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org