Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/04/23 20:38:35 UTC

[GitHub] [beam] lukecwik edited a comment on pull request #11454: [BEAM-8871] Support trySplit for ByteKeyRangeTracker

lukecwik edited a comment on pull request #11454:
URL: https://github.com/apache/beam/pull/11454#issuecomment-618654517


   > (Sorry for adding this here, but I cannot find a better moment to ask.) A question related to fractional splits (but not to this PR): would testing trySplit for a RestrictionTracker be enough to test that an SDF-based IO that uses this kind of restriction supports DWR correctly?
   > 
   > I am asking because in the SDF-based HBaseIO this is one of the tests still missing compared with the tests we had in the past for `splitAtFraction`:
   > 
   > https://github.com/apache/beam/blob/ec67a9374671ea9ae670fb0f3935ead2ebed7981/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java#L359-L384
   > 
   > If it is not the case, any suggestions on how to test this?
   
   Testing the restriction tracker covers almost everything needed to test DWR.
   
   The last bit is to make sure that the DoFn doesn't assume it's processing the initial restriction and that it interacts with the restriction tracker correctly. An example of a splittable DoFn that would work a lot of the time but be incorrect when splitting occurs would be something like:
   ```
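   @ProcessElement
   public void process(RestrictionTracker<OffsetRange, Long> tracker) {
     // INCORRECT: the bounds are read once, so a later split that shrinks
     // the restriction goes unnoticed by the loop below.
     long start = tracker.currentRestriction().getFrom();
     long end = tracker.currentRestriction().getTo();
     for (long i = start; i < end; ++i) {
       ... produce output ...
     }
     // The claim happens only after all output has already been produced.
     tracker.tryClaim(end);
   }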
   ```
   All the examples I know of stem from producing output before claiming. Moving the tryClaim before the loop would make the splittable DoFn correct, but because everything would be claimed up front it would prevent the DoFn from splitting effectively.
   
   Testing that output is only produced after the DoFn has made the claim call makes a lot of sense, and ensuring that only the part that has been claimed was produced as output would cover correctness.
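   
   A minimal sketch of the check described above, written in plain Java so it runs without Beam on the classpath. `SimpleOffsetTracker`, its `trySplitAt` method, and `ClaimBeforeOutput` are hypothetical stand-ins invented for illustration, not Beam classes; a real test would presumably drive the actual RestrictionTracker and DoFn instead. The loop claims each offset before emitting it, and the simulated split shrinks the range while processing is under way:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.LongConsumer;
   
   /** Processing loop that claims each offset before producing output for it. */
   final class ClaimBeforeOutput {
     static void process(SimpleOffsetTracker tracker, LongConsumer output) {
       // tryClaim returns false once i is outside the (possibly shrunk) range.
       for (long i = tracker.from(); tracker.tryClaim(i); ++i) {
         output.accept(i);
       }
     }
   
     public static void main(String[] args) {
       SimpleOffsetTracker tracker = new SimpleOffsetTracker(0, 10);
       List<Long> out = new ArrayList<>();
       process(tracker, i -> {
         out.add(i);
         if (i == 3) {
           tracker.trySplitAt(6); // simulate a split arriving mid-processing
         }
       });
       System.out.println(out); // [0, 1, 2, 3, 4, 5]
     }
   }
   
   /** Hypothetical stand-in for an offset-range tracker (assumption, not Beam's class). */
   final class SimpleOffsetTracker {
     private final long from;
     private long to;          // exclusive upper bound; shrinks on a split
     private long lastClaimed; // from - 1 until the first successful claim
   
     SimpleOffsetTracker(long from, long to) {
       this.from = from;
       this.to = to;
       this.lastClaimed = from - 1;
     }
   
     long from() {
       return from;
     }
   
     /** Claims offset i, or returns false if i is no longer inside the range. */
     boolean tryClaim(long i) {
       if (i >= to) {
         return false;
       }
       lastClaimed = i;
       return true;
     }
   
     /** Shrinks the range to [from, splitPos) if splitPos is still unclaimed. */
     boolean trySplitAt(long splitPos) {
       if (splitPos <= lastClaimed || splitPos >= to) {
         return false;
       }
       to = splitPos;
       return true;
     }
   }
   ```
   
   With the incorrect loop from the earlier example, the same simulated split would still emit offsets 6 through 9, which is exactly what such a test should catch.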

