You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by GitBox <gi...@apache.org> on 2020/03/23 20:57:55 UTC

[GitHub] [beam] KevinGG commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised

KevinGG commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised
URL: https://github.com/apache/beam/pull/11174#issuecomment-602852406
 
 
   Fixed a failed test.
   
   TL;DR, the test was wrong and passed in the past because it got lucky.
   
   The test used to pass and started failing with the try-append-finally-pop change, because:
   1. The test tried to append two no name / label transforms into a same pipeline, both of them will generate the same label and raise errors while the tests asserted for errors, thus allowing pipeline construction to continue even if it ran into fatal errors;
   2. In the past, the pipeline was in a broken state where the current_transform was not popped when the 1st error was raised. The full label generated was `WriteToBigQuery`;
   3. In the past, the second transform appending wrongly added parts into the broken current_transform `WriteToBigQuery`, causing the transform node to be appended after a wrong failed parent node and luckily got a unique transform full label `WriteToBigQuery/WriteToBigQuery`, thus the test didn't fail due to duplicated labels. But the pipeline was constructed as `WriteToBigQuery`->`WriteToBigQuery` which was completely messed up;
   4. Still the test didn't fail because it was testing for errors even though it built a broken pipeline with broken usages.
   
   Once the try-append-finally-pop change is applied, the test functions as:
   1. First transform still fails to be appended and has side-effects including an applied transform label, but it would not leave the pipeline in a broken state where current transform is still the right node;
   2. Second transform still fails but now appended to the correct node;
   3. As long as the 2 transforms have different labels (or executed in different cells if in an interactive environment), the test still passes, pipeline is not broken and can be used for future development, side-effects are ruled out when data-centric APIs such as `show` and `collect` are invoked.
   
   The reason we keep the applied label is that we never know what side effects are when the error is raised.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services