You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2021/06/28 19:14:06 UTC

[GitHub] [samza] cameronlee314 commented on pull request #1506: SAMZA-2303: Exclude side inputs when handling end-of-stream and watermarks for high-level

cameronlee314 commented on pull request #1506:
URL: https://github.com/apache/samza/pull/1506#issuecomment-869951777


   > 1. Would it better to have a separation within TaskModel instead of conflating SSPs into both inputs & side inputs? Doing so, will eliminate yet another hack addition to `TaskContextImpl` and potentially solve scenarios that require this divide information upstream. Checking if you have evaluated this option and what the initial thoughts are.
   
   I didn't really consider changing the `TaskModel`. My initial thoughts are that if we think it is useful for the general `TaskModel` API to separate out the side inputs (for use by actual applications), then that could help the implementation here. However, if this won't be helpful for the general API, then I don't think we should bloat/modify the API just to fix this issue (for backwards compat, we probably would need to add a new field and not modify the current one). I'm not currently convinced that this is useful enough for the general API, so I don't think that we should change the general API for this bug fix. What do you think?
   I think the root reason for why we need this `TaskContextImpl` hack is because we don't have a clean way to wire in internal components for the high-level implementation (e.g. we also need the `TaskContextImpl` hack to wire through `StreamMetadataCache` into high-level). I don't see a clean way to fix this root reason right now though.
   
   > 3. Looks like fix for SAMZA-2300 is going in as part of this PR. Why is that? Can we separate it into another PR to keep the scope of the PR to single issue?
   
   Answering #3 first: In order to test the side inputs fix (i.e. `TestLocalTableWithSideInputsEndToEnd`), I needed to modify existing tests to use multiple partitions. The current tests (only single partition) don't rely on the end-of-stream propagation. This is why we didn't recognize this bug with the current tests. However, when I switch to using multiple partitions, then the bug in SAMZA-2300 gets triggered. So it was not ideal to split them up. I also don't think it is ideal to fix both tickets in the same commit (since it makes the single commit more complex), but I felt that the benefit towards having more complete tests for the side inputs fix (by merging into a single commit) was better. 
   
   > 2. Can we separate out the tests into two categories (one that needs to belong as part of the fix vs one that isn't) and have a separate PR for the latter category of tests?
   
   I believe the only test changes that aren't as directly related to the side inputs fix is `TestLocalTableEndToEnd`. However, it does help add confidence that the SAMZA-2300 change is working as expected, so that is why I included the changes to `TestLocalTableEndToEnd` here. I do have another set of changes for migrating other tests which I did keep separate (I will post a PR for those when this one is done), but I wanted to at least have this one test in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org