Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/02/08 23:55:06 UTC

[GitHub] [beam] dmkozh edited a comment on pull request #13739: [BEAM-11629] Don't attach window info to cached batch PCollections.

dmkozh edited a comment on pull request #13739:
URL: https://github.com/apache/beam/pull/13739#issuecomment-775544426


   > Even with the latest changes, this is still not writing the windowing information (including timestamps) to the cache.
   
   That's exactly the intent of the change - we don't want to cache trivial windowing information.
    
   > Maybe it would be helpful to understand what the objective of this change is?
   
   The objective is described in the attached ticket: we don't want to cache redundant information at all, because it adds a huge overhead of ~500 bytes/record. That can be reduced somewhat, but it would still be hundreds of bytes. There may be some terminology confusion: by 'batch' pipelines I initially meant pipelines that never care about windowing because they process all the data at once.
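
To make the overhead argument concrete, here is a minimal sketch in plain Python. It does not use Beam's real classes, coders, or cache format: the helper names (`with_trivial_windowing`, `serialized_size`, `cache_overhead_bytes`, `estimated_total_overhead`) and the metadata layout are assumptions made purely for illustration. It only shows that wrapping every element in identical, trivial windowing metadata adds a fixed per-record cost that grows linearly with the number of cached records.

```python
import pickle

# Hypothetical stand-ins for "trivial" windowing metadata. These are
# illustrative assumptions, not Beam's actual encoding.
MIN_TS = -(2**63)
MAX_TS = 2**63 - 1


def with_trivial_windowing(value):
    """Wrap a value in the same global-window metadata every time."""
    return {
        "value": value,
        "timestamp_micros": MIN_TS,
        "windows": [{"start_micros": MIN_TS, "end_micros": MAX_TS}],
        "pane_info": {"is_first": True, "is_last": True, "index": 0},
    }


def serialized_size(obj):
    """Number of bytes the object occupies once pickled."""
    return len(pickle.dumps(obj))


def cache_overhead_bytes(value):
    """Extra bytes paid per record for metadata that carries no information."""
    return serialized_size(with_trivial_windowing(value)) - serialized_size(value)


def estimated_total_overhead(n_records, per_record_bytes=500):
    """Linear estimate using the ~500 bytes/record figure quoted above."""
    return n_records * per_record_bytes


if __name__ == "__main__":
    element = ("user-42", 3.14)
    print("bare bytes:", serialized_size(element))
    print("overhead bytes per record:", cache_overhead_bytes(element))
    print("estimated overhead for 1M records:", estimated_total_overhead(1_000_000))
```

The per-record number printed by this toy will not match the ~500 bytes measured in Beam, since the real figure depends on how the cache serializes windowed values. The point is only that the metadata is identical for every record, so dropping it loses nothing for pipelines that never look at windows.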
   
   If there is a better way to detect that a pipeline doesn't care about windowing, I could use that instead. Also, since this is now an environment setting, it should be pretty hard to get unexpected results (though users who don't care about windowing won't get an immediate benefit either...)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org