You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/10/12 23:27:49 UTC

[GitHub] [beam] tvalentyn opened a new pull request #13081: Adds optional default value initializer in CombineFn.

tvalentyn opened a new pull request #13081:
URL: https://github.com/apache/beam/pull/13081


   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam
 .apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn edited a comment on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
tvalentyn edited a comment on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707893036






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] kamilwu commented on a change in pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
kamilwu commented on a change in pull request #13081:
URL: https://github.com/apache/beam/pull/13081#discussion_r503794860



##########
File path: sdks/python/apache_beam/transforms/core.py
##########
@@ -888,13 +888,32 @@ class CombineFn(WithTypeHints, HasDisplayData, urns.RunnerApiFn):
   5. The extract_output operation is invoked on the final accumulator to get
      the output value.
 
-  Note: If this **CombineFn** is used with a transform that has defaults,
-  **apply** will be called with an empty list at expansion time to get the
-  default value.
   """
   def default_label(self):
     return self.__class__.__name__
 
+  def default_value(self, *args, **kwargs):
+    """Returns a default reduction of an empty input.
+
+    Some combiners require a default value when reducing an empty collection,
+    which may be necessary when combining elements in an empty window.
+
+    If **CombineFn** is used with a transform that requires defaults,
+    default_value may be called during transform expansion.
+
+    Args:
+      *args: Additional arguments and side inputs.
+      **kwargs: Additional arguments and side inputs.
+    """
+    # Defalut values may be evaluated at pipeline construction time.

Review comment:
       nit: Default

##########
File path: sdks/python/apache_beam/transforms/core.py
##########
@@ -888,13 +888,32 @@ class CombineFn(WithTypeHints, HasDisplayData, urns.RunnerApiFn):
   5. The extract_output operation is invoked on the final accumulator to get
      the output value.
 
-  Note: If this **CombineFn** is used with a transform that has defaults,
-  **apply** will be called with an empty list at expansion time to get the
-  default value.
   """
   def default_label(self):
     return self.__class__.__name__
 
+  def default_value(self, *args, **kwargs):
+    """Returns a default reduction of an empty input.
+
+    Some combiners require a default value when reducing an empty collection,
+    which may be necessary when combining elements in an empty window.
+
+    If **CombineFn** is used with a transform that requires defaults,
+    default_value may be called during transform expansion.
+
+    Args:
+      *args: Additional arguments and side inputs.
+      **kwargs: Additional arguments and side inputs.
+    """
+    # Defalut values may be evaluated at pipeline construction time.
+    # Make a copy to avoid passing any side-effects to the serialized pipeline
+    # representaiton.
+    combine_copy = copy.copy(self)

Review comment:
       The user can provide a combiner that has nested combiners, e.g. `TupleCombineFn`, so we should make a deep copy instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] github-actions[bot] commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-1039043990


   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] InigoSJ commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
InigoSJ commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707539905


   > R: @kamilwu
   > cc: @aaltay @robertwb @pabloem @InigoSJ who may have feedback.
   
   It looks so much better than `CombineWithoutDefaults`, thanks for that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707864764


   Renaming `CombinerWithoutDefaults` would be backwards incompatible. Users needs to change their potentially and I think we can avoid that part of the change. Rest of the change LGTM.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] github-actions[bot] closed pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #13081:
URL: https://github.com/apache/beam/pull/13081


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-777862734


   @robertwb I think this is still waiting for your review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] github-actions[bot] commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-1030825435


   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
robertwb commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-710668994


   Here's what I was thinking of in terms of deferring the call to `apply()` to workers: https://github.com/apache/beam/pull/13132
   
   I am still generally opposed to having a separate `default_value` method because it opens up ambiguity of when it is called instead of `apply([])` as well as surprising behavior if the two do not agree. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707880011


   > Renaming `CombinerWithoutDefaults` would be backwards incompatible. Users needs to change their potentially and I think we can avoid that part of the change. Rest of the change LGTM.
   
   It's a fairly recent introduction, see: #12074, and not included in https://github.com/apache/beam/blob/c3477a8eadf3fef6c173166bb1f720ed6c8ffe42/sdks/python/apache_beam/transforms/combiners.py#L56.
   
   Do you still think renaming would be a concern? If so I can add an alias.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707474965


   R: @kamilwu   
   cc: @aaltay @robertwb  @pabloem @InigoSJ  who may have feedback.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] tvalentyn commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707893036


   > Note that one can always override CombineFn.apply(), so this seems more about setup/teardown, right?
   Yes, setup/teardown was primary motivation for writing this up. 
   
   > The disadvantage of a special method means that the semantics are unclear of whether one is allowed to call apply() with an empty iterable... or is one forced to check if the iterable is empty to choose to call apply or default_value? I'm thinking about other places CombineFns may be used used, for example, in CombiningStates.
   
   I can clarify in the api that this is optional, and defaults to empty apply() call if we decide to proceed as is.
   
   > If the goal is just to avoid calling setup/teardown in the main program
   In your opinion, is calling setup/teardown in main program a concern at all? Perhaps I am overthinking this.
   
   > we can rewrite the InjectDefault inside CombineGlobally to take an iterable side input and invoke apply([]) manually (with the setup and teardown) iff the iterable is empty.
   So in this case evaluation of the default will happen after job submission?  I admit I don't fully follow the idea, so if writing a prototype of this idea is faster than explaining how to do it right to myself of Kamil, we'd appreciate a draft; thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on pull request #13081: [BEAM-3376] Adds possibility to customize of default_value evaluation in CombineFn.

Posted by GitBox <gi...@apache.org>.
robertwb commented on pull request #13081:
URL: https://github.com/apache/beam/pull/13081#issuecomment-707877140


   Note that one can always override `CombineFn.apply()`, so this seems more about setup/teardown, right? 
   
   The disadvantage of a special method means that the semantics are unclear of whether one is allowed to call apply() with an empty iterable... or is one forced to check if the iterable is empty to choose to call apply or default_value? I'm thinking about other places CombineFns may be used used, for example, in CombiningStates. 
   
   If the goal is just to avoid calling setup/teardown in the main program, we can rewrite the `InjectDefault` inside `CombineGlobally` to take an iterable side input and invoke `apply([])` manually (with the setup and teardown) iff the iterable is empty. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org