You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/03 20:53:30 UTC

[GitHub] [beam] robertwb opened a new pull request #12459: [BEAM-9547] Simplify pandas implementation.

robertwb opened a new pull request #12459:
URL: https://github.com/apache/beam/pull/12459


   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_
 Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_P
 ostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/b
 eam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   ![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on pull request #12459: [BEAM-9547] Simplify pandas implementation.

Posted by GitBox <gi...@apache.org>.
robertwb commented on pull request #12459:
URL: https://github.com/apache/beam/pull/12459#issuecomment-668251189


   R: @apilloud


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #12459: [BEAM-9547] Simplify pandas implementation.

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #12459:
URL: https://github.com/apache/beam/pull/12459#discussion_r465257198



##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -147,7 +149,11 @@ def wrapper(*args, **kwargs):
       else:
         try:
           # pylint: disable=deprecated-method
-          ix = inspect.getargspec(func).args.index(key)
+          if sys.version_info < (3, ):

Review comment:
       nit: this appears more than once in this file, might it be worth adding a `getargspec` helper function?

##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -226,6 +232,68 @@ def wrapper(self, *args, **kwargs):
   return wrapper
 
 
+def maybe_inplace(func):
+  @functools.wraps(func)
+  def wrapper(self, inplace=False, **kwargs):
+    result = func(self, **kwargs)
+    if inplace:
+      self._expr = result._expr
+    else:
+      return result
+
+  return wrapper
+
+
+def args_to_kwargs(base_type):
+  def wrap(func):
+    if sys.version_info < (3, ):
+      getargspec = inspect.getargspec
+    else:
+      getargspec = inspect.getfullargspec
+
+    # This is used to map all positional arguments to keyword arguments.
+    base = getattr(base_type, func.__name__)
+    while hasattr(base, '__wrapped__'):
+      base = base.__wrapped__
+    base_argspec = getargspec(base)
+    arg_names = base_argspec.args
+
+    # These are used to populate any defaults in base for arguments in func.
+    if base_argspec.defaults:
+      arg_to_default = dict(
+          zip(
+              base_argspec.args[-len(base_argspec.defaults):],
+              base_argspec.defaults))
+    else:
+      arg_to_default = {}
+
+    unwrapped_func = func
+    while hasattr(unwrapped_func, '__wrapped__'):
+      unwrapped_func = unwrapped_func.__wrapped__
+    # args that do not have defaults in func, but do have defaults in base

Review comment:
       This comment is a little confusing, I believe it is describing the contents of `populate_defaults`?
   
   After spending some time reading through this, it might be clearer if you broke this into a function that returned `arg_to_default`, and a second that returned `populate_defaults`.

##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -226,6 +232,68 @@ def wrapper(self, *args, **kwargs):
   return wrapper
 
 
+def maybe_inplace(func):
+  @functools.wraps(func)
+  def wrapper(self, inplace=False, **kwargs):
+    result = func(self, **kwargs)
+    if inplace:
+      self._expr = result._expr
+    else:
+      return result
+
+  return wrapper
+
+
+def args_to_kwargs(base_type):

Review comment:
       This seems generic and useful enough that I'm surprised it isn't in a dataframes library.

##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -226,6 +232,68 @@ def wrapper(self, *args, **kwargs):
   return wrapper
 
 
+def maybe_inplace(func):
+  @functools.wraps(func)
+  def wrapper(self, inplace=False, **kwargs):
+    result = func(self, **kwargs)
+    if inplace:
+      self._expr = result._expr
+    else:
+      return result
+
+  return wrapper
+
+
+def args_to_kwargs(base_type):
+  def wrap(func):
+    if sys.version_info < (3, ):
+      getargspec = inspect.getargspec
+    else:
+      getargspec = inspect.getfullargspec
+
+    # This is used to map all positional arguments to keyword arguments.
+    base = getattr(base_type, func.__name__)
+    while hasattr(base, '__wrapped__'):
+      base = base.__wrapped__
+    base_argspec = getargspec(base)
+    arg_names = base_argspec.args
+
+    # These are used to populate any defaults in base for arguments in func.
+    if base_argspec.defaults:
+      arg_to_default = dict(

Review comment:
       `arg_to_default` might be clearer as `arg_to_base_default` or `base_defaults`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12459: [BEAM-9547] Simplify pandas implementation.

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12459:
URL: https://github.com/apache/beam/pull/12459#discussion_r465402767



##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -226,6 +232,68 @@ def wrapper(self, *args, **kwargs):
   return wrapper
 
 
+def maybe_inplace(func):
+  @functools.wraps(func)
+  def wrapper(self, inplace=False, **kwargs):
+    result = func(self, **kwargs)
+    if inplace:
+      self._expr = result._expr
+    else:
+      return result
+
+  return wrapper
+
+
+def args_to_kwargs(base_type):
+  def wrap(func):
+    if sys.version_info < (3, ):
+      getargspec = inspect.getargspec
+    else:
+      getargspec = inspect.getfullargspec
+
+    # This is used to map all positional arguments to keyword arguments.
+    base = getattr(base_type, func.__name__)
+    while hasattr(base, '__wrapped__'):
+      base = base.__wrapped__
+    base_argspec = getargspec(base)
+    arg_names = base_argspec.args
+
+    # These are used to populate any defaults in base for arguments in func.
+    if base_argspec.defaults:
+      arg_to_default = dict(
+          zip(
+              base_argspec.args[-len(base_argspec.defaults):],
+              base_argspec.defaults))
+    else:
+      arg_to_default = {}
+
+    unwrapped_func = func
+    while hasattr(unwrapped_func, '__wrapped__'):
+      unwrapped_func = unwrapped_func.__wrapped__
+    # args that do not have defaults in func, but do have defaults in base

Review comment:
       I think I never want one without the other, but I'll see what it looks like to break them up. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12459: [BEAM-9547] Simplify pandas implementation.

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12459:
URL: https://github.com/apache/beam/pull/12459#discussion_r465402482



##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -147,7 +149,11 @@ def wrapper(*args, **kwargs):
       else:
         try:
           # pylint: disable=deprecated-method
-          ix = inspect.getargspec(func).args.index(key)
+          if sys.version_info < (3, ):

Review comment:
       Good point. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb merged pull request #12459: [BEAM-9547] Simplify pandas implementation.

Posted by GitBox <gi...@apache.org>.
robertwb merged pull request #12459:
URL: https://github.com/apache/beam/pull/12459


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org