Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/09/21 01:20:00 UTC

[jira] [Work logged] (BEAM-10200) Improve memory profiling for users of Portable Beam Python

     [ https://issues.apache.org/jira/browse/BEAM-10200?focusedWorklogId=486733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-486733 ]

ASF GitHub Bot logged work on BEAM-10200:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Sep/20 01:19
            Start Date: 21/Sep/20 01:19
    Worklog Time Spent: 10m 
      Work Description: angoenka commented on a change in pull request #12562:
URL: https://github.com/apache/beam/pull/12562#discussion_r491759030



##########
File path: sdks/python/apache_beam/utils/profiler.py
##########
@@ -44,59 +42,91 @@
 
 
 class Profile(object):
-  """cProfile wrapper context for saving and logging profiler results."""
+  """cProfile and Heapy wrapper context for saving and logging profiler
+  results."""
 
   SORTBY = 'cumulative'
 
   def __init__(
       self,
-      profile_id,
-      profile_location=None,
-      log_results=False,
-      file_copy_fn=None,
-      time_prefix='%Y-%m-%d_%H_%M_%S-'):
+      profile_id, # type: str
+      profile_location=None, # type: Optional[str]
+      log_results=False, # type: bool
+      file_copy_fn=None, # type: Optional[Callable[[str, str], None]]
+      time_prefix='%Y-%m-%d_%H_%M_%S-', # type: str
+      enable_cpu_profiling=False, # type: bool
+      enable_memory_profiling=False, # type: bool
+  ):
+    """Creates a Profile object.
+
+    Args:
+      profile_id: Unique id of the profiling session.
+      profile_location: The file location where the profiling results will be
+        stored.
+      log_results: Log the result to console if true.
+      file_copy_fn: Lambda function for copying files.
+      time_prefix: Format of the timestamp prefix in profiling result files.
+      enable_cpu_profiling: CPU profiler will be enabled during the profiling
+        session.
+      enable_memory_profiling: Memory profiler will be enabled during the
+        profiling session, the profiler only records the newly allocated objects
+        in this session.
+    """
     self.stats = None
     self.profile_id = str(profile_id)
     self.profile_location = profile_location
     self.log_results = log_results
     self.file_copy_fn = file_copy_fn or self.default_file_copy_fn
     self.time_prefix = time_prefix
     self.profile_output = None
+    self.enable_cpu_profiling = enable_cpu_profiling
+    self.enable_memory_profiling = enable_memory_profiling
 
   def __enter__(self):
     _LOGGER.info('Start profiling: %s', self.profile_id)
-    self.profile = cProfile.Profile()
-    self.profile.enable()
+    if self.enable_cpu_profiling:
+      self.profile = cProfile.Profile()
+      self.profile.enable()
+    if self.enable_memory_profiling:
+      try:
+        from guppy import hpy
+        self.hpy = hpy()
+        self.hpy.setrelheap()
+      except ImportError:

Review comment:
       Let's log the import failure





Issue Time Tracking
-------------------

    Worklog Id:     (was: 486733)
    Time Spent: 2h  (was: 1h 50m)

> Improve memory profiling for users of Portable Beam Python
> ----------------------------------------------------------
>
>                 Key: BEAM-10200
>                 URL: https://issues.apache.org/jira/browse/BEAM-10200
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>            Reporter: Valentyn Tymofieiev
>            Assignee: Yichi Zhang
>            Priority: P2
>              Labels: stale-P2, stale-assigned, starter
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> We have a Profiler[1] that is integrated with the SDK worker[1a]; however, it only saves CPU metrics[1b].
> We have a MemoryReporter util[2] that can log heap dumps; however, it is not documented on the Beam website and does not respect the --profile_memory and --profile_location options[3]. The profile_memory flag currently works only for Dataflow Runner users who run non-portable batch pipelines; profiles are saved only if memory usage between samples exceeds 1000M.
> We should improve the memory profiling experience for Portable Python users and consider writing a guide on the Beam website showing how users can investigate OOMing pipelines.
>  
> [1] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
> [1a] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
> [1b] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
> [2] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
> [3] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846



--
This message was sent by Atlassian Jira
(v8.3.4#803005)