You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by GitBox <gi...@apache.org> on 2020/03/05 00:30:22 UTC

[GitHub] [beam] KevinGG opened a new pull request #11050: Beam 8335

KevinGG opened a new pull request #11050: Beam 8335
URL: https://github.com/apache/beam/pull/11050
 
 
   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388046234
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##########
 @@ -202,10 +202,20 @@ def has_source_to_cache(user_pipeline):
   if has_cache:
     if not isinstance(ie.current_env().cache_manager(),
                       streaming_cache.StreamingCache):
-      # TODO(BEAM-8335): convert the cache manager into a streaming cache
-      # manager. Note this does not invalidate the current cache including the
-      # source data capture.
-      pass
+      # Wrap the cache manager into a streaming cache manager. Note this
+      # does not invalidate the current cache manager.
+      def is_cache_complete():
+        job = ie.current_env().get_background_caching_job(user_pipeline)
+        is_done = job and job.is_done()
+        cache_changed = is_source_to_cache_changed(
+            user_pipeline, update_cached_source_signature=False)
+        return is_done and not cache_changed
+
+      ie.current_env().set_cache_manager(
+          streaming_cache.StreamingCache(
+              ie.current_env().cache_manager()._cache_dir,
+              is_cache_complete=is_cache_complete,
 
 Review comment:
   Do you mean to pass `is_cache_complete()` or `is_cache_complete` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388479176
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##########
 @@ -202,10 +202,20 @@ def has_source_to_cache(user_pipeline):
   if has_cache:
     if not isinstance(ie.current_env().cache_manager(),
                       streaming_cache.StreamingCache):
-      # TODO(BEAM-8335): convert the cache manager into a streaming cache
-      # manager. Note this does not invalidate the current cache including the
-      # source data capture.
-      pass
+      # Wrap the cache manager into a streaming cache manager. Note this
+      # does not invalidate the current cache manager.
+      def is_cache_complete():
+        job = ie.current_env().get_background_caching_job(user_pipeline)
+        is_done = job and job.is_done()
+        cache_changed = is_source_to_cache_changed(
+            user_pipeline, update_cached_source_signature=False)
+        return is_done and not cache_changed
+
+      ie.current_env().set_cache_manager(
+          streaming_cache.StreamingCache(
+              ie.current_env().cache_manager()._cache_dir,
+              is_cache_complete=is_cache_complete,
 
 Review comment:
   `is_cache_complete` is a function required by StreamingCache, not the return value of the function.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay merged pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay merged pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG edited a comment on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG edited a comment on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-594967107
 
 
   R: @rohdesamuel 
   R: @aaltay 
   
   Data captured from sources are stored in cache just like intermediate PCollections that are assigned to variables.
   The capture_size limit is only applied to disk usage of data captured from sources.
   
   The implementation of getting a capture cache file's size
   `os.stat(self._path).st_size`
   
   The implementation of summing up all capture cache file's sizes
   `sum([sink.size_in_bytes for _, sink in self._capture_sinks.items()])`
   
   They both locate in [this](https://github.com/apache/beam/pull/11050/commits/a6d9e2382eeea148b3f667726f8e8e8933a7196c#diff-e15d1558a3154511b759ef711deeaddb) change.
   
   Everything else is wiring, logging and testing.
   
   The first commit is a patch from Sam's ongoing PR, there is no need to review diff of it.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG edited a comment on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG edited a comment on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-594967107
 
 
   R: @rohdesamuel 
   R: @aaltay 
   
   Data captured from sources are stored in cache just like intermediate PCollections that are assigned to variables.
   The capture_size limit is only applied to disk usage of data captured from sources.
   
   The implementation of getting a capture cache file's size
   `os.stat(self._path).st_size`
   
   The implementation of summing up all capture cache file's sizes
   `sum([sink.size_in_bytes for _, sink in self._capture_sinks.items()])`
   
   They both locate in [this](https://github.com/apache/beam/pull/11050/commits/a6d9e2382eeea148b3f667726f8e8e8933a7196c#diff-e15d1558a3154511b759ef711deeaddb) change.
   
   Everything else is wiring, logging and testing.
   
   The first commit is a patch from Sam's ongoing PR, there is no need to review diff of it.
   Please directly review [diff](https://github.com/apache/beam/pull/11050/commits/a6d9e2382eeea148b3f667726f8e8e8933a7196c) of the second commit.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388482551
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##########
 @@ -88,7 +88,22 @@ def capture_duration(self, value):
     """
     self.capture_control._capture_duration = value
 
-  # TODO(BEAM-8335): add capture_size options when they are supported.
+  @property
+  def capture_size(self):
 
 Review comment:
   Renaming it to `capture_size_limit`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598317875
 
 
   Discarded outdated changes from dependency PRs and squashed the empty patch commits.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598313855
 
 
   Resolved merge conflicts and force pushed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-597831219
 
 
   Waiting on merge of https://github.com/apache/beam/pull/11005, then I'll resolve the merge conflicts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598337372
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388483873
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##########
 @@ -63,6 +63,14 @@ def path(self):
     """Returns the path the sink leads to."""
     return self._path
 
+  @property
+  def size_in_bytes(self):
+    """Returns the space usage in bytes of the sink."""
+    try:
+      return os.stat(self._path).st_size
+    except:
+      return 0
 
 Review comment:
   Could add a debug level logging.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598363964
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388483271
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##########
 @@ -63,6 +63,14 @@ def path(self):
     """Returns the path the sink leads to."""
     return self._path
 
+  @property
+  def size_in_bytes(self):
+    """Returns the space usage in bytes of the sink."""
+    try:
+      return os.stat(self._path).st_size
 
 Review comment:
   It works. Though in some versions of some OSes, some of the fields of the returned stat might be dummy values.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388046153
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##########
 @@ -259,21 +269,31 @@ def is_source_to_cache_changed(
   # The computation of extract_unbounded_source_signature is expensive, track on
   # change by default.
   if is_changed and update_cached_source_signature:
-    if ie.current_env().options.enable_capture_replay:
-      # TODO(BEAM-8335): display rather than logging when is_in_notebook.
+    options = ie.current_env().options
+    # No info needed when capture replay is disabled.
+    if options.enable_capture_replay:
+
+      def sizeof_fmt(num, suffix='B'):
 
 Review comment:
   Should this be moved inside the if below? Is it used anywhere else?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388045279
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##########
 @@ -63,6 +63,14 @@ def path(self):
     """Returns the path the sink leads to."""
     return self._path
 
+  @property
+  def size_in_bytes(self):
+    """Returns the space usage in bytes of the sink."""
+    try:
+      return os.stat(self._path).st_size
 
 Review comment:
   Does this work in all OSes?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388479192
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##########
 @@ -259,21 +269,31 @@ def is_source_to_cache_changed(
   # The computation of extract_unbounded_source_signature is expensive, track on
   # change by default.
   if is_changed and update_cached_source_signature:
-    if ie.current_env().options.enable_capture_replay:
-      # TODO(BEAM-8335): display rather than logging when is_in_notebook.
+    options = ie.current_env().options
+    # No info needed when capture replay is disabled.
+    if options.enable_capture_replay:
+
+      def sizeof_fmt(num, suffix='B'):
 
 Review comment:
   No, it's not used anywhere else. Moving it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388045249
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##########
 @@ -63,6 +63,14 @@ def path(self):
     """Returns the path the sink leads to."""
     return self._path
 
+  @property
+  def size_in_bytes(self):
+    """Returns the space usage in bytes of the sink."""
+    try:
+      return os.stat(self._path).st_size
+    except:
+      return 0
 
 Review comment:
   Maybe log here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598454768
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on a change in pull request #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#discussion_r388045784
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##########
 @@ -88,7 +88,22 @@ def capture_duration(self, value):
     """
     self.capture_control._capture_duration = value
 
-  # TODO(BEAM-8335): add capture_size options when they are supported.
+  @property
+  def capture_size(self):
 
 Review comment:
   Shoulds this have "limit" or "max" in its name, or something to indicate that this is an upper limit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-594967107
 
 
   R: @rohdesamuel 
   R: @aaltay 
   
   Data captured from sources are stored in cache just like intermediate PCollections that are assigned to variables.
   The capture_size limit is only applied to disk usage of data captured from sources.
   
   The implementation of getting a capture cache file's size
   `os.stat(self._path).st_size`
   
   The implementation of summing up all capture cache file's sizes
   `sum([sink.size_in_bytes for _, sink in self._capture_sinks.items()])`
   
   Everything else is wiring, logging and testing.
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598363121
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598362994
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation
URL: https://github.com/apache/beam/pull/11050#issuecomment-598363209
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services