Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/01 00:01:23 UTC

[jira] [Work logged] (BEAM-11841) Optimize calculation of element size for iterables

     [ https://issues.apache.org/jira/browse/BEAM-11841?focusedWorklogId=591770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591770 ]

ASF GitHub Bot logged work on BEAM-11841:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/May/21 00:01
            Start Date: 01/May/21 00:01
    Worklog Time Spent: 10m 
      Work Description: robertwb commented on a change in pull request #14610:
URL: https://github.com/apache/beam/pull/14610#discussion_r624307005



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
##########
@@ -357,31 +357,34 @@ protected void reportElementSize(long elementSize) {
     }
 
     final Distribution distribution;
+    ByteSizeObserver byteCountObserver;
 
     public SampleByteSizeDistribution(Distribution distribution) {
       this.distribution = distribution;
+      this.byteCountObserver = null;
     }
 
     public void tryUpdate(T value, Coder<T> coder) throws Exception {
       if (shouldSampleElement()) {
         // First try using byte size observer
-        ByteSizeObserver observer = new ByteSizeObserver();
-        coder.registerByteSizeObserver(value, observer);
-
-        if (!observer.getIsLazy()) {
-          observer.advance();
-          this.distribution.update(observer.observedSize);
-        } else {
-          // TODO(BEAM-11841): Optimize calculation of element size for iterables.
-          // Coder byte size observation is lazy (requires iteration for observation) so fall back
-          // to counting output stream
-          CountingOutputStream os = new CountingOutputStream(ByteStreams.nullOutputStream());
-          coder.encode(value, os);
-          this.distribution.update(os.getCount());
+        byteCountObserver = new ByteSizeObserver();
+        coder.registerByteSizeObserver(value, byteCountObserver);
+
+        if (!byteCountObserver.getIsLazy()) {
+          byteCountObserver.advance();
+          this.distribution.update(byteCountObserver.observedSize);
         }
       }

Review comment:
       Otherwise, set byteCountObserver to null (so that finishLazyUpdate becomes a no-op).
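
A minimal sketch of what the comment above appears to suggest, assuming the "else" pairs with the `shouldSampleElement()` check where the comment is anchored. Every type here is a hypothetical stand-in rather than Beam's API: `FakeObserver` replaces `ByteSizeObserver`, a plain list replaces `Distribution`, and the body of `finishLazyUpdate` is an assumption because it is not shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every type here is a hypothetical stand-in, not Beam's API.
class LazySizeSketch {

  /** Stand-in for a byte size observer; "lazy" means the size is known only later. */
  static class FakeObserver {
    final boolean lazy;
    final long size;
    long observedSize = 0;

    FakeObserver(boolean lazy, long size) {
      this.lazy = lazy;
      this.size = size;
    }

    void advance() {
      observedSize = size;
    }
  }

  /** Stand-in for SampleByteSizeDistribution; a list replaces the Distribution. */
  static class Sampler {
    final List<Long> updates = new ArrayList<>();
    FakeObserver byteCountObserver = null;

    void tryUpdate(boolean shouldSample, boolean lazy, long size) {
      if (shouldSample) {
        byteCountObserver = new FakeObserver(lazy, size);
        if (!byteCountObserver.lazy) {
          // Eager case: the size is available immediately.
          byteCountObserver.advance();
          updates.add(byteCountObserver.observedSize);
        }
      } else {
        // The reviewer's suggestion: drop any observer left over from an
        // earlier sampled element so finishLazyUpdate has nothing to report.
        byteCountObserver = null;
      }
    }

    // Assumed shape of finishLazyUpdate; its body is not shown in the diff.
    void finishLazyUpdate() {
      if (byteCountObserver != null && byteCountObserver.lazy) {
        byteCountObserver.advance();
        updates.add(byteCountObserver.observedSize);
      }
    }
  }

  public static void main(String[] args) {
    Sampler s = new Sampler();
    s.tryUpdate(true, true, 128);   // sampled, lazy: size reported on finish
    s.finishLazyUpdate();
    s.tryUpdate(false, true, 999);  // not sampled: observer reset to null
    s.finishLazyUpdate();           // no-op, so 128 is not reported twice
    System.out.println(s.updates);
  }
}
```

Without the else branch in this sketch, the second `finishLazyUpdate` call would find the stale lazy observer from the first element and record its size a second time.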






Issue Time Tracking
-------------------

    Worklog Id:     (was: 591770)
    Time Spent: 50m  (was: 40m)

> Optimize calculation of element size for iterables
> --------------------------------------------------
>
>                 Key: BEAM-11841
>                 URL: https://issues.apache.org/jira/browse/BEAM-11841
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-harness
>            Reporter: Kiley Sok
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 50m
>  Remaining Estimate: 0h
>



