You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "lukecwik (via GitHub)" <gi...@apache.org> on 2023/02/03 19:28:56 UTC

[GitHub] [beam] lukecwik commented on a diff in pull request #25219: Optimize PGBK table to only update cache when there is a large enough size change. #21250

lukecwik commented on code in PR #25219:
URL: https://github.com/apache/beam/pull/25219#discussion_r1096182774


##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -1553,7 +1553,7 @@ class BeamModulePlugin implements Plugin<Project> {
             if (project.file("/opt/cprof/profiler_java_agent.so").exists()) {
               def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing'
               def userName = System.getProperty("user.name").toLowerCase().replaceAll(" ", "_")
-              jvmArgs '-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=' + userName + "_" + project.getProperty("benchmark").toLowerCase() + '_' + System.currentTimeMillis() + ',-cprof_project_id=' + gcpProject + ',-cprof_zone_name=us-central1-a'
+              jvmArgs '-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=' + userName + "_" + project.getProperty("benchmark").toLowerCase() + '_' + String.format('%1$tm%1$td%1$tY_%1$tH%1$tM%1$tS%1$tL', System.currentTimeMillis()) + ',-cprof_project_id=' + gcpProject + ',-cprof_zone_name=us-central1-a'

Review Comment:
   Swapped to make year first.
   
   Unfortunately we are limited to `^[a-z0-9]([-a-z0-9_.]{0,253}[a-z0-9])?$` based upon https://cloud.google.com/profiler/docs/profiling-java#service_name_and_version_arguments



##########
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/Caches.java:
##########
@@ -53,12 +53,16 @@ public final class Caches {
    */
   @VisibleForTesting static final int WEIGHT_RATIO = 6;
 
+  /** Objects which change in this amount should always update the cache. */
+  private static final long CACHE_SIZE_CHANGE_LIMIT_BYTES = 1 << 16;

Review Comment:
   Added comment.
   
   Not expected to help avoid OOMs we saw on n1-standard-1



##########
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/Caches.java:
##########
@@ -73,6 +77,25 @@ public static long weigh(Object o) {
     }
   }
 
+  /**
+   * Returns whether the cache should be updated in the case where the objects size has changed.
+   *
+   * <p>Note that this should only be used in the case where the cache is being updated very often
+   * in a tight loop and is not a good fit for cases where the object being cached is the result of
+   * an expensive operation like a disk read or remote service call.
+   */
+  public static boolean shouldUpdateOnSizeChange(long oldSize, long newSize) {
+    /*
+    Our strategy is three fold:
+    - tiny objects (<= 2^WEIGHT_RATIO) don't change the amount being weighed
+    - large changes (>= CACHE_SIZE_CHANGE_LIMIT_BYTES) should always update the size
+    - all others if the size changed by a factor of 2
+    */
+    return (oldSize > 1 << WEIGHT_RATIO || newSize > 1 << WEIGHT_RATIO)

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org