You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/03/18 14:28:07 UTC

[GitHub] [beam] scwhittle opened a new pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

scwhittle opened a new pull request #17121:
URL: https://github.com/apache/beam/pull/17121


   The cache provided to the StateFetchingIterator is already a subcache
   based upon the StateKey so we can just us a singleton null key within
   the subcache. See FnApiStateAccessor.getCacheFor.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1075450327


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik merged pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik merged pull request #17121:
URL: https://github.com/apache/beam/pull/17121


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #17121:
URL: https://github.com/apache/beam/pull/17121#discussion_r830260789



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java
##########
@@ -84,10 +83,7 @@ private StateFetchingIterators() {}
       StateRequest stateRequestForFirstChunk,
       Coder<T> valueCoder) {
     return new CachingStateIterable<>(
-        (Cache<StateKey, Blocks<T>>) cache,
-        beamFnStateClient,
-        stateRequestForFirstChunk,
-        valueCoder);
+        (Cache<Object, Blocks<T>>) cache, beamFnStateClient, stateRequestForFirstChunk, valueCoder);

Review comment:
       I would suggest using a static object that is weighted that is internal. Using `null` hides useful information when debugging the caches contents. Please also update the method comment saying that the state cache should be namespaced for this object to prevent collisions (like BagUserState).
   
   Like 
   ```
   private static class IterableCacheKey implements Weighted {
     static INSTANCE = new IterableCacheKey();
     static long weight = Caches.weigh(INSTANCE);
     getWeight() {
       return weight;
     }
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1075791084


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #17121:
URL: https://github.com/apache/beam/pull/17121#discussion_r830260789



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java
##########
@@ -84,10 +83,7 @@ private StateFetchingIterators() {}
       StateRequest stateRequestForFirstChunk,
       Coder<T> valueCoder) {
     return new CachingStateIterable<>(
-        (Cache<StateKey, Blocks<T>>) cache,
-        beamFnStateClient,
-        stateRequestForFirstChunk,
-        valueCoder);
+        (Cache<Object, Blocks<T>>) cache, beamFnStateClient, stateRequestForFirstChunk, valueCoder);

Review comment:
       I would suggest using a static object that is weighted that is internal. Using `null` hides useful information when debugging the caches contents. Please also update the method comment saying that the state cache should be namespaced for this object to prevent collisions (like BagUserState).
   
   Like 
   ```
   private static class IterableCacheKey implements Weighted {
     static INSTANCE = new IterableCacheKey();
     getWeight() {
       return 0;
     }
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #17121:
URL: https://github.com/apache/beam/pull/17121#discussion_r831553441



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java
##########
@@ -84,12 +84,27 @@ private StateFetchingIterators() {}
       StateRequest stateRequestForFirstChunk,
       Coder<T> valueCoder) {
     return new CachingStateIterable<>(
-        (Cache<StateKey, Blocks<T>>) cache,
+        (Cache<IterableCacheKey, Blocks<T>>) cache,
         beamFnStateClient,
         stateRequestForFirstChunk,
         valueCoder);
   }
 
+  @VisibleForTesting
+  static class IterableCacheKey implements Weighted {
+    private IterableCacheKey() {}
+
+    static final IterableCacheKey INSTANCE = new IterableCacheKey();
+
+    @Override
+    public long getWeight() {
+      // Ignore the actual size of this singleton because it is trivial and because
+      // the weight reported here will be counted many times as it is present in
+      // many different state subcaches.
+      return 0;

Review comment:
       The reference to the object still matters as a key even though there are many copies.
   
   I would still suggest weighing an instance and storing that statically or if there is a way to look up the cost of a pointer via JAMM or the JVM.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #17121:
URL: https://github.com/apache/beam/pull/17121#discussion_r830946109



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java
##########
@@ -84,10 +83,7 @@ private StateFetchingIterators() {}
       StateRequest stateRequestForFirstChunk,
       Coder<T> valueCoder) {
     return new CachingStateIterable<>(
-        (Cache<StateKey, Blocks<T>>) cache,
-        beamFnStateClient,
-        stateRequestForFirstChunk,
-        valueCoder);
+        (Cache<Object, Blocks<T>>) cache, beamFnStateClient, stateRequestForFirstChunk, valueCoder);

Review comment:
       Done.
   Instead of weighting the singleton instance I report zero since the weight will be counted many times in separate subcaches. Added comment with reasoning.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1072464640


   R: @lukecwik 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1074439854


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1072464640


   R: @lukecwik 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1075616945


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #17121:
URL: https://github.com/apache/beam/pull/17121#discussion_r831553441



##########
File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java
##########
@@ -84,12 +84,27 @@ private StateFetchingIterators() {}
       StateRequest stateRequestForFirstChunk,
       Coder<T> valueCoder) {
     return new CachingStateIterable<>(
-        (Cache<StateKey, Blocks<T>>) cache,
+        (Cache<IterableCacheKey, Blocks<T>>) cache,
         beamFnStateClient,
         stateRequestForFirstChunk,
         valueCoder);
   }
 
+  @VisibleForTesting
+  static class IterableCacheKey implements Weighted {
+    private IterableCacheKey() {}
+
+    static final IterableCacheKey INSTANCE = new IterableCacheKey();
+
+    @Override
+    public long getWeight() {
+      // Ignore the actual size of this singleton because it is trivial and because
+      // the weight reported here will be counted many times as it is present in
+      // many different state subcaches.
+      return 0;

Review comment:
       The reference to the object still matters as a key even though there is only a single copy.
   
   I would still suggest weighing an instance and storing that statically or if there is a way to look up the cost of a pointer via JAMM or the JVM.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1073739388


   R: @lukecwik PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1074863861


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #17121: [BEAM-13015] Avoid repeated weighing of StateKey in StateFetchingIterator cache usage.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #17121:
URL: https://github.com/apache/beam/pull/17121#issuecomment-1075837049


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org