You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/11/25 22:44:46 UTC

[GitHub] [incubator-datasketches-memory] jihoonson opened a new pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

jihoonson opened a new pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122


   In `assertValidAndBoundsForRead()` and `assertValidAndBoundsForWrite()` of `BaseState`, they call `getCapacity()` to pass it to `assertBounds()` which performs a boundary check. Interestingly, this doesn't seem good in terms of performance in some cases.
   
   I'm doing some experiments with Apache Druid for some faster groupBy idea. In my branch result, [I sort the offsets in `Memory` by their values, so that I can read them in a sorted order](https://github.com/jihoonson/druid/blob/dict-merge/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/HashVectorGrouper.java#L304-L326) (someone could say this may hurt the CPU cache locality, which is true but is out of the scope of this PR). In my benchmark using [GroupByBenchmark](https://github.com/jihoonson/druid/blob/dict-merge/benchmarks/src/test/java/org/apache/druid/benchmark/query/GroupByBenchmark.java) of Druid, `getCapacity()` in `assertValidAndBoundsForRead()` contributed lots of time during sort as shown in the first flame graph below. This seems because calling `getCapacity()` makes JVM not inline assert methods even though they do nothing in production code path.
   
   ![Screenshot from 2020-11-25 14-20-38](https://user-images.githubusercontent.com/2322288/100287447-836c2780-2f29-11eb-9ddf-bbec02fe7bfd.png)
   
   Another flame graph below was taken after applying the change in this PR. 
   
   ![Screenshot from 2020-11-25 14-21-01](https://user-images.githubusercontent.com/2322288/100287450-86671800-2f29-11eb-8d43-e4566d4379f3.png)
   
   Regarding asserting the validity, the snippet of `getCapacity()` is:
   
   ```java
     public final long getCapacity() {
       assertValid();
       return capacityBytes_;
     }
   ```
   
   So, this change seems OK to me because, in `assertValidAndBoundsForRead()`, `assertValid()` is called right before `capacityBytes_` is directly read which seems practically the same as what it did before.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] leerho commented on pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
leerho commented on pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122#issuecomment-736798374


   @jihoonson This is a really good suggestion.  It looks like there are other places where calling capacityBytes_ directly instead of calling getCapacity() could be done.  I'm going to merge your PR into branch "GetCapacityDirect" where we can fix this in all the places where it makes sense to do so.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] jihoonson commented on pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122#issuecomment-736823209


   @leerho thanks. I looked at other callers of `getCapacity()` when I made this PR, but wasn't sure if they can also call `capacityBytes_` directly as well. It makes sense to do the same in other places if it is applicable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] jihoonson commented on pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122#issuecomment-737002555


   @leerho, I ran my benchmark again with the updated master and saw the same improvement. January time-frame for the next release sounds good to me. Thanks for merging these changes!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] jihoonson edited a comment on pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
jihoonson edited a comment on pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122#issuecomment-734971684


   > [2020-11-25 22:56:59] [build] [2020-11-25 22:56:59] [autobuild] [ERROR] Failed to execute goal org.jacoco:jacoco-maven-plugin:0.8.4:prepare-agent (prepare-agent) on project datasketches-memory: Execution prepare-agent of goal org.jacoco:jacoco-maven-plugin:0.8.4:prepare-agent failed: Plugin org.jacoco:jacoco-maven-plugin:0.8.4 or one of its dependencies could not be resolved: Failed to collect dependencies at org.jacoco:jacoco-maven-plugin:jar:0.8.4 -> org.apache.maven.reporting:maven-reporting-impl:jar:2.1 -> org.apache.maven:maven-project:jar:2.0.10 -> org.apache.maven:maven-artifact-manager:jar:2.0.10 -> org.apache.maven:maven-repository-metadata:jar:2.0.10: Failed to read artifact descriptor for org.apache.maven:maven-repository-metadata:jar:2.0.10: Could not transfer artifact org.apache.maven:maven-repository-metadata:pom:2.0.10 from/to central (https://repo.maven.apache.org/maven2): Failed to transfer file https://repo.maven.apache.org/maven2/org/apache/maven/maven-reposit
 ory-metadata/2.0.10/maven-repository-metadata-2.0.10.pom with status code 502 -> [Help 1]
   
   It seems like that LGTM failed because of a dependency resolution failure. Could someone please re-trigger it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] jihoonson commented on pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122#issuecomment-734971684


   > [2020-11-25 22:56:59] [build] [2020-11-25 22:56:59] [autobuild] [ERROR] Failed to execute goal org.jacoco:jacoco-maven-plugin:0.8.4:prepare-agent (prepare-agent) on project datasketches-memory: Execution prepare-agent of goal org.jacoco:jacoco-maven-plugin:0.8.4:prepare-agent failed: Plugin org.jacoco:jacoco-maven-plugin:0.8.4 or one of its dependencies could not be resolved: Failed to collect dependencies at org.jacoco:jacoco-maven-plugin:jar:0.8.4 -> org.apache.maven.reporting:maven-reporting-impl:jar:2.1 -> org.apache.maven:maven-project:jar:2.0.10 -> org.apache.maven:maven-artifact-manager:jar:2.0.10 -> org.apache.maven:maven-repository-metadata:jar:2.0.10: Failed to read artifact descriptor for org.apache.maven:maven-repository-metadata:jar:2.0.10: Could not transfer artifact org.apache.maven:maven-repository-metadata:pom:2.0.10 from/to central (https://repo.maven.apache.org/maven2): Failed to transfer file https://repo.maven.apache.org/maven2/org/apache/maven/maven-reposit
 ory-metadata/2.0.10/maven-repository-metadata-2.0.10.pom with status code 502 -> [Help 1]
   
   It seems like that LGTM failed because of a dependency resolution failure. Could anyone please re-trigger it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] leerho commented on pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
leerho commented on pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122#issuecomment-736900300


   @jihoonson These changes are now in master.  It would be great if you could run your flame graphs again to see if this fixed the problem.  Also, can you use this code in master for a while? We will have a new release coming in January time-frame.
   
   Cheers,
   Lee.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-memory] leerho merged pull request #122: Don't call getCapacity() in BaseState when asserting for read and write

Posted by GitBox <gi...@apache.org>.
leerho merged pull request #122:
URL: https://github.com/apache/incubator-datasketches-memory/pull/122


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org