You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "dlmarion (via GitHub)" <gi...@apache.org> on 2023/04/04 15:52:58 UTC

[GitHub] [accumulo] dlmarion opened a new pull request, #3276: Optimize internal data structures in Authorizations

dlmarion opened a new pull request, #3276:
URL: https://github.com/apache/accumulo/pull/3276

   Instead of creating collections with default initial capacities, create them with the known capacities. This will reduce the time taken and amount of memory consumed when populating these objects during Authorizations object creation when the number of elements in the Authorizations is larger than the default initial capacity of the collection data structure. This may also reduce the time it takes the VisibilityEvaluator to determine if the Authoriations object contains a token found in a ColumnVisibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157660609


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -45,6 +46,9 @@ public class Authorizations implements Iterable<byte[]>, Serializable, Authoriza
   private final Set<ByteSequence> auths;
   private final List<byte[]> authsList; // sorted order
 
+  private static final Pair<Set<ByteSequence>,List<byte[]>> EMPTY_INTERNAL_COLLECTIONS =

Review Comment:
   While I agree that  `INTERNAL in `EMPTY_INTERNAL_COLLECTIONS` may be redundant given that the variable is private, I don't agree that the length of the name reduces readability or that the name is confusing. `DEFAULT_EMPTY` as a variable name may contain less characters, but I don't think it provides *more* clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157626783


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -45,6 +46,9 @@ public class Authorizations implements Iterable<byte[]>, Serializable, Authoriza
   private final Set<ByteSequence> auths;
   private final List<byte[]> authsList; // sorted order
 
+  private static final Pair<Set<ByteSequence>,List<byte[]>> EMPTY_INTERNAL_COLLECTIONS =

Review Comment:
   I think the variable name `EMPTY_AUTHS` could be confused with the `EMPTY` variable on line 55. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157672378


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,6 +112,8 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
+    this.auths = new HashSet<>(authorizations.size());
+    this.authsList = new ArrayList<>(authorizations.size());

Review Comment:
   Ah the fun of going back to Java 8.  I looked at the implementation of emptySet() and emptyList() and for the version of Java I am using those both avoid any object allocation like List.of() and Set.of()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion merged pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion merged PR #3276:
URL: https://github.com/apache/accumulo/pull/3276


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157660609


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -45,6 +46,9 @@ public class Authorizations implements Iterable<byte[]>, Serializable, Authoriza
   private final Set<ByteSequence> auths;
   private final List<byte[]> authsList; // sorted order
 
+  private static final Pair<Set<ByteSequence>,List<byte[]>> EMPTY_INTERNAL_COLLECTIONS =

Review Comment:
   While I agree that  `INTERNAL` in `EMPTY_INTERNAL_COLLECTIONS` may be redundant given that the variable is private, I don't agree that the length of the name reduces readability or that the name is confusing. `DEFAULT_EMPTY` as a variable name may contain less characters, but I don't think it provides *more* clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157507174


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,6 +112,8 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
+    this.auths = new HashSet<>(authorizations.size());
+    this.authsList = new ArrayList<>(authorizations.size());

Review Comment:
   If it does not break anything, we may be able to add a further optimization for an incoming empty set.  I looked into the implemenation of List.of() and Set.of() and both seem to return a ref to a static immutable object, so that would avoid any internal object allocation for the empty case.
   
   ```suggestion
      if(authorizations.isEmpty()){
         this.auths = Set.of();
         this.authsList = List.of();
       } else {
         this.auths = new HashSet<>(authorizations.size());
         this.authsList = new ArrayList<>(authorizations.size());    
       }
   ```
   
   If we had a static method for creating auths objects then we could similarly avoid allocating the Authorizations object when the input is empty.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "EdColeman (via GitHub)" <gi...@apache.org>.
EdColeman commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157636109


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -45,6 +46,9 @@ public class Authorizations implements Iterable<byte[]>, Serializable, Authoriza
   private final Set<ByteSequence> auths;
   private final List<byte[]> authsList; // sorted order
 
+  private static final Pair<Set<ByteSequence>,List<byte[]>> EMPTY_INTERNAL_COLLECTIONS =

Review Comment:
   `DEFAULT_EMPTY`?  To me, INTERNAL and COLLECTIONS are not adding information and the length of the name reduces readability instead of adding clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] DomGarguilo commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "DomGarguilo (via GitHub)" <gi...@apache.org>.
DomGarguilo commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157610622


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,6 +112,8 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
+    this.auths = new HashSet<>(authorizations.size());
+    this.authsList = new ArrayList<>(authorizations.size());

Review Comment:
   Would `Collections.emptySet()` and `emptyList()` be suitable java 8 replacements for `Set.of()` and `List.of()` in this case? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157692172


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,8 +124,13 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
-    for (byte[] auth : authorizations)
+    Pair<Set<ByteSequence>,List<byte[]>> collections =
+        createInternalCollections(authorizations.size());
+    this.auths = collections.getFirst();
+    this.authsList = collections.getSecond();

Review Comment:
   I went ahead and broke the single method into two, removing the need for Pair. I was using Pair as a struct, trying to minimize the methods called, but I think it may not matter in the long run. Certainly, not allocating a temporary object is better here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157692965


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -45,6 +46,9 @@ public class Authorizations implements Iterable<byte[]>, Serializable, Authoriza
   private final Set<ByteSequence> auths;
   private final List<byte[]> authsList; // sorted order
 
+  private static final Pair<Set<ByteSequence>,List<byte[]>> EMPTY_INTERNAL_COLLECTIONS =

Review Comment:
   This is OBE, this variable was removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "EdColeman (via GitHub)" <gi...@apache.org>.
EdColeman commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157607792


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -45,6 +46,9 @@ public class Authorizations implements Iterable<byte[]>, Serializable, Authoriza
   private final Set<ByteSequence> auths;
   private final List<byte[]> authsList; // sorted order
 
+  private static final Pair<Set<ByteSequence>,List<byte[]>> EMPTY_INTERNAL_COLLECTIONS =

Review Comment:
   Because this is private it seems internal is implied - do you think that EMPTY_AUTHS would still provide the same clarity, but improve readability? (just a personnel preference / suggestion) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] DomGarguilo commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "DomGarguilo (via GitHub)" <gi...@apache.org>.
DomGarguilo commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157612005


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,6 +112,8 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
+    this.auths = new HashSet<>(authorizations.size());
+    this.authsList = new ArrayList<>(authorizations.size());

Review Comment:
   Oops never mind. I think this was done in the latest commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157602532


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,6 +112,8 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
+    this.auths = new HashSet<>(authorizations.size());
+    this.authsList = new ArrayList<>(authorizations.size());

Review Comment:
   `Set.of` and `List.of` don't exist in Java 8. I did implement your idea in f398f1d though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3276: Optimize internal data structures in Authorizations

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3276:
URL: https://github.com/apache/accumulo/pull/3276#discussion_r1157668745


##########
core/src/main/java/org/apache/accumulo/core/security/Authorizations.java:
##########
@@ -112,8 +124,13 @@ private void checkAuths() {
    */
   public Authorizations(Collection<byte[]> authorizations) {
     checkArgument(authorizations != null, "authorizations is null");
-    for (byte[] auth : authorizations)
+    Pair<Set<ByteSequence>,List<byte[]>> collections =
+        createInternalCollections(authorizations.size());
+    this.auths = collections.getFirst();
+    this.authsList = collections.getSecond();

Review Comment:
   We could avoid creating the Pair object by replacing `createInternalCollections` with two private static methods `createInternalSet` and `createInternalList`.
   
   ```suggestion
       this.auths = createInternalSet(authorizations.size());
       this.authsList = createInternalList(authorizations.size());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org