You are viewing a plain text version of this content. The canonical link for it is here.

Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/09/01 21:56:26 UTC

[GitHub] [accumulo] ctubbsii opened a new pull request, #2912: Updates to ScanAttempts

ctubbsii opened a new pull request, #2912:
URL: https://github.com/apache/accumulo/pull/2912

   As a follow-up to #2880:
   
   * Remove unused and undocumented endTime in ScanAttempt API
   * Rename ScanAttemptImpl method and variables from "mutationCounter" to
     "attemptNumber" with corresponding comments to make it clear how this
     is used for the snapshot
   * Set attemptNumber in ScanAttemptImpl constructor as an immutable
     value, rather than setting it immediately after construction
   * Remove unnecessary synchronization and use an AtomicLong for the
     currentAttemptNumber
   * IDE changes also removed some unneeded keywords in the interface and
     converted an anonymous inner class to a lambda
   * Create the snapshot using Java streams instead of using "forEach()"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] keith-turner commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

keith-turner commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1236082750

   The reason that the mutationCount/attemptNumber existed was because ScanAttemptsImpl used to use guava methods that filtered a concurrent hash map in place. If the map is being copied instead of filtered in place, I think mutationCount/attemptNumber could just be dropped.  
   
   I was trying to make multiple comments across the diff, but this just got too complicated for a small diff. So I copied the new changes and modified them in a text editor to illustrate what I was thinking. Below is an example of dropping mutationCount/attemptNumber.
   
   ```
   public class ScanAttemptsImpl {
   
     private static final Logger LOG = LoggerFactory.getLogger(ScanAttemptsImpl.class);
   
     static class ScanAttemptImpl implements ScanAttempt {
   
       private final String server;
       private final Result result;
   
       private ScanAttemptImpl(Result result, String server, long attemptNumber) {
         this.result = result;
         this.server = Objects.requireNonNull(server);
       }
   
       @Override
       public String getServer() {
         return server;
       }
   
       @Override
       public Result getResult() {
         return result;
       }
     }
   
     private final Map<TabletId,Collection<ScanAttemptImpl>> attempts = new HashMap<>();
   
     public interface ScanAttemptReporter {
       void report(ScanAttempt.Result result);
     }
   
     ScanAttemptReporter createReporter(String server, TabletId tablet) {
       return result -> {
         LOG.trace("Received result: {}", result);
         synchronized(ScanAttemptImpl.this){
           attempts.computeIfAbsent(tablet, k -> ConcurrentHashMap.newKeySet())
             .add(new ScanAttemptImpl(result, server, currentAttemptNumber.getAndIncrement()));
         }
       };
     }
   
     /**
      * Creates and returns a snapshot of ScanAttempt objects that were added before this call
      *
      * @return a map of TabletId to a collection ScanAttempt objects associated with that TabletId
      */
     synchronized Map<TabletId,Collection<ScanAttemptImpl>> snapshot() {
       //TODO If the map and collections created below are not immutable, then need to wrap w/ immutable. Snapshot should be immutable.
       return attempts.entrySet().stream().map(sourceEntry ->
         new SimpleEntry(sourceEntry.getKey(), List.copyOf(sourceEntry.getValue()))
       ).collect(Collectors.toMap(Entry::getKey, Entry::getValue));
     }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] ctubbsii commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

ctubbsii commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1238689447

   Okay, thanks @keith-turner . I'll see about simplifying my PR based on what you said. Since we're doing copy anyway now, it's not necessary to have that. I would like to to modify what you have above to make the concurrency a bit more readable, though. Maybe it would make sense to just use some kind of concurrent list, rather than ArrayList and synchronizing on `ScanAttemptImpl.this`, which I find to be not very intuitive. Or just block accesses with an explicit lock/mutex object? I'll have to think about it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] dlmarion commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

dlmarion commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1282809182

   Merging to main, can fix any issues laster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] ctubbsii commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

ctubbsii commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1235938552

   I'll need to rebase this on the changes done in #2896 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] dlmarion merged pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

dlmarion merged PR #2912:
URL: https://github.com/apache/accumulo/pull/2912


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] keith-turner commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

keith-turner commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1239157105

   > Maybe it would make sense to just use some kind of concurrent list
   
   If the data structure is only accessed in a synchronization block then I would recommend avoiding concurrent data structs.  The concurrent data structs use lots of volatile pointers and therefore can not utilize processor cache as well.  That is why I used hash map and array list.
   
   > rather than ArrayList and synchronizing on ScanAttemptImpl.this, which I find to be not very intuitive. Or just block accesses with an explicit lock/mutex object?
   
   The two synchronized blocks could sync on `attempts`.  Its a final hash map, so should be safe to sync on... this would avoid the sync on `ScanAttemptImpl.this`.  I think that would make the code easier to follow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] ctubbsii commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

ctubbsii commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1252672649

   > > Maybe it would make sense to just use some kind of concurrent list
   > 
   > If the data structure is only accessed in a synchronization block then I would recommend avoiding concurrent data structs. The concurrent data structs use lots of volatile pointers and therefore can not utilize processor cache as well. That is why I used hash map and array list.
   > 
   > > rather than ArrayList and synchronizing on ScanAttemptImpl.this, which I find to be not very intuitive. Or just block accesses with an explicit lock/mutex object?
   > 
   > The two synchronized blocks could sync on `attempts`. Its a final hash map, so should be safe to sync on... this would avoid the sync on `ScanAttemptImpl.this`. I think that would make the code easier to follow.
   
   Why not sync on an explicit object for that purpose? What's the benefit for synchronizing on one of the data structure itself? I find that harder to reason about, because I don't know what else is synchronizing on it, without tracking down all its references. Tracking down the references to an explicit object for synchronizing seems easier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [accumulo] keith-turner commented on pull request #2912: Updates to ScanAttempts

Posted by GitBox <gi...@apache.org>.

keith-turner commented on PR #2912:
URL: https://github.com/apache/accumulo/pull/2912#issuecomment-1252872871

   > Why not sync on an explicit object for that purpose? What's the benefit for synchronizing on one of the data structure itself? I find that harder to reason about, because I don't know what else is synchronizing on it, without tracking down all its references. Tracking down the references to an explicit object for synchronizing seems easier.
   
   I was thinking that its a really small class (after these changes) that is easy to analyze in totality and that syncing on the central data struct helps keep it simple and small.  I am not opposed to adding more code for sync though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org