You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jclouds.apache.org by Shri Javadekar <no...@github.com> on 2013/11/25 09:23:26 UTC

[jclouds] Delete objects in a container efficiently. (#214)

The existing approach for deleting objects in a container suffers
from a head-of-line blocking problem. This commit implements a better
scheme which does not have that problem. This scheme uses a counting
semaphore for making sure that a certain number of futures are
issued in parallel. As each of these futures is completed, one
permit of the semaphore is released.

Added unit tests for testing this new scheme.
You can merge this Pull Request by running:

  git pull https://github.com/maginatics/jclouds deleteallkeys-v2

Or you can view, comment on it, or merge it online at:

  https://github.com/jclouds/jclouds/pull/214

-- Commit Summary --

  * Delete objects in a container efficiently.

-- File Changes --

    M blobstore/src/main/java/org/jclouds/blobstore/strategy/internal/DeleteAllKeysInList.java (417)
    M blobstore/src/test/java/org/jclouds/blobstore/strategy/internal/DeleteAllKeysInListTest.java (143)

-- Patch Links --

https://github.com/jclouds/jclouds/pull/214.patch
https://github.com/jclouds/jclouds/pull/214.diff

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +         switch (md.getType()) {
> +         case BLOB:
> +            blobDelFuture = executorService.submit(new Callable<Void>() {
> +               @Override
> +                  public Void call() {
> +                     blobStore.removeBlob(containerName, fullPath);
> +                     return null;
> +                  }
> +               });
> +            break;
> +         case FOLDER:
> +            if (options.isRecursive()) {
> +               blobDelFuture = executorService.submit(new Callable<Void>() {
> +                  @Override
> +                  public Void call() {
> +                     blobStore.deleteDirectory(containerName, fullPath);

Good point. I'll add an to make executeOneIteration blocking.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10965015

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +       * acquired in 'maxTime', a TimeoutException is thrown. Any outstanding
> +       * futures at that time are cancelled.
> +       */
> +      final Semaphore semaphore = new Semaphore(numOutStandingRequests);
> +      /*
> +       * When a future is created, a reference for that is added to the
> +       * outstandingFutures list. This reference is removed from the list in the
> +       * FutureCallback since it no longer needs to be cancelled in the event of
> +       * a timeout. Also, when the reference is removed from this list and when
> +       * the executorService removes the reference that it has maintained, the
> +       * future will be marked for GC since there should be no other references
> +       * to it. This is important because this code can generate an unbounded
> +       * number of futures.
> +       */
> +      final List<AtomicReference<ListenableFuture<Void>>> outstandingFutures = Collections
> +            .synchronizedList(new ArrayList<AtomicReference<ListenableFuture<Void>>>());

Does the AtomicReference provide value that the synchronized wrapper does not?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10853586

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-pull-requests #695](https://jclouds.ci.cloudbees.com/job/jclouds-pull-requests/695/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38655749

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +         if (marker != null) {
> +            logger.debug("%s with marker %s", message, marker);
> +            options = options.afterMarker(marker);
> +            listing = getListing(containerName, options, semaphore,
> +                  outstandingFutures, deleteFailure);
> +         } else {
> +            break;
> +         }
> +      }
> +   }
> +
> +   public void execute(final String containerName,
> +         ListContainerOptions listOptions) {
> +      final AtomicBoolean deleteFailure = new AtomicBoolean();
> +      final int numOutStandingRequests = 1024;
> +      int retries = maxErrors;

Given the large number of HTTP response fixes we have added, I believe the per-operation retry should suffice and we can remove the outer retry loop.  We can address this in a subsequent commit.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10853517

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
Reopened #214.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
> Sure. Let me know if I can do anything to make this go faster.

Some (live?) test results documenting the improvement would be great ;-)

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29475737

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-java-7-pull-requests #1165](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/1165/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38656006

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +         String marker = listing.getNextMarker();
> +         if (marker != null) {
> +            logger.debug("%s with marker %s", message, marker);
> +            options = options.afterMarker(marker);
> +            listing = getListing(containerName, options, semaphore,
> +                  outstandingFutures, deleteFailure);
> +         } else {
> +            break;
> +         }
> +      }
> +   }
> +
> +   public void execute(final String containerName,
> +         ListContainerOptions listOptions) {
> +      final AtomicBoolean deleteFailure = new AtomicBoolean();
> +      final int numOutStandingRequests = 1024;

Done.

There is a configurable property called "jclouds.max-parallel-deletes" that can be used for this.

You're point about whether the # of semaphores should be equal to the # of threads in the executorService is valid. The default value of "jclouds.max-parallel-threads" is equal to the # of user threads.

Given that these two are equal, I thought about whether we need the extra config option. Can we just use the value of PROPERTY_USER_THREADS instead. However, I'm not sure if the executorService object injected into DeleteAllKeysInList is  also being used somewhere else. If it is used somewhere, I guess, it would be good to have an option where the use can limit the number of parallel deletes being issued when deleting a container thereby limiting the number of threads being used from the executorService.

Let me know if this isn't necessary.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872420

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
Closed #214.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
Can you open a JIRA issue and reference it in the commit message?  This will allow us to communicate this improvement in the release notes.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38321551

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-pull-requests #634](https://jclouds.ci.cloudbees.com/job/jclouds-pull-requests/634/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36597317

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +         switch (md.getType()) {
> +         case BLOB:
> +            blobDelFuture = executorService.submit(new Callable<Void>() {
> +               @Override
> +                  public Void call() {
> +                     blobStore.removeBlob(containerName, fullPath);
> +                     return null;
> +                  }
> +               });
> +            break;
> +         case FOLDER:
> +            if (options.isRecursive()) {
> +               blobDelFuture = executorService.submit(new Callable<Void>() {
> +                  @Override
> +                  public Void call() {
> +                     blobStore.deleteDirectory(containerName, fullPath);

Previously ```clearContainer``` issued a batch of delete blob operations within a subtree before removing the parent but now it can issue delete blob operations concurrently with delete parent.  I believe the new logic will generate operation failures.  This is generally a hard problem to solve; can we simplify this for now by making the recursive call blocking?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10912447

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +         listing = blobStore.list(containerName, options);
> +      } catch (ContainerNotFoundException ce) {
> +         return listing;
> +      }
> +
> +      // recurse on subdirectories
> +      if (options.isRecursive()) {
> +         for (StorageMetadata md : listing) {
> +            String fullPath = parentIsFolder(options, md) ? options.getDir()
> +                  + "/" + md.getName() : md.getName();
> +            switch (md.getType()) {
> +            case BLOB:
> +               break;
> +            case FOLDER:
> +            case RELATIVE_PATH:
> +               if (options.isRecursive() &&

Repeated test of isRecusive from above.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10856243

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
Sorry for ignoring this.  By default jclouds uses an unlimited number of user threads, set in ```BaseApiMetadata.defaultProperties```.  I believe this will cause problems for users of Atmos blobstores.  We can work around this by setting a lower number in ```AtmosApiMetadata.defaultProperties```.  Previously I have successfully tested with 8 user and 8 IO threads although this depends on the Atmos version and configuration.  @shrinandj will run some further tests with the defaults and I will look at a solution to [JCLOUDS-137](https://issues.apache.org/jira/browse/JCLOUDS-137).

I also opened [JCLOUDS-459](https://issues.apache.org/jira/browse/JCLOUDS-459) to track adding some limit to all blobstores.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-34665263

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +         final String fullPath = parentIsFolder(options, md) ? options.getDir()
> +               + "/" + md.getName() : md.getName();
> +
> +            // Attempt to acquire a semaphore within the time limit. At least
> +            // one outstanding future should complete within this period for the
> +            // semaphore to be acquired.
> +         try {
> +            if (!semaphore.tryAcquire(maxTime, TimeUnit.MILLISECONDS)) {
> +               throw new TimeoutException("Timeout waiting for semaphore");
> +            }
> +         } catch (InterruptedException ie) {
> +            logger.debug("Interrupted while deleting blobs");
> +            Thread.currentThread().interrupt();
> +         }
> +
> +         final ListenableFuture<Void> blobDelFuture;

I have a few cleanups to do in a subsequent patch. Will add this to that patch.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10965237

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +       * a timeout. Also, when the reference is removed from this list and when
> +       * the executorService removes the reference that it has maintained, the
> +       * future will be marked for GC since there should be no other references
> +       * to it. This is important because this code can generate an unbounded
> +       * number of futures.
> +       */
> +      final List<AtomicReference<ListenableFuture<Void>>> outstandingFutures = Collections
> +            .synchronizedList(new ArrayList<AtomicReference<ListenableFuture<Void>>>());
> +      while (retries > 0) {
> +         deleteFailure.set(false);
> +         executeOneIteration(containerName, listOptions.clone(), semaphore,
> +               outstandingFutures, deleteFailure);
> +         // Wait for all futures to complete by waiting to acquire all
> +         // semaphores.
> +         try {
> +            if (!semaphore.tryAcquire(numOutStandingRequests, maxTime,

Added a TODO for this.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872424

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
I committed to master with some Checkstyle, Javadoc, and indentation fixes.  Let's address some of the TODOs and test more before backporting to 1.7.x.  Thank you for your contribution @shrinandj !  The previous implementation has troubled us for some years now.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38737591

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-java-7-pull-requests #1104](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/1104/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36597547

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
> I'll dig deeper into that.

Hm. Strange! Thanks for looking into this...

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29691969

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
>  import com.google.common.util.concurrent.ListenableFuture;
>  import com.google.common.util.concurrent.ListeningExecutorService;
>  import com.google.inject.Inject;
>  
>  /**
>   * Deletes all keys in the container
> - * 
> + *
>   * @author Adrian Cole

Done

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872338

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
>     }
>  
> -   private boolean parentIsFolder(final ListContainerOptions options, final StorageMetadata md) {
> -      return options.getDir() != null && md.getName().indexOf('/') == -1;
> +   private void waitForCompletion(final Semaphore semaphore,
> +         final Set<ListenableFuture<Void>> outstandingFutures) {
> +      // Wait for all futures to complete by waiting to acquire all
> +      // semaphores.
> +      try {
> +         // TODO: Each individual blob delete operation itself has a time
> +         // limit of 'maxTime'. Therefore having a time limit for this
> +         // semaphore acquisition provides little value. This could be
> +         // removed.

I removed the timeout as the comment suggests since we can easily encounter this situation with large containers.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10997822

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-java-7-pull-requests #1109](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/1109/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36742089

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
Even without my changes, I get an HttpResponseException when I try to clear a container :(.

```
org.jclouds.http.HttpResponseException: Invalid argument connecting to GET https://shri-nov6-3-cloud-testing0-3.s3-us-west-1.amazonaws.com/ HTTP/1.1
    at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:162)
    at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.invoke(InvokeSyncToAsyncHttpMethod.java:131)
    at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.apply(InvokeSyncToAsyncHttpMethod.java:97)
    at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.apply(InvokeSyncToAsyncHttpMethod.java:58)
    at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:157)
    at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:124)
    at com.sun.proxy.$Proxy53.listBucket(Unknown Source)
    at org.jclouds.s3.blobstore.S3BlobStore.list(S3BlobStore.java:146)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:37)
    at com.sun.proxy.$Proxy59.list(Unknown Source)
    at org.jclouds.blobstore.strategy.internal.DeleteAllKeysInList.execute(DeleteAllKeysInList.java:104)
    at org.jclouds.blobstore.util.internal.BlobUtilsImpl.clearContainer(BlobUtilsImpl.java:79)
    at org.jclouds.blobstore.internal.BaseBlobStore.clearContainer(BaseBlobStore.java:158)
    at org.jclouds.blobstore.internal.BaseBlobStore.clearContainer(BaseBlobStore.java:147)
```

I verified that I am using the correct credentials and the correct container name. Using jclouds-cli using these credentials and container name works fine. I'll dig deeper into that.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29674718

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
>  See results below.

That's a pretty sizeable improvement, indeed ;-) Any more details on the HttpResponseException you were getting?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29857047

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-java-7-pull-requests #880](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/880/) UNSTABLE
Looks like there's a problem with this pull request

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29185049

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +         String marker = listing.getNextMarker();
> +         if (marker != null) {
> +            logger.debug("%s with marker %s", message, marker);
> +            options = options.afterMarker(marker);
> +            listing = getListing(containerName, options, semaphore,
> +                  outstandingFutures, deleteFailure);
> +         } else {
> +            break;
> +         }
> +      }
> +   }
> +
> +   public void execute(final String containerName,
> +         ListContainerOptions listOptions) {
> +      final AtomicBoolean deleteFailure = new AtomicBoolean();
> +      final int numOutStandingRequests = 1024;

Can you make this a user-configurable property?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10853636

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
> jclouds » jclouds #882 UNSTABLE

This looks like a "real" [failure](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/org.apache.jclouds$jclouds-blobstore/882/testReport/junit/org.jclouds.blobstore.strategy.internal/DeleteAllKeysInListTest/testExecuteInDirectory/)? Could you have a look, @shrinandj?

We can also close'n'reopen this PR to trigger the builders again if we think it's likely to be transient...

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36670947

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by BuildHive <no...@github.com>.
[jclouds » jclouds #889](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/889/) SUCCESS
This pull request looks good
[(what's this?)](https://www.cloudbees.com/what-is-buildhive)

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36740432

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
Going back to the original question about how much does this improve performance, here's one result. I had two containers in AWS which had ~84400 blobs in each. I deleted the first container with the existing jclouds code and the second container with the new code. See results below.

```
$ ./bin/jclouds blobstore list shri-nov6-3-cloud-testing0-1 | wc -l
   84389
$ time ./bin/jclouds blobstore container-delete shri-nov6-3-cloud-testing0-1

real    24m59.975s
user    4m44.908s
sys     1m21.324s

$ ./bin/jclouds blobstore list shri-nov6-3-cloud-testing0-2 | wc -l
   84387
$ time ./bin/jclouds blobstore container-delete shri-nov6-3-cloud-testing0-2

real    10m52.382s
user    4m39.216s
sys     1m17.214s
```

I built jclouds-cli for these tests. To make sure jclouds-cli was using my jclouds jar, this is the way I built it (as per the suggestion on the irc channel).

1. run mvn install in jclouds 
2. run mvn install in jclouds-karaf
3. run mvn package in jclouds-cli which created the tar.gz that was extracted and run.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29853486

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +         if (marker != null) {
> +            logger.debug("%s with marker %s", message, marker);
> +            options = options.afterMarker(marker);
> +            listing = getListing(containerName, options, semaphore,
> +                  outstandingFutures, deleteFailure);
> +         } else {
> +            break;
> +         }
> +      }
> +   }
> +
> +   public void execute(final String containerName,
> +         ListContainerOptions listOptions) {
> +      final AtomicBoolean deleteFailure = new AtomicBoolean();
> +      final int numOutStandingRequests = 1024;
> +      int retries = maxErrors;

Added a TODO for this.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872341

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
I was able to successfully delete containers having 10K objects on atmos, swift and aws. This was using 20 user threads and 20 io threads.

Would be happy to remove the outer loop and repeat these experiments. Comments/suggestions welcome.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-31488902

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by BuildHive <no...@github.com>.
[jclouds » jclouds #882](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/882/) UNSTABLE
Looks like there's a problem with this pull request
[(what's this?)](https://www.cloudbees.com/what-is-buildhive)

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36596715

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
> jclouds-java-7-pull-requests #880 UNSTABLE

Haven't looked at the code, just wanted to add that that looks like an [unrelated test failure](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/org.apache.jclouds$jclouds-compute/880/testReport/junit/org.jclouds.compute.callables/BlockUntilInitScriptStatusIsZeroThenReturnOutputTest/testloopUntilTrueOrThrowCancellationExceptionReturnsWhenPredicateIsTrueSecondTimeWhileNotCancelled/)

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29208588

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-pull-requests #417](https://jclouds.ci.cloudbees.com/job/jclouds-pull-requests/417/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29185041

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
Thanks for submitting, @shrinandj. I or one of the other devs will hopefully have a chance to look at this soon. We're just in the middle of releasing 1.6.3, so it might take a bit longer. Thanks for your patience!

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29437697

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
I am trying to reproduce this locally. I have run > 50 iterations of this test so far, but haven't been able to reproduce the problem.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36681788

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
One last thought, if we decrease the default number of jclouds threads we can likely work around some of these timeout and retry issues.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-30183834

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +       * acquired in 'maxTime', a TimeoutException is thrown. Any outstanding
> +       * futures at that time are cancelled.
> +       */
> +      final Semaphore semaphore = new Semaphore(numOutStandingRequests);
> +      /*
> +       * When a future is created, a reference for that is added to the
> +       * outstandingFutures list. This reference is removed from the list in the
> +       * FutureCallback since it no longer needs to be cancelled in the event of
> +       * a timeout. Also, when the reference is removed from this list and when
> +       * the executorService removes the reference that it has maintained, the
> +       * future will be marked for GC since there should be no other references
> +       * to it. This is important because this code can generate an unbounded
> +       * number of futures.
> +       */
> +      final List<AtomicReference<ListenableFuture<Void>>> outstandingFutures = Collections
> +            .synchronizedList(new ArrayList<AtomicReference<ListenableFuture<Void>>>());

Removed the atomic reference.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872343

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-pull-requests #639](https://jclouds.ci.cloudbees.com/job/jclouds-pull-requests/639/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36741644

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by BuildHive <no...@github.com>.
[jclouds » jclouds #940](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/940/) FAILURE
Looks like there's a problem with this pull request
[(what's this?)](https://www.cloudbees.com/what-is-buildhive)

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38413229

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +         final String fullPath = parentIsFolder(options, md) ? options.getDir()
> +               + "/" + md.getName() : md.getName();
> +
> +            // Attempt to acquire a semaphore within the time limit. At least
> +            // one outstanding future should complete within this period for the
> +            // semaphore to be acquired.
> +         try {
> +            if (!semaphore.tryAcquire(maxTime, TimeUnit.MILLISECONDS)) {
> +               throw new TimeoutException("Timeout waiting for semaphore");
> +            }
> +         } catch (InterruptedException ie) {
> +            logger.debug("Interrupted while deleting blobs");
> +            Thread.currentThread().interrupt();
> +         }
> +
> +         final ListenableFuture<Void> blobDelFuture;

If you wrap ```blobDelFuture``` in an ```Optional``` will this remove some of the duplicated null assignments?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10940709

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
Looks like a network glitch on cloudbees.

<code>
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not clone git://github.com/jclouds/jclouds.git
...
stderr: fatal: unable to connect to github.com:
github.com[0: 192.30.252.129]: errno=Connection timed out
</code>

Can we start another build?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38413431

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +         listing = blobStore.list(containerName, options);
> +      } catch (ContainerNotFoundException ce) {
> +         return listing;
> +      }
> +
> +      // recurse on subdirectories
> +      if (options.isRecursive()) {
> +         for (StorageMetadata md : listing) {
> +            String fullPath = parentIsFolder(options, md) ? options.getDir()
> +                  + "/" + md.getName() : md.getName();
> +            switch (md.getType()) {
> +            case BLOB:
> +               break;
> +            case FOLDER:
> +            case RELATIVE_PATH:
> +               if (options.isRecursive() &&

Removed.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872428

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
Thanks a lot @andrewgaul 

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38741560

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
>  import com.google.common.util.concurrent.ListenableFuture;
>  import com.google.common.util.concurrent.ListeningExecutorService;
>  import com.google.inject.Inject;
>  
>  /**
>   * Deletes all keys in the container
> - * 
> + *
>   * @author Adrian Cole

Add your name to since this is a large body of new code?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10853480

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
I tried varying the number of threads from 10 to 50 and deleting a container with at least 5K blobs. Things worked fine. I'll rebase this patch on top of the current ToT and send out.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36594782

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
> but haven't been able to reproduce the problem.

OK, let's kick the PR builders again, then. I also suspect it's transient. Thanks, @shrinandj!

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36737307

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
Closed #214.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
Thanks for taking the time to review, @andrewgaul!

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-30187292

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
> +         // If a future to delete a blob/directory actually got created above,
> +         // keep a reference of that in the outstandingFutures list. This is
> +         // useful in case of a timeout exception. All outstanding futures can
> +         // then be cancelled.
> +         if (blobDelFuture != null) {
> +            final AtomicReference<ListenableFuture<Void>> atomicBlobDelFuture =
> +               new AtomicReference<ListenableFuture<Void>>(blobDelFuture);
> +            outstandingFutures.add(atomicBlobDelFuture);
> +
> +            // Add a callback to release the semaphore. This is required for
> +            // other threads waiting to acquire a semaphore above to make
> +            // progress.
> +            Futures.addCallback(blobDelFuture, new FutureCallback<Object>() {
> +               @Override
> +               public void onSuccess(final Object o) {
> +                  outstandingFutures.remove(atomicBlobDelFuture);

Done.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10872324

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +       * a timeout. Also, when the reference is removed from this list and when
> +       * the executorService removes the reference that it has maintained, the
> +       * future will be marked for GC since there should be no other references
> +       * to it. This is important because this code can generate an unbounded
> +       * number of futures.
> +       */
> +      final List<AtomicReference<ListenableFuture<Void>>> outstandingFutures = Collections
> +            .synchronizedList(new ArrayList<AtomicReference<ListenableFuture<Void>>>());
> +      while (retries > 0) {
> +         deleteFailure.set(false);
> +         executeOneIteration(containerName, listOptions.clone(), semaphore,
> +               outstandingFutures, deleteFailure);
> +         // Wait for all futures to complete by waiting to acquire all
> +         // semaphores.
> +         try {
> +            if (!semaphore.tryAcquire(numOutStandingRequests, maxTime,

These timeouts provide little value over the per-operation timouts and we may want to remove them in a subsequent commit.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10853876

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Shri Javadekar <no...@github.com>.
Sure. Let me know if I can do anything to make this go faster.

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-29438511

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-java-7-pull-requests #1154](https://jclouds.ci.cloudbees.com/job/jclouds-java-7-pull-requests/1154/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38415165

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
At this point we have pushed fixes for Swift 408, Atmos 500/1040, thundering herd, and limiting user threads.  I believe we have addressed all symptoms I previously encountered and we should evaluate this commit for inclusion.  I will look at this tomorrow; @demobox can you also give this a read-through?  Thank you for your patience and persistence @shrinandj!

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36596055

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
@shrinandj I appreciate this commit, although we cannot merge it without a breadth of testing against the major providers and likely several additional fixes.  I wrote but did not push a <a href="https://github.com/maginatics/jclouds/tree/clear-container-parallel-list-container-delete-blob">similar commit</a> a few months ago and found that Atmos Online fails with with <a href="https://issues.apache.org/jira/browse/JCLOUDS-137">500 internal errors</a> with more than 16 outstanding operations and Swift-based blobstores often fail with a 408 request timeout.  Further, the outer retry loop for ```clearContainer``` is entirely unnecessary once we fix these issues.  Related, the internal jclouds retry mechanism suffers from the thundering herd problem (<a href="https://github.com/jclouds/legacy-jclouds/pull/1550">fix available</a>) and fixing this obviates the need for outer retries.

We should coordinate on this in-person to avoid duplicated effort, especially since we only sit a few feet away from each other!

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-30183335

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
>  I also suspect it's transient.

Bingo ;-)

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-36742345

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
@shrinandj This commit represents a big improvement and I apologize for my delayed comments.  Can you address some of these and add TODOs for the rest so we can commit this as soon as possible?  Specifically we must address the O(n) behavior and I do not understand some of the synchronization.  Further I have some doubts about how the size of the ```ExecutorService``` interacts with the ```Semaphore```; should the sizes of the two be the same?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38323072

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Gaul <no...@github.com>.
> +         // If a future to delete a blob/directory actually got created above,
> +         // keep a reference of that in the outstandingFutures list. This is
> +         // useful in case of a timeout exception. All outstanding futures can
> +         // then be cancelled.
> +         if (blobDelFuture != null) {
> +            final AtomicReference<ListenableFuture<Void>> atomicBlobDelFuture =
> +               new AtomicReference<ListenableFuture<Void>>(blobDelFuture);
> +            outstandingFutures.add(atomicBlobDelFuture);
> +
> +            // Add a callback to release the semaphore. This is required for
> +            // other threads waiting to acquire a semaphore above to make
> +            // progress.
> +            Futures.addCallback(blobDelFuture, new FutureCallback<Object>() {
> +               @Override
> +               public void onSuccess(final Object o) {
> +                  outstandingFutures.remove(atomicBlobDelFuture);

Can you make outstandingFutures a HashSet to avoid O(n) lookup?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214/files#r10853439

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by CloudBees pull request builder plugin <no...@github.com>.
[jclouds-pull-requests #684](https://jclouds.ci.cloudbees.com/job/jclouds-pull-requests/684/) SUCCESS
This pull request looks good

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38414897

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
@shrinandj: thanks for the update. Have you been able to discuss this PR with @andrewgaul?

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-31489268

Re: [jclouds] Delete objects in a container efficiently. (#214)

Posted by Andrew Phillips <no...@github.com>.
> Looks like a network glitch on cloudbees.

Looks like things [should be working again](https://buildhive.cloudbees.com/job/jclouds/job/jclouds/). We should be fine for the next PR build...

---
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds/pull/214#issuecomment-38550864