Posted to user@jclouds.apache.org by jo...@gmail.com, jo...@gmail.com on 2018/04/05 02:12:28 UTC

putBlob with an already existing object

What is jclouds's general policy with regard to putting a blob to a cloud service where the blob already exists and the cloud provider doesn't allow overwrites?

It seems like it would be nice to treat the operation as an idempotent HTTP PUT, but if the service disallows overwrites, jclouds would receive an exception in this case. jclouds could then verify that the existing object has the same content and silently return "ok" as if the put worked.

However, what happens if the cloud service has an object with the same name and different content? The only way to maintain the idempotent quality would be to silently delete the existing object and try the put again under the covers - this seems imprudent to me and unlikely to be the current functionality.

What really happens? 

Thanks,
John

P.S. I'd look this stuff up myself if I could only trace my way to the bottom levels of the jclouds code. There's so much interface wrapping going on in there, along with dependency injection, that it's nearly impossible to tell where the rubber hits the road. If anyone can provide a hint about how to read the code from user-level to wire-level, I'd really appreciate it.

Re: putBlob with an already existing object

Posted by jo...@gmail.com, jo...@gmail.com.
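> Atmos does not return an ETag so we cannot check the same content,
> although ETag checking does not always work on S3, for example with
> multi-part uploads.  Unfortunately Atmos is odd for a number of reasons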
> and delete and retry was the best workaround at the time, especially for
> a low-popularity provider.  Some blobstores like Ceph can address this
> issue with conditional PUT but this is not supported elsewhere.

In the long run, the best solution might be to throw a documented exception and allow the user to handle it. Sadly, this would mean that everyone would have to watch for the exception in all cases now because they don't know what the underlying provider semantics are.
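
A minimal sketch of the "documented exception" approach, for illustration only. It does not use jclouds: `WriteOnceStore`, `BlobAlreadyExistsException`, and `tryPut` are hypothetical names, and the store is an in-memory stand-in for a provider that refuses overwrites. It shows the cost mentioned above: every caller needs some collision policy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OverwriteRefusedSketch {
    /** Hypothetical checked exception; not a jclouds type. */
    static class BlobAlreadyExistsException extends Exception {
        BlobAlreadyExistsException(String name) {
            super("blob already exists: " + name);
        }
    }

    /** In-memory stand-in (an assumption) for a provider that refuses overwrites. */
    static class WriteOnceStore {
        private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

        void putBlob(String name, byte[] payload) throws BlobAlreadyExistsException {
            // putIfAbsent is atomic, so exactly one of two racing writers wins.
            if (blobs.putIfAbsent(name, payload.clone()) != null) {
                throw new BlobAlreadyExistsException(name);
            }
        }
    }

    /** Caller-side policy: report whether this call created the blob. */
    static boolean tryPut(WriteOnceStore store, String name, byte[] payload) {
        try {
            store.putBlob(name, payload);
            return true;
        } catch (BlobAlreadyExistsException e) {
            return false; // the caller, not the library, decides what a collision means
        }
    }

    public static void main(String[] args) {
        WriteOnceStore store = new WriteOnceStore();
        System.out.println(tryPut(store, "k", new byte[] {1})); // prints "true"
        System.out.println(tryPut(store, "k", new byte[] {2})); // prints "false"
    }
}
```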

Re: putBlob with an already existing object

Posted by Andrew Gaul <ga...@apache.org>.
On Thu, Apr 05, 2018 at 04:03:04PM -0000, john.calcote@gmail.com wrote:
> Thanks for the quick response Andrew - 
> 
> > The closest analog is AtmosUtils.putBlob which retries on
> > KeyAlreadyExistsException after removing.  Generally the jclouds
> > portable abstraction tries to make all blobstores act the same and uses
> > the native behavior for the providers.  Which blobstore has similar
> > behavior?  I am not sure how we should handle this for almost
> > S3-compatible implementations like Hitachi.
> 
> So, clarifying - you're saying if it gets a KeyAlreadyExistsException, it then deletes the key and retries the put? That seems a bit harsh - what if you're building a distributed system on top of jclouds and you have two cluster nodes racing to put the same key? Would it not be better to at least test the metadata to see if you're trying to overwrite the same data and just silently return ok?

Agreed that this is racy and something
https://issues.apache.org/jira/browse/JCLOUDS-1111 unsuccessfully tried
to address through a newer header that not all implementations support.
Atmos does not return an ETag so we cannot check the same content,
although ETag checking does not always work on S3, for example with
multi-part uploads.  Unfortunately Atmos is odd for a number of reasons
and delete and retry was the best workaround at the time, especially for
a low-popularity provider.  Some blobstores like Ceph can address this
issue with conditional PUT but this is not supported elsewhere.
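
To make the difference concrete, here is a small self-contained sketch contrasting the two strategies. It is not jclouds or Atmos code: `WriteOnceStore`, `createIfAbsent`, and `deleteAndRetryPut` are hypothetical names, and the in-memory map stands in for a remote store. In HTTP terms a create-only PUT is usually expressed with an `If-None-Match: *` precondition (RFC 7232); whether a given store honors it is provider-specific.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PutStrategiesSketch {
    /** In-memory stand-in (an assumption) for a store that refuses overwrites. */
    static class WriteOnceStore {
        private final Map<String, String> blobs = new ConcurrentHashMap<>();

        /** Create-only write: returns false, changing nothing, if the key exists. */
        boolean createIfAbsent(String key, String value) {
            return blobs.putIfAbsent(key, value) == null;
        }

        void remove(String key) {
            blobs.remove(key);
        }

        String get(String key) {
            return blobs.get(key);
        }
    }

    /** Delete-and-retry: on a collision, drop the existing object and write again. */
    static void deleteAndRetryPut(WriteOnceStore store, String key, String value) {
        if (!store.createIfAbsent(key, value)) {
            store.remove(key);                // another writer's object is silently discarded
            store.createIfAbsent(key, value); // may itself lose a race; not handled in this sketch
        }
    }

    public static void main(String[] args) {
        WriteOnceStore store = new WriteOnceStore();
        store.createIfAbsent("k", "from-A");

        // Conditional create from a second writer: refused, A's object survives.
        boolean created = store.createIfAbsent("k", "from-B");
        System.out.println(created + " " + store.get("k")); // prints "false from-A"

        // Delete-and-retry from the second writer: A's object is replaced.
        deleteAndRetryPut(store, "k", "from-B");
        System.out.println(store.get("k")); // prints "from-B"
    }
}
```

With the conditional form the losing writer learns about the collision; with delete-and-retry the last writer wins and the earlier object is gone.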

-- 
Andrew Gaul
http://gaul.org/

Re: putBlob with an already existing object

Posted by jo...@gmail.com, jo...@gmail.com.
Thanks for the quick response Andrew - 

> The closest analog is AtmosUtils.putBlob which retries on
> KeyAlreadyExistsException after removing.  Generally the jclouds
> portable abstraction tries to make all blobstores act the same and uses
> the native behavior for the providers.  Which blobstore has similar
> behavior?  I am not sure how we should handle this for almost
> S3-compatible implementations like Hitachi.

So, clarifying - you're saying if it gets a KeyAlreadyExistsException, it then deletes the key and retries the put? That seems a bit harsh - what if you're building a distributed system on top of jclouds and you have two cluster nodes racing to put the same key? Would it not be better to at least test the metadata to see if you're trying to overwrite the same data and just silently return ok?
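
A rough sketch of the "check before overwriting" idea, under stated assumptions: the store keeps a SHA-256 digest per key and exposes it as metadata, which not every provider can do. `WriteOnceStore`, `Outcome`, and `idempotentPut` are hypothetical names, not jclouds API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentPutSketch {
    enum Outcome { CREATED, ALREADY_PRESENT, CONFLICT }

    /** In-memory stand-in (an assumption) that records a content digest per key. */
    static class WriteOnceStore {
        private final Map<String, byte[]> digests = new ConcurrentHashMap<>();

        /** Create-only write: returns false if the key already exists. */
        boolean createIfAbsent(String key, byte[] payload) {
            return digests.putIfAbsent(key, sha256(payload)) == null;
        }

        /** Digest of the stored content, or null if the key is absent. */
        byte[] contentDigest(String key) {
            return digests.get(key);
        }
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Treat a collision as success only when the stored content matches. */
    static Outcome idempotentPut(WriteOnceStore store, String key, byte[] payload) {
        if (store.createIfAbsent(key, payload)) {
            return Outcome.CREATED;
        }
        byte[] existing = store.contentDigest(key);
        boolean same = existing != null && MessageDigest.isEqual(existing, sha256(payload));
        return same ? Outcome.ALREADY_PRESENT : Outcome.CONFLICT;
    }

    public static void main(String[] args) {
        WriteOnceStore store = new WriteOnceStore();
        byte[] data = {1, 2, 3};
        System.out.println(idempotentPut(store, "k", data));            // prints "CREATED"
        System.out.println(idempotentPut(store, "k", data.clone()));    // prints "ALREADY_PRESENT"
        System.out.println(idempotentPut(store, "k", new byte[] {9}));  // prints "CONFLICT"
    }
}
```

Two racing nodes writing identical data both see a success-like outcome, and nothing is deleted; only a genuine content mismatch surfaces as a conflict.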

John

Re: putBlob with an already existing object

Posted by Andrew Gaul <ga...@apache.org>.
On Thu, Apr 05, 2018 at 02:12:28AM -0000, john.calcote@gmail.com wrote:
> What is jclouds's general policy with regard to putting a blob to a cloud service where the blob already exists and the cloud provider doesn't allow overwrites?
> 
> It seems like it would be nice to treat the operation as an idempotent HTTP PUT, but if the service disallows overwrites, jclouds would receive an exception in this case. jclouds could then verify that the existing object has the same content and silently return "ok" as if the put worked.
> 
> However, what happens if the cloud service has an object with the same name and different content? The only way to maintain the idempotent quality would be to silently delete the existing object and try the put again under the covers - this seems imprudent to me and unlikely to be the current functionality.

The closest analog is AtmosUtils.putBlob which retries on
KeyAlreadyExistsException after removing.  Generally the jclouds
portable abstraction tries to make all blobstores act the same and uses
the native behavior for the providers.  Which blobstore has similar
behavior?  I am not sure how we should handle this for almost
S3-compatible implementations like Hitachi.

> P.S. I'd look this stuff up myself if I could only trace my way to the bottom levels of the jclouds code. There's so much interface wrapping going on in there, along with dependency injection, that it's nearly impossible to tell where the rubber hits the road. If anyone can provide a hint about how to read the code from user-level to wire-level, I'd really appreciate it.

jclouds uses metaprogramming, which allows compact notation but obscures
the intent.  Most of the magic lies in RestAnnotationProcessor if you
want to see how it works.
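
For readers unfamiliar with the general pattern, the following is a toy illustration of annotation-driven request building with a dynamic proxy. It is not jclouds code and makes no claim about how RestAnnotationProcessor is implemented: the `Verb` annotation, the `BlobApi` interface, and the string "request line" it returns are all invented for this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class AnnotationProxySketch {
    /** Hypothetical annotation describing the HTTP call behind a method. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Verb {
        String method();
        String path();
    }

    /** The user-level API is only an annotated interface; there is no hand-written body. */
    interface BlobApi {
        @Verb(method = "PUT", path = "/{container}/{name}")
        String putBlob(String container, String name);

        @Verb(method = "DELETE", path = "/{container}/{name}")
        String removeBlob(String container, String name);
    }

    /** Builds an implementation at runtime by reading the annotations reflectively. */
    static BlobApi create() {
        InvocationHandler handler = (proxy, method, args) -> {
            Verb verb = method.getAnnotation(Verb.class);
            if (verb == null) {
                // Un-annotated methods (for example toString) are not handled in this sketch.
                throw new UnsupportedOperationException(method.getName());
            }
            String path = verb.path()
                    .replace("{container}", (String) args[0])
                    .replace("{name}", (String) args[1]);
            // A real client would send the request here; the sketch just returns the request line.
            return verb.method() + " " + path;
        };
        return (BlobApi) Proxy.newProxyInstance(
                BlobApi.class.getClassLoader(), new Class<?>[] {BlobApi.class}, handler);
    }

    public static void main(String[] args) {
        BlobApi api = create();
        System.out.println(api.putBlob("photos", "cat.jpg")); // prints "PUT /photos/cat.jpg"
    }
}
```

This is why a search for "the implementation" of an API method can come up empty in code written this way: the wire-level behavior lives in the one generic handler, not next to each method.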

-- 
Andrew Gaul
http://gaul.org/