Posted to user@jclouds.apache.org by John Calcote <jo...@gmail.com> on 2021/09/09 19:37:49 UTC

S3 multipart upload throws NPE occasionally

Hi Andrew,

I'm running jclouds 2.3.0 these days. Here's the call stack:

java.lang.NullPointerException: Null id
        at org.jclouds.blobstore.domain.AutoValue_MultipartUpload.<init>(AutoValue_MultipartUpload.java:32) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
        at org.jclouds.blobstore.domain.MultipartUpload.create(MultipartUpload.java:35) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
        at org.jclouds.s3.blobstore.S3BlobStore.initiateMultipartUpload(S3BlobStore.java:371) ~[s3-2.3.0.jar:2.3.0]
        at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
        at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:349) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
        at org.jclouds.s3.blobstore.S3BlobStore.putBlob(S3BlobStore.java:262) ~[s3-2.3.0.jar:2.3.0]

My code calls S3BlobStore.putBlob as follows:

    blobStore.putBlob(cspInfo.getContainer(), blob, multipart());

blob is defined as:

    Blob blob = blobStore.blobBuilder(key)
            .payload(content)
            .contentDisposition(key)
            .contentLength(meta.getContentLength())
            .contentMD5(HashCode.fromBytes(meta.getChecksum()))
            .contentType(MediaType.APPLICATION_OCTET_STREAM)
            .userMetadata(meta.getUserMeta())
            .build();

When it happens (not very often), the problem seems to be here (in S3BlobStore.java):

   @Override
   public MultipartUpload initiateMultipartUpload(String container, BlobMetadata blobMetadata, PutOptions overrides) {
      PutObjectOptions options = new PutObjectOptions();
      if (overrides.getBlobAccess() == BlobAccess.PUBLIC_READ) {
         options = options.withAcl(CannedAccessPolicy.PUBLIC_READ);
      }
      String id = sync.initiateMultipartUpload(container, blob2ObjectMetadata.apply(blobMetadata), options);
      return MultipartUpload.create(container, blobMetadata.getName(), id, blobMetadata, overrides);
   }

In the last two lines, 'id' is returned by the S3Client (sync) from its initiateMultipartUpload call, and for some reason it occasionally comes back null. To be fair, I've only seen this happen once. I tried to chase it further, but, to be completely honest, I have never been able to figure out how to get past the S3Client barrier in jclouds. I'm not very good with Guice, so I don't know how to follow the code past an interface like S3Client whose implementation is bound by Guice.
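The failure chain can be reproduced without jclouds. The sketch below is a minimal, self-contained illustration under stated assumptions: parseUploadId is a hypothetical stand-in for a regex-based response parser that returns null when nothing matches (it uses the pattern quoted from UploadIdFromHttpResponseViaRegex elsewhere in this thread), and createUpload is a hypothetical imitation of the null check an AutoValue-generated constructor performs, which is where the message "Null id" comes from. The response bodies are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reproduction of the failure chain; no jclouds classes are used.
final class NullIdDemo {
    // The pattern quoted from UploadIdFromHttpResponseViaRegex in this thread.
    private static final Pattern UPLOAD_ID =
            Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>");

    // Hypothetical parser: returns the captured id, or null when nothing matches.
    static String parseUploadId(String body) {
        Matcher m = UPLOAD_ID.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical stand-in for the null check an AutoValue constructor performs.
    static String createUpload(String id) {
        if (id == null) {
            throw new NullPointerException("Null id");
        }
        return id;
    }

    public static void main(String[] args) {
        // Invented response bodies: one well-formed, one with a space in the id.
        String normal = "<InitiateMultipartUploadResult><UploadId>abc123</UploadId></InitiateMultipartUploadResult>";
        String withSpace = "<InitiateMultipartUploadResult><UploadId>abc 123</UploadId></InitiateMultipartUploadResult>";

        System.out.println(createUpload(parseUploadId(normal))); // prints abc123
        try {
            createUpload(parseUploadId(withSpace));
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage()); // prints NPE: Null id
        }
    }
}
```

The point is that the regex parser fails quietly (null) and the error only surfaces later, in the value-object constructor, which is why the stack trace never mentions the parser.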

Re: S3 multipart upload throws NPE occasionally

Posted by John Calcote <jo...@gmail.com>.
I just realized the problem is probably that HTML is generally
case-insensitive, while the regex expects the tag to be exactly
<UploadId>. It's much easier (for me) to imagine the match failing because
the server used a tag with different case than to imagine it embedded a
space in the id string.

The simplest fix then would be:

   Pattern pattern = Pattern.compile("(?i)<UploadId>([\\S&&[^<]]+)</UploadId>");

Adding (?i) at the front makes the entire regex match all literal
characters case-insensitively. I've tested this with a Java regex tester
and it works.
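A quick way to check the proposed pattern is to run both versions against a response whose tag casing differs. This is a hypothetical sketch: the response body is invented, and extract merely mimics "return group 1, or null if nothing matches". Note that S3 responses are XML, in which element names are case-sensitive, so (?i) only helps if a server deviates from the exact <UploadId> spelling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaseInsensitiveUploadId {
    // Original (case-sensitive) pattern versus the proposed (?i) variant.
    static final Pattern ORIGINAL =
            Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>");
    static final Pattern PROPOSED =
            Pattern.compile("(?i)<UploadId>([\\S&&[^<]]+)</UploadId>");

    // Returns the first capture group, or null when the pattern does not match.
    static String extract(Pattern p, String body) {
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response that spells the tag in upper case.
        String body = "<UPLOADID>2~abcDEF</UPLOADID>";
        System.out.println(extract(ORIGINAL, body)); // prints null
        System.out.println(extract(PROPOSED, body)); // prints 2~abcDEF
    }
}
```

The (?i) flag only affects how literal characters are compared; the captured id keeps its original casing.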

I'll compile and test this myself. I'll let you know if it seems to fix the
issue.

John

On Tue, Sep 14, 2021 at 4:52 PM John Calcote <jo...@gmail.com> wrote:

> Hi Andrew,
>
> Thanks for the quick response, but I don't think there's anything wrong
> with that regex. The expression "<UploadId>([\\S&&[^<]]+)</UploadId>" is
> pretty simple once you understand what the double ampersand does: it's an
> intersection operator. This regex means: match the string
> <UploadId>X</UploadId>, where X can be any sequence of non-whitespace
> characters (\\S) except the '<' character (which, of course, allows the
> expression to pick up all the non-whitespace characters between the
> opening tag and either the start of the closing tag or the first
> whitespace character). The parentheses around X form a capture group, so
> X can be extracted from the results. If there ARE any whitespace
> characters between the opening and closing XML tags, the expression will
> not match, because the character right after the capture group must be
> the '<' that starts </UploadId>.
>
> The system to which we're trying to upload is a T-Systems OTC (Open
> Telecom Cloud) S3 service. We've heard OTC is based on Huawei Object
> Storage. It may not be a perfect S3 implementation, and this is the first
> time we've tried to hit it with a multipart upload. The S3 service might
> be sending an UploadId with an embedded whitespace character, which would
> cause the match to fail and the parser to return null. Although it seems
> stupid to do so, I don't see anything in the Amazon spec that forbids
> whitespace in the upload id. To make a space work properly, you'd have to
> URL-encode the uploadId when using it in subsequent PUT request parameters.
>
> Further thoughts?
>
> John

Re: S3 multipart upload throws NPE occasionally

Posted by John Calcote <jo...@gmail.com>.
Hi Andrew,

Thanks for the quick response, but I don't think there's anything wrong
with that regex. The expression "<UploadId>([\\S&&[^<]]+)</UploadId>" is
pretty simple once you understand what the double ampersand does: it's an
intersection operator. This regex means: match the string
<UploadId>X</UploadId>, where X can be any sequence of non-whitespace
characters (\\S) except the '<' character (which, of course, allows the
expression to pick up all the non-whitespace characters between the
opening tag and either the start of the closing tag or the first
whitespace character). The parentheses around X form a capture group, so
X can be extracted from the results. If there ARE any whitespace
characters between the opening and closing XML tags, the expression will
not match, because the character right after the capture group must be
the '<' that starts </UploadId>.
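The intersection semantics described above can be checked directly with java.util.regex. In this sketch, isValidIdText is a hypothetical helper that tests whether an entire string consists only of characters the capture group accepts; the sample ids are invented.

```java
import java.util.regex.Pattern;

final class IntersectionDemo {
    // [\S&&[^<]] is the intersection of "non-whitespace" and "anything but '<'";
    // the trailing + requires at least one such character.
    static final Pattern ID_CHARS = Pattern.compile("[\\S&&[^<]]+");

    // True only if every character of s is accepted by the capture group.
    static boolean isValidIdText(String s) {
        return ID_CHARS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidIdText("VXBsb2FkIElE-_.~")); // prints true
        System.out.println(isValidIdText("abc def"));          // prints false (space)
        System.out.println(isValidIdText("abc<def"));          // prints false ('<')
        System.out.println(isValidIdText(""));                 // prints false (empty)
    }
}
```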

The system to which we're trying to upload is a T-Systems OTC (Open
Telecom Cloud) S3 service. We've heard OTC is based on Huawei Object
Storage. It may not be a perfect S3 implementation, and this is the first
time we've tried to hit it with a multipart upload. The S3 service might
be sending an UploadId with an embedded whitespace character, which would
cause the match to fail and the parser to return null. Although it seems
stupid to do so, I don't see anything in the Amazon spec that forbids
whitespace in the upload id. To make a space work properly, you'd have to
URL-encode the uploadId when using it in subsequent PUT request parameters.
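As a rough illustration of the encoding point, the sketch below percent-encodes a hypothetical upload id before placing it in a query string. URLEncoder performs HTML form encoding, which turns a space into '+', so the hypothetical helper encodeQueryValue rewrites '+' to %20 afterwards (a literal '+' has already become %2B by then). It is not a complete RFC 3986 encoder: for example, it still escapes '~' and leaves '*' unescaped.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UploadIdEncoding {
    // Form-encode the value, then convert the form-style '+' for spaces to %20.
    // A literal '+' in the input was already turned into %2B by URLEncoder.
    static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String uploadId = "abc def+ghi"; // hypothetical id with a space and a plus
        System.out.println("?partNumber=1&uploadId=" + encodeQueryValue(uploadId));
        // prints ?partNumber=1&uploadId=abc%20def%2Bghi
    }
}
```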

Further thoughts?

John

On Thu, Sep 9, 2021 at 5:15 PM Andrew Gaul <ga...@apache.org> wrote:

> On Thu, Sep 09, 2021 at 07:37:49PM -0000, John Calcote wrote:
> > java.lang.NullPointerException: Null id
> >         at org.jclouds.blobstore.domain.AutoValue_MultipartUpload.<init>(AutoValue_MultipartUpload.java:32) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
> >         at org.jclouds.blobstore.domain.MultipartUpload.create(MultipartUpload.java:35) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
> >         at org.jclouds.s3.blobstore.S3BlobStore.initiateMultipartUpload(S3BlobStore.java:371) ~[s3-2.3.0.jar:2.3.0]
> >         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
> >         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:349) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
> >         at org.jclouds.s3.blobstore.S3BlobStore.putBlob(S3BlobStore.java:262) ~[s3-2.3.0.jar:2.3.0]
>
> UploadIdFromHttpResponseViaRegex has a suspicious regular expression:
>
>     Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>")
>
> Do you use AWS or another S3 object store?  I suspect that this regex
> fails to match in some corner case.  Could you simplify it and submit a
> GitHub PR?
>
> --
> Andrew Gaul
> http://gaul.org/
>

Re: S3 multipart upload throws NPE occasionally

Posted by Andrew Gaul <ga...@apache.org>.
On Thu, Sep 09, 2021 at 07:37:49PM -0000, John Calcote wrote:
> java.lang.NullPointerException: Null id
>         at org.jclouds.blobstore.domain.AutoValue_MultipartUpload.<init>(AutoValue_MultipartUpload.java:32) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>         at org.jclouds.blobstore.domain.MultipartUpload.create(MultipartUpload.java:35) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>         at org.jclouds.s3.blobstore.S3BlobStore.initiateMultipartUpload(S3BlobStore.java:371) ~[s3-2.3.0.jar:2.3.0]
>         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>         at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:349) ~[jclouds-blobstore-2.3.0.jar:2.3.0]
>         at org.jclouds.s3.blobstore.S3BlobStore.putBlob(S3BlobStore.java:262) ~[s3-2.3.0.jar:2.3.0]

UploadIdFromHttpResponseViaRegex has a suspicious regular expression:

    Pattern.compile("<UploadId>([\\S&&[^<]]+)</UploadId>")

Do you use AWS or another S3 object store?  I suspect that this regex
fails to match in some corner case.  Could you simplify it and submit a
GitHub PR?

-- 
Andrew Gaul
http://gaul.org/
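Andrew's suggestion to simplify the regex could take several forms. The sketch below is one hypothetical simplification, not a pattern taken from jclouds: it matches the tag case-insensitively, trims whitespace around the id, and captures everything up to the next '<', so an id with an embedded space is returned instead of silently producing null. The class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimplerUploadIdRegex {
    // Hypothetical simplification: case-insensitive tag, surrounding whitespace
    // trimmed, and any run of non-'<' characters (embedded spaces included) captured.
    static final Pattern UPLOAD_ID =
            Pattern.compile("(?i)<UploadId>\\s*([^<]+?)\\s*</UploadId>");

    // Returns the trimmed id, or null when no UploadId element is present.
    static String extract(String body) {
        Matcher m = UPLOAD_ID.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("<UploadId>abc123</UploadId>"));       // prints abc123
        System.out.println(extract("<UploadId>\n  abc123\n</UploadId>")); // prints abc123
        System.out.println(extract("<uploadid>abc 123</uploadid>"));      // prints abc 123
        System.out.println(extract("<UploadId></UploadId>"));             // prints null
    }
}
```

Trimming is a judgment call: it would alter an id that genuinely starts or ends with whitespace. Parsing the response with a real XML parser would avoid these corner cases altogether.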