Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2016/01/14 00:14:40 UTC

[jira] [Comment Edited] (HADOOP-12635) Adding Append API support for WASB

    [ https://issues.apache.org/jira/browse/HADOOP-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096976#comment-15096976 ] 

Chris Nauroth edited comment on HADOOP-12635 at 1/13/16 11:14 PM:
------------------------------------------------------------------

Thank you for patch v004.  Here is some additional code review feedback.

The {{BlockBlobAppendStream#ioQueue}} member seems unnecessary.  It is only used for initialization of {{ioThreadPool}}, and after that, all remaining logic interacts with {{ioThreadPool}} instead of {{ioQueue}}.  The {{ioQueue}} is something you can create locally inside {{initialize}}.
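For illustration, here is a minimal sketch of that change, assuming a {{ThreadPoolExecutor}}-based pool and the method names from the patch ({{initialize}}, {{ioThreadPool}}); the class name and pool sizing values are placeholders:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the queue becomes a local variable handed to the
// executor, so no ioQueue field is needed after initialization.
public class InitializeSketch {
  private ThreadPoolExecutor ioThreadPool;

  void initialize(int concurrentWrites) {
    // Created locally; the executor keeps its own reference via getQueue().
    LinkedBlockingQueue<Runnable> ioQueue = new LinkedBlockingQueue<>();
    ioThreadPool = new ThreadPoolExecutor(concurrentWrites, concurrentWrites,
        10, TimeUnit.SECONDS, ioQueue);
  }

  ThreadPoolExecutor getPool() {
    return ioThreadPool;
  }
}
```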

In {{BlockBlobAppendStream#close}}:
{code}
    if (closed) {
      leaseFreed = true;
      return;
    }
{code}

Is it necessary to set {{leaseFreed}} here, or can the line be removed?  I think it can be removed, because initial execution of {{close}} would have called {{cleanup}}, which sets {{leaseFreed}} to {{true}}.

{{BlockBlobAppendStream#generateBlockId}} has a small typo in an exception message: "invaid" should be "invalid".

I looked at the code for {{com.microsoft.azure.storage.core.Base64}} and {{com.microsoft.azure.storage.core.Utility}}.  Both have the warning "RESERVED FOR INTERNAL USE" at the top.  I'd prefer to avoid calling Azure Storage SDK internals that are not considered public API, and therefore might change incompatibly between releases.  For base-64 encoding, the rest of the Hadoop code uses {{org.apache.commons.codec.binary.Base64}}.  If you'd like to see sample usage, one example is in {{SaslRpcServer}}.  For {{Utility#getBytesFromLong}}, the method is trivial enough to clone into the {{BlockBlobAppendStream}} class, with a comment indicating where the code was copied from.

{code}
    /**
     * Returns a byte array that represents the data of a <code>long</code> value.
     * 
     * @param value
     *            The value from which the byte array will be returned.
     * 
     * @return A byte array that represents the data of the specified <code>long</code> value.
     */
    public static byte[] getBytesFromLong(final long value) {
        final byte[] tempArray = new byte[8];

        for (int m = 0; m < 8; m++) {
            tempArray[7 - m] = (byte) ((value >> (8 * m)) & 0xFF);
        }

        return tempArray;
    }
{code}
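Purely as a sketch of how the cloned helper and a base-64 encoder could combine (the class name and block-ID scheme below are made up, and the JDK's {{java.util.Base64}} stands in for commons-codec so the snippet is self-contained; in the patch itself, commons-codec's {{Base64.encodeBase64String(bytes)}} would produce the same standard base-64 text):

```java
import java.util.Base64;

// Illustrative only: shows the cloned SDK helper feeding a base-64 encoder
// to build a fixed-length block ID from a sequence number.
public class BlockIdSketch {
  // Copied from com.microsoft.azure.storage.core.Utility#getBytesFromLong.
  static byte[] getBytesFromLong(final long value) {
    final byte[] tempArray = new byte[8];
    for (int m = 0; m < 8; m++) {
      tempArray[7 - m] = (byte) ((value >> (8 * m)) & 0xFF);
    }
    return tempArray;
  }

  static String generateBlockId(long sequence) {
    // JDK stand-in; commons-codec's encodeBase64String yields identical text.
    return Base64.getEncoder().encodeToString(getBytesFromLong(sequence));
  }
}
```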

{{PageBlobOutputStream#close}} has some code that dumps stack traces of stuck threads if there is an I/O timeout.  This has been helpful for troubleshooting.  I recommend doing the same for {{BlockBlobAppendStream#close}}, possibly refactoring to share the code.
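As a rough sketch of that kind of diagnostic (the method name and formatting are illustrative, not the actual {{PageBlobOutputStream}} code), the JDK's {{ThreadMXBean}} can produce the dump:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Illustrative helper: collect stack traces of all live threads so they can
// be logged when close() times out waiting on pending I/O.
public class ThreadDumpSketch {
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info :
        ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
      sb.append(info.toString());
    }
    return sb.toString();
  }
}
```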

When you create {{ioThreadPool}}, I recommend supplying a custom {{ThreadFactory}} that sets the name of returned threads to something like "BlockBlobAppendStream-<key>-<sequential ID>".  That will help identify these threads while troubleshooting with {{jstack}} and similar tools.  I see {{PageBlobOutputStream}} isn't doing this right now either.  (No need to fix it in scope of this patch.  It can be fixed later.)
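A sketch of such a factory (the class name is made up; the naming pattern follows the suggestion above):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative ThreadFactory that names worker threads
// "BlockBlobAppendStream-<key>-<sequential ID>" for easier jstack triage.
public class NamedThreadFactorySketch implements ThreadFactory {
  private final String key;
  private final AtomicInteger seq = new AtomicInteger(0);

  public NamedThreadFactorySketch(String key) {
    this.key = key;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r);
    t.setName("BlockBlobAppendStream-" + key + "-" + seq.incrementAndGet());
    t.setDaemon(true);
    return t;
  }
}
```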

In {{BlockBlobAppendStream.WriteRequest#run}}:

{code}
          try {
            Thread.sleep(BLOCK_UPLOAD_RETRY_INTERVAL);
          } catch(InterruptedException ie) {
            // Swallowing the exception here
          }
{code}

Swallowing {{InterruptedException}} is an anti-pattern that is unfortunately prevalent throughout the existing Hadoop codebase, but let's not continue doing it for new code.  Let's restore interrupted status by calling {{Thread.currentThread().interrupt()}} and {{break}} out of the loop.
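The corrected pattern would look roughly like this (the loop structure, class name, and interval value are illustrative, not the patch's actual code):

```java
// Sketch of the retry loop with interrupted status restored on interrupt.
public class RetrySketch {
  // Illustrative value; smaller than a realistic retry interval.
  static final long BLOCK_UPLOAD_RETRY_INTERVAL = 10;

  static int attemptUpload(int maxRetries) {
    int attempts = 0;
    while (attempts < maxRetries) {
      attempts++;
      try {
        Thread.sleep(BLOCK_UPLOAD_RETRY_INTERVAL);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // restore interrupted status
        break;                              // stop retrying
      }
    }
    return attempts;
  }
}
```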

In {{NativeAzureFileSystem#append}}, the exception handling has the same kind of problem that we noticed during the HADOOP-12551 code review.  If the caught exception is a {{StorageException}}, but not a {{FileNotFoundException}}, then the original exception is swallowed instead of propagated to the caller.
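A hedged sketch of the fix (a local stand-in class replaces the Azure SDK's {{StorageException}} so the snippet is self-contained, and the helper's name and status-code check are made up for illustration):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Stand-in for com.microsoft.azure.storage.StorageException so the sketch
// compiles without the Azure SDK on the classpath.
class StorageExceptionStandIn extends Exception {
  final int httpStatusCode;
  StorageExceptionStandIn(int code) {
    this.httpStatusCode = code;
  }
}

public class AppendExceptionSketch {
  static void translate(StorageExceptionStandIn e, String path)
      throws IOException {
    if (e.httpStatusCode == 404) {
      throw new FileNotFoundException(path + " not found");
    }
    // Propagate the original exception as the cause instead of swallowing it.
    throw new IOException("append failed on " + path, e);
  }
}
```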

Please address the Findbugs warning in {{BlockBlobAppendStream#updateBlobAppendMetadata}}.  It looks like you'll need to move the sleep outside the {{synchronized}} block.
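Roughly, the restructured loop would look like this (class and method names are illustrative, and {{tryUpdate}} is a placeholder for the actual metadata round trip):

```java
// Sketch of the Findbugs fix (likely SWL_SLEEP_WITH_LOCK_HELD): do the state
// change under the lock, but sleep only after the lock is released.
public class MetadataUpdateSketch {
  private final Object lock = new Object();
  private int calls = 0;

  boolean updateWithRetry(int maxRetries, long sleepMillis)
      throws InterruptedException {
    for (int i = 0; i < maxRetries; i++) {
      synchronized (lock) {
        if (tryUpdate()) {
          return true;
        }
      }
      Thread.sleep(sleepMillis); // sleep with the lock released
    }
    return false;
  }

  boolean tryUpdate() {
    // Placeholder for the real metadata call; succeeds on the second try.
    return ++calls >= 2;
  }
}
```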




> Adding Append API support for WASB
> ----------------------------------
>
>                 Key: HADOOP-12635
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12635
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: azure
>    Affects Versions: 2.7.1
>            Reporter: Dushyanth
>            Assignee: Dushyanth
>         Attachments: Append API.docx, HADOOP-12635-004.patch, HADOOP-12635.001.patch, HADOOP-12635.002.patch, HADOOP-12635.003.patch
>
>
> Currently the WASB implementation of the HDFS interface does not support Append API. This JIRA is added to design and implement the Append API support to WASB. The intended support for Append would only support a single writer.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)