You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/07/17 16:25:14 UTC

[GitHub] [hadoop-ozone] maobaolong opened a new pull request #1212: HDDS-3979. Use chunkSize as bufferSize for stream copy

maobaolong opened a new pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212


   ## What changes were proposed in this pull request?
   
   Use chunkSize as bufferSize for stream copy, so that we can reduce the readChunk count from s3g to datanodes.
   
   For example, without this PR, the bufferSize is a hardcode 4096, it means each time readChunk request can only read 1M data, but a chunk is 4MB default, if we use the chunkSize as the bufferSize, so we can read 4M data for each readChunk request, reduce the request time of O3 communicate to datanode.
   
   BTW, clarify the ObjectEndpoint code of s3g, we can use seek to enforce skip performance.
   
   ## What is the link to the Apache JIRA
   
   HDDS-3979
   
   ## How was this patch tested?
   
   Using awscli or goofys


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r463120834



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
##########
@@ -465,6 +465,12 @@
   public static final String  OZONE_CLIENT_HTTPS_NEED_AUTH_KEY =
       "ozone.https.client.need-auth";
   public static final boolean OZONE_CLIENT_HTTPS_NEED_AUTH_DEFAULT = false;
+

Review comment:
       Thank you for give me this task,  i have create a ticket for it, and take it, https://issues.apache.org/jira/browse/HDDS-4049
   
   For this PR, lets just correct the key name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#issuecomment-666507055


   @elek Thanks you for your review and suggestion, i've addressed your comments, PTAL. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#issuecomment-662438867


   @runzhiwang Thank you for your review and `LGTM`. @bharatviswa504 Please take a look at this PR, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r458762172



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyInputStream.java
##########
@@ -325,62 +323,14 @@ public long getRemainingOfIndex(int index) throws IOException {
     return blockStreams.get(index).getRemaining();
   }
 
-  /**
-   * Copies some or all bytes from a large (over 2GB) <code>InputStream</code>
-   * to an <code>OutputStream</code>, optionally skipping input bytes.
-   * <p>
-   * Copy the method from IOUtils of commons-io to reimplement skip by seek
-   * rather than read. The reason why IOUtils of commons-io implement skip
-   * by read can be found at
-   * <a href="https://issues.apache.org/jira/browse/IO-203">IO-203</a>.
-   * </p>
-   * <p>
-   * This method uses the provided buffer, so there is no need to use a
-   * <code>BufferedInputStream</code>.
-   * </p>
-   *
-   * @param output the <code>OutputStream</code> to write to
-   * @param inputOffset : number of bytes to skip from input before copying
-   * -ve values are ignored
-   * @param length : number of bytes to copy. -ve means all
-   * @param buffer the buffer to use for the copy
-   * @return the number of bytes copied
-   * @throws NullPointerException if the input or output is null
-   * @throws IOException          if an I/O error occurs
-   */
-  public long copyLarge(final OutputStream output,
-      final long inputOffset, final long len, final byte[] buffer)
-      throws IOException {
-    if (inputOffset > 0) {
-      seek(inputOffset);
-    }
-
-    if (len == 0) {
+  @Override
+  public long skip(long n) throws IOException {

Review comment:
       Ok, done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r458762541



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestKeyInputStream.java
##########
@@ -395,4 +334,64 @@ public void testReadChunk() throws Exception {
     }
     keyInputStream.close();
   }
+
+  @Test
+  public void testSkip() throws Exception {
+    {

Review comment:
       Sorry, my bad, already fixed it.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestKeyInputStream.java
##########
@@ -395,4 +334,64 @@ public void testReadChunk() throws Exception {
     }
     keyInputStream.close();
   }
+
+  @Test
+  public void testSkip() throws Exception {
+    {
+      XceiverClientManager.resetXceiverClientMetrics();

Review comment:
       Sorry, my bad, already fixed it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r463113923



##########
File path: hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
##########
@@ -114,6 +120,13 @@ public ObjectEndpoint() {
     customizableGetHeaders.add("Content-Encoding");
   }
 
+  @PostConstruct
+  public void init() {

Review comment:
       Thank you for let me know this high technology




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] runzhiwang commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

runzhiwang commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r458757223



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestKeyInputStream.java
##########
@@ -395,4 +334,64 @@ public void testReadChunk() throws Exception {
     }
     keyInputStream.close();
   }
+
+  @Test
+  public void testSkip() throws Exception {
+    {
+      XceiverClientManager.resetXceiverClientMetrics();

Review comment:
       4 space ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] runzhiwang commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

runzhiwang commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r458764140



##########
File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
##########
@@ -2421,6 +2421,14 @@
       information will be extracted
     </description>
   </property>
+  <property>
+    <name>ozone.client.buffer.size</name>
+    <tag>OZONE, S3GATEWAY</tag>
+    <value>4KB</value>
+    <description>
+      The size of the buffer which is for read block. (4KB by default).

Review comment:
       We can describe what size can achieve best performance.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] runzhiwang commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

runzhiwang commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r458756944



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestKeyInputStream.java
##########
@@ -395,4 +334,64 @@ public void testReadChunk() throws Exception {
     }
     keyInputStream.close();
   }
+
+  @Test
+  public void testSkip() throws Exception {
+    {

Review comment:
       why { ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] maobaolong commented on pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

maobaolong commented on pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#issuecomment-662431050


   @runzhiwang Thanks for the review, addressed all of your suggestion, PTAL.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] runzhiwang commented on a change in pull request #1212: HDDS-3979. Use chunkSize as bufferSize for stream copy

Posted by GitBox <gi...@apache.org>.

runzhiwang commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r458665555



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyInputStream.java
##########
@@ -325,62 +323,14 @@ public long getRemainingOfIndex(int index) throws IOException {
     return blockStreams.get(index).getRemaining();
   }
 
-  /**
-   * Copies some or all bytes from a large (over 2GB) <code>InputStream</code>
-   * to an <code>OutputStream</code>, optionally skipping input bytes.
-   * <p>
-   * Copy the method from IOUtils of commons-io to reimplement skip by seek
-   * rather than read. The reason why IOUtils of commons-io implement skip
-   * by read can be found at
-   * <a href="https://issues.apache.org/jira/browse/IO-203">IO-203</a>.
-   * </p>
-   * <p>
-   * This method uses the provided buffer, so there is no need to use a
-   * <code>BufferedInputStream</code>.
-   * </p>
-   *
-   * @param output the <code>OutputStream</code> to write to
-   * @param inputOffset : number of bytes to skip from input before copying
-   * -ve values are ignored
-   * @param length : number of bytes to copy. -ve means all
-   * @param buffer the buffer to use for the copy
-   * @return the number of bytes copied
-   * @throws NullPointerException if the input or output is null
-   * @throws IOException          if an I/O error occurs
-   */
-  public long copyLarge(final OutputStream output,
-      final long inputOffset, final long len, final byte[] buffer)
-      throws IOException {
-    if (inputOffset > 0) {
-      seek(inputOffset);
-    }
-
-    if (len == 0) {
+  @Override
+  public long skip(long n) throws IOException {

Review comment:
       add unit test for skip.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek merged pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

elek merged pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek commented on a change in pull request #1212: HDDS-3979. Make bufferSize configurable for stream copy

Posted by GitBox <gi...@apache.org>.

elek commented on a change in pull request #1212:
URL: https://github.com/apache/hadoop-ozone/pull/1212#discussion_r462307639



##########
File path: hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
##########
@@ -114,6 +120,13 @@ public ObjectEndpoint() {
     customizableGetHeaders.add("Content-Encoding");
   }
 
+  @PostConstruct
+  public void init() {

Review comment:
       NIT: It can be slightly better to use
   
   ```
     @Inject
     private OzoneConfiguration ozoneConfiguration;
   ```
   
   And you don't need to expose `getClient()` in the `EndpointBase`. 
   
   (Didn't try, but it should work, IMHO).

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
##########
@@ -465,6 +465,12 @@
   public static final String  OZONE_CLIENT_HTTPS_NEED_AUTH_KEY =
       "ozone.https.client.need-auth";
   public static final boolean OZONE_CLIENT_HTTPS_NEED_AUTH_DEFAULT = false;
+

Review comment:
       Two things:
   
    * `ozone.client.buffer.size` seems to be confusing. This configuration for the Ozone client of the **s3g**. As we have a global list of configuration keys, I guess we need a `s3g` prefix here. (For example `ozone.s3g.client.buffer.size`) 
    * Usually, I prefer to use annotation based configuration, if possible. (It can be harder here as S3g uses CDI injections. If it's too complex, you can ignore...)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org