Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/07/23 07:27:08 UTC

[GitHub] [incubator-doris] huozhanfeng opened a new pull request #6308: Fix bug that broker can't load s3a files

huozhanfeng opened a new pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308


   ## Proposed changes
   
   Currently, Doris supports loading OSS/S3A files using parameters such as `fs.s3a.access.key`, but loading such files fails. The root cause is that the broker cannot handle an `FSDataInputStream` whose wrapped stream does not implement `ByteBufferReadable`.
   
   See issue #6307. The upstream Hadoop ticket for ByteBufferReadable support in the S3A input stream is HADOOP-14603:
   https://issues.apache.org/jira/browse/HADOOP-14603
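   
   A minimal sketch of the failure mode and the fix (simplified, hypothetical helper; not the actual patch): `FSDataInputStream.read(ByteBuffer)` throws `UnsupportedOperationException` when the wrapped stream is not `ByteBufferReadable`, while the plain `byte[]` read path is supported by every Hadoop input stream.
   <pre>
   import java.io.IOException;
   import java.nio.ByteBuffer;
   import org.apache.hadoop.fs.FSDataInputStream;
   
   public final class PreadSketch {
       // Simplified stand-in for the broker's pread() flow.
       static ByteBuffer pread(FSDataInputStream in, long offset, int length) throws IOException {
           in.seek(offset);
           // in.read(ByteBuffer) would throw UnsupportedOperationException on S3A/OSS
           // streams that are not ByteBufferReadable (HADOOP-14603), so read into a
           // byte array instead, which every InputStream supports.
           byte[] buf = new byte[length];
           int pos = 0;
           while (pos < length) {
               int n = in.read(buf, pos, length - pos);
               if (n < 0) {
                   break; // EOF before the requested length was read
               }
               pos += n;
           }
           return ByteBuffer.wrap(buf, 0, pos);
       }
   }
   </pre>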
   
   ## Types of changes
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   - [ ] Optimization. Including functional usability improvements and performance improvements.
   - [ ] Dependency. Such as changes related to third-party components.
   - [ ] Other.
   
   ## Checklist
   - [x] I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [ ] If these changes need document changes, I have updated the document
   - [ ] Any dependent changes have been merged
   



[GitHub] [incubator-doris] huozhanfeng commented on a change in pull request #6308: [Bug] Fix bug that broker can't load OSS/S3A files

huozhanfeng commented on a change in pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308#discussion_r681770206



##########
File path: fs_brokers/apache_hdfs_broker/src/main/java/org/apache/doris/broker/hdfs/FileSystemManager.java
##########
@@ -561,22 +561,25 @@ public ByteBuffer pread(TBrokerFD fd, long offset, long length) {
                             currentStreamOffset, offset);
                 }
             }
-            ByteBuffer buf;
+            // Avoid using the ByteBuffer based read for Hadoop because some FSDataInputStream
+            // implementations are not ByteBufferReadable,
+            // See https://issues.apache.org/jira/browse/HADOOP-14603
+            byte[] buf;
             if (length > readBufferSize) {
-                buf = ByteBuffer.allocate(readBufferSize);
+                buf = new byte[readBufferSize];

Review comment:
       I have tested a case with an ORC file: it can read enough bytes when `readBufferSize` is larger than 128 KB. In ORC mode, the 128 KB limit is imposed by the following logic:
   <pre>
   file: be/src/exec/orc_scanner.cpp
   
   /**
    * Get the natural size for reads.
    * @return the number of bytes that should be read at once
    */
   uint64_t getNaturalReadSize() const override {
       return 128 * 1024;
   }
   </pre>
   
   See https://github.com/apache/orc/blob/main/c++/src/OrcFile.cc for more details.
   
   Part of the log after modifying the logic to allow reads beyond 128 KB:
   <pre>
   2021-08-03 09:33:38  [ pool-2-thread-2:25353 ] - [ INFO ]  read buffer from input stream, request.length 99, readBufferSize:1048576, buffer size:99, read length:99
   2021-08-03 09:33:38  [ pool-2-thread-2:25364 ] - [ INFO ]  read buffer from input stream, request.length 67876, readBufferSize:1048576, buffer size:67876, read length:67876
   2021-08-03 09:33:38  [ pool-2-thread-2:25389 ] - [ INFO ]  read buffer from input stream, request.length 22518, readBufferSize:1048576, buffer size:22518, read length:22518
   2021-08-03 09:33:38  [ pool-2-thread-2:25417 ] - [ INFO ]  read buffer from input stream, request.length 279756, readBufferSize:1048576, buffer size:279756, read length:279756
   2021-08-03 09:33:38  [ pool-2-thread-2:25434 ] - [ INFO ]  read buffer from input stream, request.length 186, readBufferSize:1048576, buffer size:186, read length:186
   </pre>
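   
   To make the relationship explicit, here is a small illustrative sketch (hypothetical names, not Doris code): a client that splits reads into chunks bounded by the natural read size never issues a pread request larger than that bound, which matches the request.length values in the logs above.
   <pre>
   import java.io.IOException;
   import java.nio.ByteBuffer;
   
   interface BrokerClient {
       // Stand-in for the broker pread RPC; not the real Thrift signature.
       ByteBuffer pread(long offset, long length) throws IOException;
   }
   
   final class ChunkedReader {
       // Mirrors getNaturalReadSize() in the ORC reader quoted above.
       private static final long NATURAL_READ_SIZE = 128 * 1024;
   
       static byte[] read(BrokerClient client, long offset, int totalLength) throws IOException {
           byte[] out = new byte[totalLength];
           int done = 0;
           while (done < totalLength) {
               long chunk = Math.min(NATURAL_READ_SIZE, totalLength - done);
               ByteBuffer part = client.pread(offset + done, chunk);
               int n = part.remaining();
               if (n == 0) {
                   break; // EOF
               }
               part.get(out, done, n);
               done += n;
           }
           return out;
       }
   }
   </pre>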





[GitHub] [incubator-doris] huozhanfeng commented on a change in pull request #6308: [Bug] Fix bug that broker can't load OSS/S3A files

huozhanfeng commented on a change in pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308#discussion_r677506934



##########
File path: fs_brokers/apache_hdfs_broker/src/main/java/org/apache/doris/broker/hdfs/FileSystemManager.java
##########
@@ -561,22 +561,25 @@ public ByteBuffer pread(TBrokerFD fd, long offset, long length) {
                             currentStreamOffset, offset);
                 }
             }
-            ByteBuffer buf;
+            // Avoid using the ByteBuffer based read for Hadoop because some FSDataInputStream
+            // implementations are not ByteBufferReadable,
+            // See https://issues.apache.org/jira/browse/HADOOP-14603
+            byte[] buf;
             if (length > readBufferSize) {
-                buf = ByteBuffer.allocate(readBufferSize);
+                buf = new byte[readBufferSize];

Review comment:
       Ehh... I don't think `ByteBuffer` can solve this problem; what matters is only how large a buffer we initialize and whether the stream fills it. In that respect, `ByteBuffer` behaves the same as a `byte` array.
   
   I tested with both `ByteBuffer` and a `byte` array, and the behavior is the same when `readBufferSize` is larger than 128 KB: neither can read more than 128 KB of data. Here are the debug code and part of the log.
   <pre>
   logger.info("read buffer from input stream, request.length " + length + ", readBufferSize:" + readBufferSize +", buffer size:" + buf.length + ", read length:" + readLength);
                   
   2021-07-27 09:57:04  [ pool-2-thread-4:31261 ] - [ INFO ]  read buffer from input stream, request.length 131072, readBufferSize:1048576, buffer size:131072, read length:131072
   2021-07-27 09:57:04  [ pool-2-thread-4:31268 ] - [ INFO ]  read buffer from input stream, request.length 131072, readBufferSize:1048576, buffer size:131072, read length:131072
   2021-07-27 09:57:04  [ pool-2-thread-4:31273 ] - [ INFO ]  read buffer from input stream, request.length 17612, readBufferSize:1048576, buffer size:17612, read length:17612
   2021-07-27 09:57:04  [ pool-2-thread-4:31275 ] - [ INFO ]  read buffer from input stream, request.length 186, readBufferSize:1048576, buffer size:186, read length:186
   2021-07-27 09:57:04  [ pool-2-thread-4:31277 ] - [ INFO ]  read buffer from input stream, request.length 680, readBufferSize:1048576, buffer size:680, read length:680
   </pre>
   
   I see the default value of `readBufferSize` is 128 KB. Maybe there is some tricky logic in Doris.
   
   <pre>
       private int readBufferSize = 128 << 10; // 128k
       private int writeBufferSize = 128 << 10; // 128k
   </pre>
   
   I guess the root cause is that `TBrokerPReadRequest.length` in the RPC request never exceeds 128 KB, which is controlled by the client. I have no BE dev environment right now; maybe you can help take a look 😁
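   
   A tiny self-contained sketch (illustrative only, plain JDK classes) of the point above: a heap `ByteBuffer` is just a wrapper around a `byte[]`, so a single read fills either one identically and is bounded by the requested length, not by the buffer type.
   <pre>
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.ByteBuffer;
   
   public final class BufferEquivalence {
       public static void main(String[] args) throws IOException {
           byte[] data = new byte[256 * 1024]; // 256 KB source
           int request = 128 * 1024;           // what the client asks for
   
           // byte[]-based read
           InputStream a = new ByteArrayInputStream(data);
           int nArr = a.read(new byte[request], 0, request);
   
           // ByteBuffer-based read: a heap buffer is backed by a byte[] anyway
           InputStream b = new ByteArrayInputStream(data);
           ByteBuffer buf = ByteBuffer.allocate(request);
           int nBuf = b.read(buf.array(), 0, buf.remaining());
   
           System.out.println(nArr + " == " + nBuf); // prints: 131072 == 131072
       }
   }
   </pre>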





[GitHub] [incubator-doris] huozhanfeng commented on a change in pull request #6308: [Bug] Fix bug that broker can't load OSS/S3A files

huozhanfeng commented on a change in pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308#discussion_r677511211



##########
File path: fs_brokers/apache_hdfs_broker/src/main/java/org/apache/doris/broker/hdfs/FileSystemManager.java
##########
@@ -561,22 +561,25 @@ public ByteBuffer pread(TBrokerFD fd, long offset, long length) {
                             currentStreamOffset, offset);
                 }
             }
-            ByteBuffer buf;
+            // Avoid using the ByteBuffer based read for Hadoop because some FSDataInputStream
+            // implementations are not ByteBufferReadable,
+            // See https://issues.apache.org/jira/browse/HADOOP-14603
+            byte[] buf;
             if (length > readBufferSize) {
-                buf = ByteBuffer.allocate(readBufferSize);
+                buf = new byte[readBufferSize];

Review comment:
       I see the default value of `readBufferSize` is 128 KB; maybe there is some tricky logic in Doris.
   <pre>
       private int readBufferSize = 128 << 10; // 128k
       private int writeBufferSize = 128 << 10; // 128k
   </pre>






[GitHub] [incubator-doris] morningman commented on a change in pull request #6308: [Bug] Fix bug that broker can't load OSS/S3A files

morningman commented on a change in pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308#discussion_r677337957



##########
File path: fs_brokers/apache_hdfs_broker/src/main/java/org/apache/doris/broker/hdfs/FileSystemManager.java
##########
@@ -561,22 +561,25 @@ public ByteBuffer pread(TBrokerFD fd, long offset, long length) {
                             currentStreamOffset, offset);
                 }
             }
-            ByteBuffer buf;
+            // Avoid using the ByteBuffer based read for Hadoop because some FSDataInputStream
+            // implementations are not ByteBufferReadable,
+            // See https://issues.apache.org/jira/browse/HADOOP-14603
+            byte[] buf;
             if (length > readBufferSize) {
-                buf = ByteBuffer.allocate(readBufferSize);
+                buf = new byte[readBufferSize];

Review comment:
       Could you help test whether this `pread()` method can read as many bytes as we want when `readBufferSize` is larger than 128 KB?
   The reason we used `ByteBuffer` is that with a `byte` array it seemed unable to read more than 128 KB.






[GitHub] [incubator-doris] morningman merged pull request #6308: [Bug] Fix bug that broker can't load OSS/S3A files

morningman merged pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308


   

