Posted to codereview@trafodion.apache.org by selvaganesang <gi...@git.apache.org> on 2018/06/29 21:41:29 UTC

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

GitHub user selvaganesang opened a pull request:

    https://github.com/apache/trafodion/pull/1626

    [TRAFODION-3126] Refactored HDFS client implementation should also su…

    …pport Alluxio file system
    
    Alluxio doesn't support direct ByteBuffer access. This is circumvented
    by using a non-direct ByteBuffer to read HDFS files when they belong
    to the Alluxio file system.
    
    No need to change the default setting of USE_LIBHDFS for Alluxio to work.
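The workaround described above can be sketched in plain java.nio, independent of the actual HDFSClient code. This is a hedged illustration only: the class and method names (`NonDirectReadSketch`, `copyViaHeapBuffer`) are invented for the example, and a byte array stands in for the HDFS read that cannot target a direct buffer.

```java
import java.nio.ByteBuffer;

// Sketch of the workaround: a file system that cannot fill a direct
// ByteBuffer reads into a heap-backed (non-direct) buffer instead, and
// the bytes are then copied into the direct buffer the native layer uses.
public class NonDirectReadSketch {
    static int copyViaHeapBuffer(byte[] source, ByteBuffer directBuf) {
        // Non-direct buffer of matching capacity; its backing array can
        // be filled by APIs that require a byte[].
        ByteBuffer heapBuf = ByteBuffer.allocate(directBuf.capacity());
        int bytesRead = Math.min(source.length, heapBuf.remaining());
        heapBuf.put(source, 0, bytesRead);   // stands in for the hdfs read
        heapBuf.flip();
        directBuf.put(heapBuf);              // copy into the direct buffer
        return bytesRead;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        int n = copyViaHeapBuffer(new byte[] {1, 2, 3}, direct);
        direct.flip();
        System.out.println(n);               // number of bytes staged and copied
    }
}
```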

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/selvaganesang/trafodion alluxio_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1626
    
----
commit 2a6cfd1a24d05e1e243919897a11572e68c14d59
Author: selvaganesang <se...@...>
Date:   2018-06-29T21:26:49Z

    [TRAFODION-3126] Refactored HDFS client implementation should also support Alluxio file system
    
    Alluxio doesn't support direct ByteBuffer access. This is circumvented
    by using a non-direct ByteBuffer to read HDFS files when they belong
    to the Alluxio file system.
    
    No need to change the default setting of USE_LIBHDFS for Alluxio to work.

----


---

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

Posted by zellerh <gi...@git.apache.org>.
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1626#discussion_r199663199
  
    --- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
    @@ -142,21 +154,32 @@
           HDFSRead() 
           {
           }
    - 
    +    
           public Object call() throws IOException 
           {
              int bytesRead;
              int totalBytesRead = 0;
              if (compressed_) {
                 bufArray_ = new byte[ioByteArraySizeInKB_ * 1024];
    -         } else 
    -         if (! buf_.hasArray()) {
    -            try {
    -              fsdis_.seek(pos_);
    -            } catch (EOFException e) {
    -              isEOF_ = 1;
    -              return new Integer(totalBytesRead);
    -            } 
    +         } 
    +         else  {
    +            // alluxio doesn't support direct ByteBuffer reads
    +            // Hence, create a non-direct ByteBuffer, read into
    +            // byteArray backing up this ByteBuffer and 
    +            // then copy the data read to direct ByteBuffer for the 
    +            // native layer to process the data
    +            if ((! alluxioNotInstalled_) && fs_ instanceof alluxio.hadoop.FileSystem) {
    +               savedBuf_ = buf_;
    +               buf_ = ByteBuffer.allocate(savedBuf_.capacity());
    --- End diff --
    
    Thanks for the explanation!


---

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1626#discussion_r199657006
  
    --- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
    @@ -126,6 +128,16 @@
           catch (IOException ioe) {
              throw new RuntimeException("Exception in HDFSClient static block", ioe);
           }
    +      try {
    +         boolean alluxioFs = defaultFs_ instanceof alluxio.hadoop.FileSystem;
    +      }
    +      catch (Throwable rte)
    +      {
    +         // Ignore the exception. It is not needed for alluxio to be installed
    +         // for the methods of this class to work if 
    +         // alluxio filesystem is NOT required
    +         alluxioNotInstalled_ = true;
    --- End diff --
    
    You would get the stack trace shown in the JIRA.
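The detection idiom discussed in this hunk can be shown in isolation. The sketch below is an assumption-laden illustration, not the PR's code: it uses `Class.forName` with the class name as a string, whereas the PR triggers class loading via an `instanceof alluxio.hadoop.FileSystem` check, but both rely on catching `Throwable` (covering `NoClassDefFoundError`) so the optional jar need not be on the classpath.

```java
// Sketch: probe for an optional dependency by attempting to load one of
// its classes and treating any Throwable as "not installed". This keeps
// the rest of the class usable when the optional jar is absent.
public class OptionalDependencySketch {
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (Throwable t) {
            // ClassNotFoundException / NoClassDefFoundError: the optional
            // jar is absent, so skip the code path that depends on it.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isClassPresent("java.lang.String"));
        System.out.println(isClassPresent("alluxio.hadoop.FileSystem"));
    }
}
```

Printing `true` then, on a system without Alluxio installed, `false`.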


---

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

Posted by sureshsubbiah <gi...@git.apache.org>.
Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1626#discussion_r199655367
  
    --- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
    @@ -126,6 +128,16 @@
           catch (IOException ioe) {
              throw new RuntimeException("Exception in HDFSClient static block", ioe);
           }
    +      try {
    +         boolean alluxioFs = defaultFs_ instanceof alluxio.hadoop.FileSystem;
    +      }
    +      catch (Throwable rte)
    +      {
    +         // Ignore the exception. It is not needed for alluxio to be installed
    +         // for the methods of this class to work if 
    +         // alluxio filesystem is NOT required
    +         alluxioNotInstalled_ = true;
    --- End diff --
    
    Can you please explain what error we get if the file to be read is an Alluxio file and Alluxio is not installed?


---

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/trafodion/pull/1626


---

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1626#discussion_r199663083
  
    --- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
    @@ -142,21 +154,32 @@
           HDFSRead() 
           {
           }
    - 
    +    
           public Object call() throws IOException 
           {
              int bytesRead;
              int totalBytesRead = 0;
              if (compressed_) {
                 bufArray_ = new byte[ioByteArraySizeInKB_ * 1024];
    -         } else 
    -         if (! buf_.hasArray()) {
    -            try {
    -              fsdis_.seek(pos_);
    -            } catch (EOFException e) {
    -              isEOF_ = 1;
    -              return new Integer(totalBytesRead);
    -            } 
    +         } 
    +         else  {
    +            // alluxio doesn't support direct ByteBuffer reads
    +            // Hence, create a non-direct ByteBuffer, read into
    +            // byteArray backing up this ByteBuffer and 
    +            // then copy the data read to direct ByteBuffer for the 
    +            // native layer to process the data
    +            if ((! alluxioNotInstalled_) && fs_ instanceof alluxio.hadoop.FileSystem) {
    +               savedBuf_ = buf_;
    +               buf_ = ByteBuffer.allocate(savedBuf_.capacity());
    --- End diff --
    
    An HdfsClient object is constructed for every range, or for every HdfsScanBuf-sized chunk within a range, in HdfsScan. The call() method is invoked only once per object, so it is OK to allocate here.
    
    However, if the allocation becomes a bottleneck, this could be optimized by pushing the file-system detection up into HdfsScan, before the HdfsClient object is constructed.
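The alternative mentioned here, avoiding a fresh allocation on every read, can be sketched generically. The class and method names below (`ReusableStagingBuffer`, `stagingFor`) are hypothetical, not part of the PR; the sketch only illustrates the reuse pattern under discussion.

```java
import java.nio.ByteBuffer;

// Sketch of the suggested optimization: allocate the non-direct staging
// buffer once per reader object and clear() it between reads, instead of
// calling ByteBuffer.allocate() on every call().
public class ReusableStagingBuffer {
    private ByteBuffer heapBuf;  // lazily created, then reused

    ByteBuffer stagingFor(ByteBuffer directBuf) {
        if (heapBuf == null || heapBuf.capacity() < directBuf.capacity())
            heapBuf = ByteBuffer.allocate(directBuf.capacity());
        heapBuf.clear();         // reset position/limit for the next read
        return heapBuf;
    }

    public static void main(String[] args) {
        ReusableStagingBuffer r = new ReusableStagingBuffer();
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        // The same instance comes back on repeated reads of equal size.
        System.out.println(r.stagingFor(direct) == r.stagingFor(direct));
    }
}
```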


---

[GitHub] trafodion pull request #1626: [TRAFODION-3126] Refactored HDFS client implem...

Posted by zellerh <gi...@git.apache.org>.
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1626#discussion_r199650506
  
    --- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
    @@ -142,21 +154,32 @@
           HDFSRead() 
           {
           }
    - 
    +    
           public Object call() throws IOException 
           {
              int bytesRead;
              int totalBytesRead = 0;
              if (compressed_) {
                 bufArray_ = new byte[ioByteArraySizeInKB_ * 1024];
    -         } else 
    -         if (! buf_.hasArray()) {
    -            try {
    -              fsdis_.seek(pos_);
    -            } catch (EOFException e) {
    -              isEOF_ = 1;
    -              return new Integer(totalBytesRead);
    -            } 
    +         } 
    +         else  {
    +            // alluxio doesn't support direct ByteBuffer reads
    +            // Hence, create a non-direct ByteBuffer, read into
    +            // byteArray backing up this ByteBuffer and 
    +            // then copy the data read to direct ByteBuffer for the 
    +            // native layer to process the data
    +            if ((! alluxioNotInstalled_) && fs_ instanceof alluxio.hadoop.FileSystem) {
    +               savedBuf_ = buf_;
    +               buf_ = ByteBuffer.allocate(savedBuf_.capacity());
    --- End diff --
    
    Do we do this allocation every time we call the call() method? If so, could we avoid that and do this only once? The same applies to line 163 above.


---