Posted to codereview@trafodion.apache.org by selvaganesang <gi...@git.apache.org> on 2018/08/01 20:15:02 UTC
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
GitHub user selvaganesang opened a pull request:
https://github.com/apache/trafodion/pull/1674
[TRAFODION-3171] Refactor Hive sequence file reading to use the new i…
…mplementation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/selvaganesang/trafodion sequence_files
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/trafodion/pull/1674.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1674
----
commit 6165e6b1017521abed062c55931d4555b886387d
Author: selvaganesang <se...@...>
Date: 2018-08-01T20:08:46Z
[TRAFODION-3171] Refactor Hive sequence file reading to use the new implementation
----
---
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
Posted by sureshsubbiah <gi...@git.apache.org>.
Github user sureshsubbiah commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1674#discussion_r207440918
--- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
@@ -240,7 +256,48 @@ int compressedFileRead(int readLenRemain) throws IOException
return totalReadLen;
}
- native int copyToByteBuffer(ByteBuffer buf, int bufOffset, byte[] bufArray, int copyLen);
+ /* Trafodion adds the record delimiter '\n' while copying the value
+ to the buffer backing the ByteBuffer */
+
+ int sequenceFileRead(int readLenRemain) throws IOException
+ {
+ boolean eof = false;
+ byte[] byteArray;
+ int readLen;
+ int totalReadLen = 0;
+ long tempPos;
+ int lenRemain = readLenRemain;
+
+ while (!eof && lenRemain > 0) {
+ tempPos = reader_.getPosition();
+ try {
+ eof = reader_.next(key_, value_);
+ }
+ catch (java.io.EOFException e)
+ {
+ eof = true;
+ break;
+ }
+ byteArray = ((Text)value_).getBytes();
+ readLen = ((Text)value_).getLength();
+ if (readLen <= lenRemain) {
+
+ buf_.put(byteArray, 0, readLen);
+ buf_.put(recDelimiter_);
--- End diff --
Should we be worried that the 1-byte delimiter will put us past the end of the buffer, as suggested in the comment in line 290? (when readLen == lenRemain)
---
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1674#discussion_r207459712
--- Diff: core/sql/executor/ExHdfsScan.cpp ---
@@ -571,6 +569,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
hdfsScan_ = HdfsScan::newInstance((NAHeap *)getHeap(), hdfsScanBuf_, hdfsScanBufMaxSize_,
hdfsScanTdb().hdfsIoByteArraySizeInKB_,
&hdfsFileInfoListAsArray_, beginRangeNum_, numRanges_, hdfsScanTdb().rangeTailIOSize_,
+ isSequenceFile(), hdfsScanTdb().recordDelimiter_,
--- End diff --
There is no need to pass the record delimiter for text format because it is a raw read of the text-formatted table. It should have the record delimiter as per the Hive metadata.
---
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
Posted by sureshsubbiah <gi...@git.apache.org>.
Github user sureshsubbiah commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1674#discussion_r207442360
--- Diff: core/sql/executor/ExHdfsScan.cpp ---
@@ -571,6 +569,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
hdfsScan_ = HdfsScan::newInstance((NAHeap *)getHeap(), hdfsScanBuf_, hdfsScanBufMaxSize_,
hdfsScanTdb().hdfsIoByteArraySizeInKB_,
&hdfsFileInfoListAsArray_, beginRangeNum_, numRanges_, hdfsScanTdb().rangeTailIOSize_,
+ isSequenceFile(), hdfsScanTdb().recordDelimiter_,
--- End diff --
If we did not pass in recordDelimiter_ previously, do we know whether the previous code worked correctly for text format when the record delimiter was something other than '\n'? This PR does not seem to change anything for text format reads, so if there was an issue previously, it might still exist.
---
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
Posted by selvaganesang <gi...@git.apache.org>.
Github user selvaganesang commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1674#discussion_r207459245
--- Diff: core/sql/src/main/java/org/trafodion/sql/HDFSClient.java ---
@@ -240,7 +256,48 @@ int compressedFileRead(int readLenRemain) throws IOException
return totalReadLen;
}
- native int copyToByteBuffer(ByteBuffer buf, int bufOffset, byte[] bufArray, int copyLen);
+ /* Trafodion adds the record delimiter '\n' while copying the value
+ to the buffer backing the ByteBuffer */
+
+ int sequenceFileRead(int readLenRemain) throws IOException
+ {
+ boolean eof = false;
+ byte[] byteArray;
+ int readLen;
+ int totalReadLen = 0;
+ long tempPos;
+ int lenRemain = readLenRemain;
+
+ while (!eof && lenRemain > 0) {
+ tempPos = reader_.getPosition();
+ try {
+ eof = reader_.next(key_, value_);
+ }
+ catch (java.io.EOFException e)
+ {
+ eof = true;
+ break;
+ }
+ byteArray = ((Text)value_).getBytes();
+ readLen = ((Text)value_).getLength();
+ if (readLen <= lenRemain) {
+
+ buf_.put(byteArray, 0, readLen);
+ buf_.put(recDelimiter_);
--- End diff --
Yes, I need to account for the extra byte added for the delimiter. Good catch, will fix it.
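For readers following along: the concern is that when readLen == lenRemain, writing the record plus the 1-byte delimiter overruns the ByteBuffer. A minimal standalone sketch of one possible guard (hypothetical simplification, not the actual Trafodion fix; DelimiterGuard and copyRecords are made-up names) is to require readLen + 1 bytes of remaining space before copying:

```java
import java.nio.ByteBuffer;

public class DelimiterGuard {
    static final byte REC_DELIMITER = (byte) '\n';

    // Copies as many whole records (each followed by a 1-byte delimiter)
    // as fit into buf. Returns the total number of bytes written.
    static int copyRecords(ByteBuffer buf, byte[][] records) {
        int total = 0;
        for (byte[] rec : records) {
            // Reserve one extra byte for the delimiter so the put()
            // never writes past the end of the buffer, covering the
            // readLen == lenRemain case raised in the review.
            if (rec.length + 1 > buf.remaining())
                break;
            buf.put(rec, 0, rec.length);
            buf.put(REC_DELIMITER);
            total += rec.length + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        byte[][] recs = { "abc".getBytes(), "defg".getBytes() };
        // "abc\n" fits (4 bytes); "defg" plus '\n' needs 5 but only 4 remain.
        System.out.println(copyRecords(buf, recs));
    }
}
```

In the real reader, the record that did not fit would be re-read on the next call (the quoted diff saves reader_.getPosition() into tempPos for that purpose), rather than silently dropped as in this sketch.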
---
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
Posted by sureshsubbiah <gi...@git.apache.org>.
Github user sureshsubbiah commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1674#discussion_r207986948
--- Diff: core/sql/executor/ExHdfsScan.cpp ---
@@ -571,6 +569,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
hdfsScan_ = HdfsScan::newInstance((NAHeap *)getHeap(), hdfsScanBuf_, hdfsScanBufMaxSize_,
hdfsScanTdb().hdfsIoByteArraySizeInKB_,
&hdfsFileInfoListAsArray_, beginRangeNum_, numRanges_, hdfsScanTdb().rangeTailIOSize_,
+ isSequenceFile(), hdfsScanTdb().recordDelimiter_,
--- End diff --
Thank you for explaining
---
[GitHub] trafodion pull request #1674: [TRAFODION-3171] Refactor Hive sequence file r...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/trafodion/pull/1674
---