You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by GitBox <gi...@apache.org> on 2022/05/11 09:26:01 UTC

[GitHub] [orc] coderex2522 opened a new pull request, #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

coderex2522 opened a new pull request, #1114:
URL: https://github.com/apache/orc/pull/1114

   ### What changes were proposed in this pull request?
   The patch will optimize the seekToRow function.
   
   ### Why are the changes needed?
   The RowReader::seekToRow() function always call startNextStripe() to seek to the beginning of target stripe.
   However, when PPD is enabled many calls are still staying in the same stripe which results in many unnecessary calls. 
   
   ### How was this patch tested?
   The original UTs(TestPredicatePushdown, DictionaryEncoding::multipleStripesWithIndex, DictionaryEncoding::multipleStripesWithoutIndex) will cover the patch.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] stiga-huang commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
stiga-huang commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r873380307


##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {

Review Comment:
   Do we use this somewhere?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r876767805


##########
c++/src/Reader.cc:
##########
@@ -391,27 +392,39 @@ namespace orc {
       return;
     }
 
-    currentStripe = seekToStripe;
-    currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
     previousRow = rowNumber;
-    startNextStripe();
+    auto rowIndexStride = footer->rowindexstride();
+    if (!isCurrentStripeInited()
+        || currentStripe != seekToStripe
+        || rowIndexStride == 0
+        || currentStripeInfo.indexlength() == 0) {
+      // current stripe is not initialized or
+      // target stripe is not current stripe or
+      // current stripe doesn't have row indexes
+      currentStripe = seekToStripe;
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      startNextStripe();
+      if (currentStripe >= lastStripe) {
+        return;
+      }
+    } else {
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      if (sargsApplier) {
+        // advance to selected row group if predicate pushdown is enabled
+        currentRowInStripe = advanceToNextRowGroup(currentRowInStripe,
+                                                   rowsInCurrentStripe,
+                                                   footer->rowindexstride(),
+                                                   sargsApplier->getNextSkippedRows());
+      }
+    }
 
     uint64_t rowsToSkip = currentRowInStripe;
-    auto rowIndexStride = footer->rowindexstride();
     // seek to the target row group if row indexes exists
     if (rowIndexStride > 0 && currentStripeInfo.indexlength() > 0) {
-      // when predicate push down is enabled, above call to startNextStripe()
-      // will move current row to 1st matching row group; here we only need
-      // to deal with the case when PPD is not enabled.
-      if (!sargsApplier) {
-        if (rowIndexes.empty()) {
-          loadStripeIndex();
-        }
-        auto rowGroupId = static_cast<uint32_t>(rowsToSkip / rowIndexStride);
-        if (rowGroupId != 0) {
-          seekToRowGroup(rowGroupId);
-        }
+      if (rowIndexes.empty()) {
+        loadStripeIndex();
       }
+      seekToRowGroup(static_cast<uint32_t>(rowsToSkip / rowIndexStride));

Review Comment:
   Improved the todo comment, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r871276433


##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #1114:
URL: https://github.com/apache/orc/pull/1114#issuecomment-1131299073

   Although I set the milestone for Apache ORC 1.9.0, please feel free to backport this if you want in Apache ORC 1.8.0, @wgtmac and @stiga-huang .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r871279520


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   The DataBuffer class already has a non-const operator() function, so adding a const operator() function should have no effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #1114:
URL: https://github.com/apache/orc/pull/1114#issuecomment-1133785154

   This is merged via https://github.com/apache/orc/commit/11a2544f4e54b29d5b0d3ca6aeb8dde5291b2163


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
wgtmac commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r874287954


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   suggest to use the trick to remove duplicates between const and non-const overloads: https://stackoverflow.com/questions/123758/how-do-i-remove-code-duplication-between-similar-const-and-non-const-member-func



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r877503729


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   Which OS are you on? The build chain looks a little old to me. I expected `9.4.0` or `11.2.0`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r870900536


##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {

Review Comment:
   Please add a function description comment.



##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {
+      return currentStripe < lastStripe ?
+        firstRowOfStripe[currentStripe] + rowsInCurrentStripe :
+        footer->numberofrows();
+    }
+
+    inline bool isCurrentStripeInited() const {

Review Comment:
   Please add a function description comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r875511335


##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {

Review Comment:
   The function has remove.



##########
c++/src/Reader.cc:
##########
@@ -391,27 +392,39 @@ namespace orc {
       return;
     }
 
-    currentStripe = seekToStripe;
-    currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
     previousRow = rowNumber;
-    startNextStripe();
+    auto rowIndexStride = footer->rowindexstride();
+    if (!isCurrentStripeInited()
+        || currentStripe != seekToStripe
+        || rowIndexStride == 0
+        || currentStripeInfo.indexlength() == 0) {
+      // current stripe is not initialized or
+      // target stripe is not current stripe or
+      // current stripe doesn't have row indexes
+      currentStripe = seekToStripe;
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      startNextStripe();
+      if (currentStripe >= lastStripe) {
+        return;
+      }
+    } else {
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      if (sargsApplier) {
+        // advance to selected row group if predicate pushdown is enabled
+        currentRowInStripe = advanceToNextRowGroup(currentRowInStripe,
+                                                   rowsInCurrentStripe,
+                                                   footer->rowindexstride(),
+                                                   sargsApplier->getNextSkippedRows());
+      }
+    }
 
     uint64_t rowsToSkip = currentRowInStripe;
-    auto rowIndexStride = footer->rowindexstride();
     // seek to the target row group if row indexes exists
     if (rowIndexStride > 0 && currentStripeInfo.indexlength() > 0) {
-      // when predicate push down is enabled, above call to startNextStripe()
-      // will move current row to 1st matching row group; here we only need
-      // to deal with the case when PPD is not enabled.
-      if (!sargsApplier) {
-        if (rowIndexes.empty()) {
-          loadStripeIndex();
-        }
-        auto rowGroupId = static_cast<uint32_t>(rowsToSkip / rowIndexStride);
-        if (rowGroupId != 0) {
-          seekToRowGroup(rowGroupId);
-        }
+      if (rowIndexes.empty()) {
+        loadStripeIndex();
       }
+      seekToRowGroup(static_cast<uint32_t>(rowsToSkip / rowIndexStride));

Review Comment:
   Done



##########
c++/src/Reader.hh:
##########
@@ -146,6 +146,7 @@ namespace orc {
     uint64_t firstStripe;
     uint64_t currentStripe;
     uint64_t lastStripe; // the stripe AFTER the last one
+    uint64_t currentInitedStripe;

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r876666229


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   Thanks. Could you share your environment information and the numbers with the above, @coderex2522 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
wgtmac commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r874292418


##########
c++/src/Reader.cc:
##########
@@ -391,27 +392,39 @@ namespace orc {
       return;
     }
 
-    currentStripe = seekToStripe;
-    currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
     previousRow = rowNumber;
-    startNextStripe();
+    auto rowIndexStride = footer->rowindexstride();
+    if (!isCurrentStripeInited()
+        || currentStripe != seekToStripe
+        || rowIndexStride == 0
+        || currentStripeInfo.indexlength() == 0) {
+      // current stripe is not initialized or
+      // target stripe is not current stripe or
+      // current stripe doesn't have row indexes
+      currentStripe = seekToStripe;
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      startNextStripe();
+      if (currentStripe >= lastStripe) {
+        return;
+      }
+    } else {
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      if (sargsApplier) {
+        // advance to selected row group if predicate pushdown is enabled
+        currentRowInStripe = advanceToNextRowGroup(currentRowInStripe,
+                                                   rowsInCurrentStripe,
+                                                   footer->rowindexstride(),
+                                                   sargsApplier->getNextSkippedRows());
+      }
+    }
 
     uint64_t rowsToSkip = currentRowInStripe;
-    auto rowIndexStride = footer->rowindexstride();
     // seek to the target row group if row indexes exists
     if (rowIndexStride > 0 && currentStripeInfo.indexlength() > 0) {
-      // when predicate push down is enabled, above call to startNextStripe()
-      // will move current row to 1st matching row group; here we only need
-      // to deal with the case when PPD is not enabled.
-      if (!sargsApplier) {
-        if (rowIndexes.empty()) {
-          loadStripeIndex();
-        }
-        auto rowGroupId = static_cast<uint32_t>(rowsToSkip / rowIndexStride);
-        if (rowGroupId != 0) {
-          seekToRowGroup(rowGroupId);
-        }
+      if (rowIndexes.empty()) {
+        loadStripeIndex();
       }
+      seekToRowGroup(static_cast<uint32_t>(rowsToSkip / rowIndexStride));

Review Comment:
   If somehow it fails to parse the row index or finds some missing columns in the call of loadStripeIndex(), the seekToRowGroup here cannot proceed. Maybe add a TODO here to mark a future improvement.



##########
c++/src/Reader.hh:
##########
@@ -146,6 +146,7 @@ namespace orc {
     uint64_t firstStripe;
     uint64_t currentStripe;
     uint64_t lastStripe; // the stripe AFTER the last one
+    uint64_t currentInitedStripe;

Review Comment:
   The naming is a little bit casual, maybe processingStripe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r876732619


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   Test Environment: x86_64, gcc version 4.8.5
   The test results were 7.03 seconds and 3.82 seconds respectively. @dongjoon-hyun 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r877680634


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   I retested it with gcc 9.2.1 and the results were similar. And both tests were on Linux OS x86_64.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r875444629


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   I write a simple demo for testing the implement overload that use const function to  implement non-const function. 
   The complex implement(return const_cast<T&>(static_cast<const DataBuffer<T>&>(*this)[i])) will cost twice the overhead of the simple implement(return buf[i]).
   ![simple_demo](https://user-images.githubusercontent.com/13637742/168953847-3beb0994-702b-4239-81da-9512c7d56dff.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r876668302


##########
c++/src/Reader.cc:
##########
@@ -391,27 +392,39 @@ namespace orc {
       return;
     }
 
-    currentStripe = seekToStripe;
-    currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
     previousRow = rowNumber;
-    startNextStripe();
+    auto rowIndexStride = footer->rowindexstride();
+    if (!isCurrentStripeInited()
+        || currentStripe != seekToStripe
+        || rowIndexStride == 0
+        || currentStripeInfo.indexlength() == 0) {
+      // current stripe is not initialized or
+      // target stripe is not current stripe or
+      // current stripe doesn't have row indexes
+      currentStripe = seekToStripe;
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      startNextStripe();
+      if (currentStripe >= lastStripe) {
+        return;
+      }
+    } else {
+      currentRowInStripe = rowNumber - firstRowOfStripe[currentStripe];
+      if (sargsApplier) {
+        // advance to selected row group if predicate pushdown is enabled
+        currentRowInStripe = advanceToNextRowGroup(currentRowInStripe,
+                                                   rowsInCurrentStripe,
+                                                   footer->rowindexstride(),
+                                                   sargsApplier->getNextSkippedRows());
+      }
+    }
 
     uint64_t rowsToSkip = currentRowInStripe;
-    auto rowIndexStride = footer->rowindexstride();
     // seek to the target row group if row indexes exists
     if (rowIndexStride > 0 && currentStripeInfo.indexlength() > 0) {
-      // when predicate push down is enabled, above call to startNextStripe()
-      // will move current row to 1st matching row group; here we only need
-      // to deal with the case when PPD is not enabled.
-      if (!sargsApplier) {
-        if (rowIndexes.empty()) {
-          loadStripeIndex();
-        }
-        auto rowGroupId = static_cast<uint32_t>(rowsToSkip / rowIndexStride);
-        if (rowGroupId != 0) {
-          seekToRowGroup(rowGroupId);
-        }
+      if (rowIndexes.empty()) {
+        loadStripeIndex();
       }
+      seekToRowGroup(static_cast<uint32_t>(rowsToSkip / rowIndexStride));

Review Comment:
   nit. 
   For `TODO` comment, an IDed TODO style is recommended in order not to forget about it.
   It means to file an official ORC JIRA and use `TODO(ORC-XXX)` style.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun closed pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function in the case of seeking within the same stripe
URL: https://github.com/apache/orc/pull/1114


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r871276642


##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {
+      return currentStripe < lastStripe ?
+        firstRowOfStripe[currentStripe] + rowsInCurrentStripe :
+        footer->numberofrows();
+    }
+
+    inline bool isCurrentStripeInited() const {

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r871276433


##########
c++/src/Reader.hh:
##########
@@ -187,6 +188,16 @@ namespace orc {
     friend class TestRowReader_advanceToNextRowGroup_Test;
     friend class TestRowReader_computeBatchSize_Test;
 
+    inline uint64_t lastRowOfCurrentStripe() const {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1114:
URL: https://github.com/apache/orc/pull/1114#discussion_r870900068


##########
c++/include/orc/MemoryPool.hh:
##########
@@ -72,6 +72,10 @@ namespace orc {
       return currentCapacity;
     }
 
+    const T& operator[](uint64_t i) const {
+      return buf[i];
+    }
+

Review Comment:
   Does this part have a big impact?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on pull request #1114: ORC-1170:[C++] Optimize the RowReader::seekToRow function

Posted by GitBox <gi...@apache.org>.
coderex2522 commented on PR #1114:
URL: https://github.com/apache/orc/pull/1114#issuecomment-1127572066

   > LGTM. It'd be nice if we can have a more specifit commit tittle. We are optimzing the case seeking within the same stripe. Seeking within the same RowGroup is not optimized in this PR. So having a title like "[ORC-1170](https://issues.apache.org/jira/browse/ORC-1170):[C++] Optimize seeking within the same stripe" seems better to me.
   
   I think seeking within the same stripe  is inclusive of  seeking withing the same RowGroup. Seeking withing the same RowGroup will also call function startNextStripe under the original implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org