Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/09/07 23:26:48 UTC

[GitHub] [orc] rip-nsk opened a new pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

rip-nsk opened a new pull request #903:
URL: https://github.com/apache/orc/pull/903


   ### What changes were proposed in this pull request?
   ORC-990: [C++] fix RowReaderImpl::seekToRowGroup
   
   ### Why are the changes needed?
   Reallocation of positions vector may invalidate contained iterators.
   
   ### How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] rip-nsk commented on a change in pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
rip-nsk commented on a change in pull request #903:
URL: https://github.com/apache/orc/pull/903#discussion_r704870004



##########
File path: c++/src/Reader.cc
##########
@@ -385,7 +385,7 @@ namespace orc {
 
   void RowReaderImpl::seekToRowGroup(uint32_t rowGroupEntryId) {
     // store positions for selected columns
-    std::vector<std::list<uint64_t>> positions;
+    std::list<std::list<uint64_t>> positions;

Review comment:
       According to the specs (https://en.cppreference.com/w/cpp/container#Iterator_invalidation),
   vector iterators are invalidated if an insertion changes the capacity.
   Maybe the initial capacity of the vector changed in the newer MSVC version.
   You can probably reproduce this issue using a patch like this:
   
   ```
   --- a/c++/src/Reader.cc
   +++ b/c++/src/Reader.cc
   @@ -401,6 +401,7 @@ namespace orc {
          for (int pos = 0; pos != entry.positions_size(); ++pos) {
            position.push_back(entry.positions(pos));
          }
   +      auto size = position.size(); position.resize(100500); position.resize(size);
          positionProviders.insert(std::make_pair(colId, PositionProvider(position)));
        }
   ```
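
The rule cited in this comment can be observed without dereferencing any invalidated iterator. The following is a minimal sketch, not ORC code: `countReallocations` is a hypothetical helper that fills a `std::vector<std::list<uint64_t>>` and counts how often `push_back` changes the capacity. How often that happens without `reserve` depends on the standard library's growth policy, so the sketch only assumes it happens at least once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical helper: counts how many of `n` push_back calls changed the
// vector's capacity. Each capacity change means the elements were relocated
// to a new buffer, so iterators, pointers and references to them are
// invalidated at that point.
int countReallocations(std::size_t n, bool reserveFirst) {
  std::vector<std::list<uint64_t>> positions;
  if (reserveFirst) {
    positions.reserve(n);  // one allocation up front, none while filling
  }
  int reallocations = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t before = positions.capacity();
    positions.push_back({});
    if (positions.capacity() != before) {
      ++reallocations;
    }
  }
  return reallocations;
}
```

Calling `countReallocations(64, false)` should report one or more reallocations on common implementations, while `countReallocations(64, true)` reports none, because no reallocation takes place until the size exceeds the reserved capacity.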













[GitHub] [orc] dongjoon-hyun commented on a change in pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #903:
URL: https://github.com/apache/orc/pull/903#discussion_r704786558



##########
File path: c++/src/Reader.cc
##########
@@ -385,7 +385,7 @@ namespace orc {
 
   void RowReaderImpl::seekToRowGroup(uint32_t rowGroupEntryId) {
     // store positions for selected columns
-    std::vector<std::list<uint64_t>> positions;
+    std::list<std::list<uint64_t>> positions;

Review comment:
       Just a question: We have Windows test coverage with AppVeyor (Windows, MSVC 19.0.24241.7), which is healthy. Do you think the failure is due to a behavior change somewhere in MSVC 19.12.x?
   - https://ci.appveyor.com/project/ApacheSoftwareFoundation/orc/builds/40680172










[GitHub] [orc] dongjoon-hyun commented on pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #903:
URL: https://github.com/apache/orc/pull/903#issuecomment-917032813


   Thank you!





[GitHub] [orc] rip-nsk commented on a change in pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
rip-nsk commented on a change in pull request #903:
URL: https://github.com/apache/orc/pull/903#discussion_r704063853



##########
File path: c++/src/Reader.cc
##########
@@ -385,7 +385,7 @@ namespace orc {
 
   void RowReaderImpl::seekToRowGroup(uint32_t rowGroupEntryId) {
     // store positions for selected columns
-    std::vector<std::list<uint64_t>> positions;
+    std::list<std::list<uint64_t>> positions;

Review comment:
       In my environment (Windows, MSVC 19.12.25835.0),
   the test DictionaryEncoding.multipleStripesWithIndex failed due to invalid iterators in positionProviders.
   
         // copy index positions for a specific column
         positions.push_back({});  // iterators taken earlier can be invalidated here due to reallocation
         auto& position = positions.back();
         for (int pos = 0; pos != entry.positions_size(); ++pos) {
           position.push_back(entry.positions(pos));
         }
         positionProviders.insert(std::make_pair(colId, PositionProvider(position)));  // iterator taken here
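
The pattern in this comment can be sketched in isolation. This is an illustration under stated assumptions, not the ORC implementation: `ProviderSketch` is a hypothetical stand-in for `PositionProvider` that only keeps an iterator into a list it does not own, and `firstPositions` is a hypothetical driver that mirrors the loop above with the outer container already switched to `std::list`, as in the diff. It assumes every column has at least one position.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for PositionProvider: it stores an iterator into a
// list owned elsewhere, so that list must stay alive and must not be
// relocated while the provider is in use.
class ProviderSketch {
 public:
  explicit ProviderSketch(const std::list<uint64_t>& positions)
      : current_(positions.begin()) {}

  // Returns the current position and advances to the next one.
  uint64_t next() { return *current_++; }

 private:
  std::list<uint64_t>::const_iterator current_;
};

// Builds one provider per column while the outer container keeps growing,
// then reads the first position back through every provider.
std::vector<uint64_t> firstPositions(
    const std::vector<std::vector<uint64_t>>& columns) {
  // Node-based outer container: existing elements are never relocated when
  // new ones are appended, so iterators taken earlier stay valid.
  std::list<std::list<uint64_t>> positions;
  std::map<uint64_t, ProviderSketch> providers;
  uint64_t colId = 0;
  for (const auto& column : columns) {
    positions.push_back({});
    auto& position = positions.back();
    for (uint64_t value : column) {
      position.push_back(value);
    }
    providers.insert(std::make_pair(colId++, ProviderSketch(position)));
  }
  std::vector<uint64_t> result;
  for (auto& entry : providers) {
    result.push_back(entry.second.next());
  }
  return result;
}
```

With a `std::vector` as the outer container, a later `positions.push_back({})` could relocate the inner lists that earlier providers point into. Reserving the vector's final size before the loop would be another way to avoid that, assuming the number of selected columns is known in advance; the diff in this pull request uses `std::list` instead.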







[GitHub] [orc] dongjoon-hyun commented on a change in pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #903:
URL: https://github.com/apache/orc/pull/903#discussion_r704032816



##########
File path: c++/src/Reader.cc
##########
@@ -385,7 +385,7 @@ namespace orc {
 
   void RowReaderImpl::seekToRowGroup(uint32_t rowGroupEntryId) {
     // store positions for selected columns
-    std::vector<std::list<uint64_t>> positions;
+    std::list<std::list<uint64_t>> positions;

Review comment:
       Could you elaborate a little more on your PR description?
   > Reallocation of positions vector may invalidate contained iterators.







[GitHub] [orc] dongjoon-hyun commented on pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #903:
URL: https://github.com/apache/orc/pull/903#issuecomment-915910329


   @wgtmac, please land this on the required branches for the next releases.
   
   Thank you, @rip-nsk and @wgtmac.








[GitHub] [orc] dongjoon-hyun commented on a change in pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #903:
URL: https://github.com/apache/orc/pull/903#discussion_r704069777



##########
File path: c++/src/Reader.cc
##########
@@ -385,7 +385,7 @@ namespace orc {
 
   void RowReaderImpl::seekToRowGroup(uint32_t rowGroupEntryId) {
     // store positions for selected columns
-    std::vector<std::list<uint64_t>> positions;
+    std::list<std::list<uint64_t>> positions;

Review comment:
       Thank you for the details. Could you add that into the PR description? It will be a permanent commit message.







[GitHub] [orc] wgtmac merged pull request #903: ORC-990: [C++] fix RowReaderImpl::seekToRowGroup

Posted by GitBox <gi...@apache.org>.
wgtmac merged pull request #903:
URL: https://github.com/apache/orc/pull/903


   

