Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/01/26 19:18:29 UTC

[GitHub] [orc] pavibhai opened a new pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

pavibhai opened a new pull request #636:
URL: https://github.com/apache/orc/pull/636


   ### What changes were proposed in this pull request?
   Added conversion of SArg into filters to take advantage of the LazyIO introduced by ORC-742
       * Created Vector filters for leaf, And, Or, Batch
       * Code generation for data type and operator filters
       * Test code generation for data type and operator filters
       * Benchmark tests for the filters in the bench module
   
   ### Why are the changes needed?
   With ORC-742, FOLLOW (non-filter) columns are evaluated lazily. Spark and Hive are already passing down SearchArguments to ORC. This change allows the conversion of the SearchArguments to a filter to gain the improvements of ORC-742.
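
As a rough illustration of the idea only (this is not the code in this pull request), a predicate tree with leaf, And, and Or nodes can be turned into a batch filter that narrows the set of selected row indexes. Every name below (`SArgFilterSketch`, `VectorFilter`, `leaf`, `and`, `or`, `selectRows`) is hypothetical, and the sketch uses plain `long[]` columns rather than any ORC or Hive class.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.stream.IntStream;

public class SArgFilterSketch {
  /** Hypothetical batch filter: returns the subset of `selected` rows that pass. */
  public interface VectorFilter {
    int[] filter(long[][] columns, int[] selected);
  }

  /** Leaf: tests one column value per selected row. */
  public static VectorFilter leaf(int col, LongPredicate p) {
    return (columns, selected) ->
        Arrays.stream(selected).filter(r -> p.test(columns[col][r])).toArray();
  }

  /** And: each child only sees the rows that survived the previous child. */
  public static VectorFilter and(VectorFilter... children) {
    return (columns, selected) -> {
      int[] current = selected;
      for (VectorFilter child : children) {
        current = child.filter(columns, current);
      }
      return current;
    };
  }

  /** Or: a row is kept if any child accepts it; row order is preserved. */
  public static VectorFilter or(VectorFilter... children) {
    return (columns, selected) -> {
      // Assumes at least one column so the batch size can be read from it.
      boolean[] keep = new boolean[columns[0].length];
      for (VectorFilter child : children) {
        for (int r : child.filter(columns, selected)) {
          keep[r] = true;
        }
      }
      return Arrays.stream(selected).filter(r -> keep[r]).toArray();
    };
  }

  /** Applies (x < 10 AND y = 7) OR x = 20, with x in column 0 and y in column 1. */
  public static int[] selectRows(long[][] columns) {
    VectorFilter f = or(
        and(leaf(0, x -> x < 10), leaf(1, y -> y == 7)),
        leaf(0, x -> x == 20));
    int[] all = IntStream.range(0, columns[0].length).toArray();
    return f.filter(columns, all);
  }

  public static void main(String[] args) {
    long[][] columns = { {1, 5, 10, 20}, {7, 7, 3, 3} };
    System.out.println(Arrays.toString(selectRows(columns))); // prints [0, 1, 3]
  }
}
```

In a lazy-read setting, only the rows left in the returned selection would need their non-filter columns materialized, which is where a filter of this shape could benefit from ORC-742.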
   
   ### How was this patch tested?
   * Unit tests were added
   * Performance tests were added to the Bench module
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pavibhai closed pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
pavibhai closed pull request #636:
URL: https://github.com/apache/orc/pull/636


   





[GitHub] [orc] dongjoon-hyun commented on pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #636:
URL: https://github.com/apache/orc/pull/636#issuecomment-769554214


   Yes, let's spin off the `benchmark` part. We can proceed with that in a separate ORC JIRA later.





[GitHub] [orc] pavibhai commented on pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
pavibhai commented on pull request #636:
URL: https://github.com/apache/orc/pull/636#issuecomment-768515416


   > Could you resolve the conflicts by rebasing to the master branch?
   
   done





[GitHub] [orc] dongjoon-hyun commented on pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #636:
URL: https://github.com/apache/orc/pull/636#issuecomment-768459260


   Could you resolve the conflicts by rebasing to the master branch?





[GitHub] [orc] pavibhai commented on pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
pavibhai commented on pull request #636:
URL: https://github.com/apache/orc/pull/636#issuecomment-769049861


   > In general, this patch looks too risky due to the massive size, `+9,192 −669`.
   
   I agree that this is a large patch. To give a little more detail, this is what the distribution of changes looks like:
   
   Percentage|Directory
   -----------|----------
   16.3% | java/bench/core/src/java/org/apache/orc/bench/core/filter/
   6.2% | java/bench/core/src/test/org/apache/orc/bench/core/filter/
   4.8% | java/bench/
   12.9% | java/core/src/gen/filters/
   6.7% | java/core/src/java/org/apache/orc/filter/impl/
   3.8% | java/core/src/java/org/apache/orc/filter/
   6.2% | java/core/src/java/org/apache/orc/util/
   16.1% | java/core/src/test/org/apache/orc/filter/impl/
   8.0% | java/core/src/test/org/apache/orc/
   7.9% | java/gen/src/main/java/org/apache/orc/gen/
   4.5% | java/mapreduce/src/test/org/apache/orc/mapreduce/
   6.1% | java/
   
   * 27.3% of the line changes are in the bench module
   * A further 28.6% of the changes, outside the bench module, are in tests, primarily from the addition of new tests for evaluating the SArg conversions
   
   I am open to suggestions on how better to submit this change.





[GitHub] [orc] pavibhai edited a comment on pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
pavibhai edited a comment on pull request #636:
URL: https://github.com/apache/orc/pull/636#issuecomment-768515416


   > Could you resolve the conflicts by rebasing to the master branch?
   
   sure, done





[GitHub] [orc] pavibhai commented on pull request #636: ORC-743: Convert SearchArguments to Filter to take advantage of LazyIO

Posted by GitBox <gi...@apache.org>.
pavibhai commented on pull request #636:
URL: https://github.com/apache/orc/pull/636#issuecomment-859965830


   Closing this request and opening a new one with the latest changes.

