You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/07/20 15:30:01 UTC

[GitHub] [orc] belugabehr opened a new pull request #756: ORC-854: Optimize ReadFully for Full Reads

belugabehr opened a new pull request #756:
URL: https://github.com/apache/orc/pull/756


   ### What changes were proposed in this pull request?
   Optimize ORC's readFully method for a condition where the buffer is always fully read.
   
   
   ### Why are the changes needed?
   Better performance reading Float/Long/Double values.
   
   
   ### How was this patch tested?
   No change in functionality. Use existing unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac merged pull request #756: ORC-854: Optimize ReadFully for Full Reads

Posted by GitBox <gi...@apache.org>.
wgtmac merged pull request #756:
URL: https://github.com/apache/orc/pull/756


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] kbendick commented on pull request #756: ORC-854: Optimize ReadFully for Full Reads

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #756:
URL: https://github.com/apache/orc/pull/756#issuecomment-886999110


   > > Are we sure that the buffers / InputStreams passed to readFully do not contain data after their usage?
   > 
   > In this context, the "readFully" refers to the the buffer that is passed in, not that the `InputStream` itself is read fully. That is, the buffer is fully populated (unless EOF is seen first). The typical `read` function of `InputStream` can return any number of bytes, so you have to be able to handle that, and continuously loop to extract enough data to fully fill the buffer.
   
   Ah I misspoke. I was indeed referring to the passed in buffers. That's what I get for reviewing code really late at night.
   
   Looking at it with fresher eyes, it does in fact look like there is no behavioral change between the two. My actual concern was that the buffers passed in were not necessarily intended to be read from the beggining / their current location / mark (and hence the offset parameter). That was my actual concern. But seeing where the code had to be changed, it doesn't appear as though anywhere it's used the offset is anything but zero, so this is functionally the same.
   
   Thanks @belugabehr!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #756: ORC-854: Optimize ReadFully for Full Reads

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #756:
URL: https://github.com/apache/orc/pull/756#issuecomment-886682195


   > Also, since we've been adding a lot of smaller optimizations recently (which can indeed have a big effect in some cases), would it be a good idea to establish something like JMH benchmarks or something to ensure that the optimizations make sense? I know JMH isn't perfect, and many of the optimization PRs have been visually apparent to be optimizing, but throwing that out there to see what others think.
   
   I think something like ORC should definitely have that, and looking at the code, there might even be some JMH tests in there, but we would need additional documentation on how to do that.
   
   For now, I am using the performance benchmarks currently available, using simple timers and/or examining under a profiler.  I've come up with quite a few ideas for PRs, but not all of them have shown much perf improvement and I don't bother submitting them. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on pull request #756: ORC-854: Optimize ReadFully for Full Reads

Posted by GitBox <gi...@apache.org>.
wgtmac commented on pull request #756:
URL: https://github.com/apache/orc/pull/756#issuecomment-889979786


   > @kbendick @pgaref Good to merge? :)
   
   I have merged it. Thanks @belugabehr for contribution and @kbendick @pgaref for review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #756: ORC-854: Optimize ReadFully for Full Reads

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #756:
URL: https://github.com/apache/orc/pull/756#issuecomment-889175828


   @kbendick @pgaref Good to merge? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org