Posted to dev@parquet.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/02/01 07:29:48 UTC

[GitHub] [parquet-mr] wgtmac commented on pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

wgtmac commented on PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#issuecomment-1411584288

   > > > Sorry for the delay. I have left some comments and the implementation is overall looking good. Thanks @jiangjiguang for your effort!
   > > > My main concern is the extensibility to support other instruction sets. In addition, it seems to me that the Java Vector API is still incubating. As I am not a Java expert, is there a risk that the API is unstable?
   > > 
   > > 
   > > @wgtmac Jatin is a Java expert. @jatin-bhateja, can you help answer this? Thanks.
   > 
   > Hi @wgtmac, our patch vectorizes the unpacking algorithm for various decode bit sizes. The entire new functionality is exposed through a plugin interface, **ParquetReadRouter**. To prevent performance regressions on other targets, we have enabled the new functionality only for x86 targets with valid features; this limitation can be removed over time.
   > 
   > The Vector API made its appearance in JDK 16 and has been maturing with each successive release. I do not have a firm timeline at this point for when it will exit incubation and be exposed as a preview feature. The intent here is to enable parquet-mr community developers to use the plugin in the Parquet reader and give us early feedback. We are also in the process of vectorizing the packer algorithm.
   > 
   > As this is a large project, we plan to do this incrementally. We seek your guidance on pushing this patch through, either on master or on a separate development branch.
   
   Thanks for your explanation @jatin-bhateja! 
   
   So once the Vector API is finalized in a future Java release, we may need to change the VM options used to enable it accordingly.
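
To illustrate the VM-option point: while the Vector API is incubating, the `jdk.incubator.vector` module is normally resolved only when the JVM is started with `--add-modules jdk.incubator.vector`. A reader could probe for the module at run time and fall back to the scalar path when it is absent. The sketch below is only an assumption about how such a check might look; `VectorSupportProbe` and its methods are hypothetical names, not part of parquet-mr, and it does not inspect CPU features such as AVX-512.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of parquet-mr.
final class VectorSupportProbe {
  private VectorSupportProbe() {}

  /** True if the incubating Vector API module was resolved into the boot layer. */
  static boolean vectorModuleResolved() {
    return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
  }

  /** Heuristic x86 check on an os.arch string (assumed set of names). */
  static boolean isX86(String osArch) {
    String arch = osArch == null ? "" : osArch.toLowerCase(Locale.ROOT);
    return arch.equals("amd64")
        || arch.equals("x86_64")
        || arch.equals("x86")
        || arch.matches("i[3-6]86");
  }

  /** Use the vector path only when both the module and the architecture match. */
  static boolean shouldUseVectorPath() {
    return vectorModuleResolved() && isX86(System.getProperty("os.arch"));
  }

  public static void main(String[] args) {
    System.out.println("vector module resolved: " + vectorModuleResolved());
    System.out.println("use vector path: " + shouldUseVectorPath());
  }
}
```

If the API later leaves incubation, the module name and the required flag would presumably change, so a check like this would need to be updated along with the VM options.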
   
   BTW, I may not be able to verify the generated code line by line. Please advise on the best practice for testing it, based on your experience. Thanks @jatin-bhateja
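
One common way to test generated unpackers without reading them line by line is differential testing: run the optimized routine and a simple, easy-to-audit reference on the same random inputs and compare the outputs for every bit width. The sketch below is an illustration under stated assumptions, not the PR's test suite. It assumes LSB-first bit packing (value `i` occupies bits `i*w` to `(i+1)*w - 1` of the byte stream), and the names `UnpackDiffTest`, `Unpacker`, `referenceUnpack`, and `bufferedUnpack` are hypothetical. The "candidate" here is just a second scalar implementation standing in for a vectorized one.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical differential-test harness; not part of parquet-mr.
final class UnpackDiffTest {
  private UnpackDiffTest() {}

  /** Stand-in interface for any unpacker under test. */
  interface Unpacker {
    void unpack(byte[] in, int bitWidth, int[] out);
  }

  /** Bit-by-bit reference: slow, but simple enough to audit by eye. */
  static void referenceUnpack(byte[] in, int bitWidth, int[] out) {
    for (int i = 0; i < out.length; i++) {
      int value = 0;
      for (int b = 0; b < bitWidth; b++) {
        int bit = i * bitWidth + b;                    // absolute bit index
        int set = (in[bit >>> 3] >>> (bit & 7)) & 1;   // LSB-first within a byte
        value |= set << b;
      }
      out[i] = value;
    }
  }

  /** Candidate: accumulates bytes in a long and extracts one value at a time. */
  static void bufferedUnpack(byte[] in, int bitWidth, int[] out) {
    long buffer = 0;        // unconsumed bits, least significant first
    int bitsInBuffer = 0;
    int pos = 0;
    long mask = (1L << bitWidth) - 1;
    for (int i = 0; i < out.length; i++) {
      while (bitsInBuffer < bitWidth) {
        buffer |= (in[pos++] & 0xFFL) << bitsInBuffer;
        bitsInBuffer += 8;
      }
      out[i] = (int) (buffer & mask);
      buffer >>>= bitWidth;
      bitsInBuffer -= bitWidth;
    }
  }

  /** Returns the number of trials where candidate and reference disagree. */
  static int compare(Unpacker candidate, int trials, long seed) {
    Random rnd = new Random(seed);
    int mismatches = 0;
    for (int t = 0; t < trials; t++) {
      int bitWidth = 1 + rnd.nextInt(32);       // widths 1..32
      int count = 8 * (1 + rnd.nextInt(8));     // multiples of 8 values
      byte[] packed = new byte[count * bitWidth / 8];
      rnd.nextBytes(packed);
      int[] expected = new int[count];
      int[] actual = new int[count];
      referenceUnpack(packed, bitWidth, expected);
      candidate.unpack(packed, bitWidth, actual);
      if (!Arrays.equals(expected, actual)) {
        mismatches++;
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    int mismatches = compare(UnpackDiffTest::bufferedUnpack, 10_000, 42L);
    System.out.println("mismatches: " + mismatches);
  }
}
```

In a setup like this, the vectorized unpacker would be passed to `compare` in place of `bufferedUnpack`, with a fixed seed so any failure is reproducible. Boundary inputs (all zeros, all ones, maximum bit width) are worth adding as explicit cases alongside the random ones.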

