Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/06/06 22:16:00 UTC

[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

    [ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729858#comment-17729858 ] 

ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

mukund-thakur commented on PR #1103:
URL: https://github.com/apache/parquet-mr/pull/1103#issuecomment-1579527132

   > If I did an initial alpha asf release of https://github.com/steveloughran/fs-api-shim, what would it take to move this implementation to it?
   
   I think we can include the shim as a Maven dependency and make the changes in parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/ParquetVectoredIOUtil.java. Once you release it, I can try that ASAP as a POC.
   




> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop to improve read performance for seek-heavy readers. Spark jobs and other applications that use Parquet will benefit greatly from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867
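
For readers unfamiliar with the idea, here is a rough conceptual sketch of why a vectored read helps a seek-heavy reader: the caller hands over all the byte ranges it needs at once, nearby ranges are merged, and each merged block is fetched with a single positional read. This is only an illustration using java.nio, not the Hadoop API or the code in the PR; the names Range, coalesce, readRanges and maxGap are hypothetical, and the real implementation (described in the links above) may differ in how it merges and schedules reads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class VectoredReadSketch {

    /** A requested byte range. Hypothetical type, not Hadoop's own range class. */
    public static final class Range {
        final long offset;
        final int length;

        public Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /** Sorts ranges by offset and merges neighbours whose gap is at most maxGap bytes. */
    public static List<Range> coalesce(List<Range> ranges, int maxGap) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.offset - last.end() <= maxGap) {
                    // Close enough: extend the previous block instead of seeking again.
                    long end = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1,
                            new Range(last.offset, (int) (end - last.offset)));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    /** Reads every requested range, issuing one positional read loop per merged block. */
    public static List<byte[]> readRanges(Path file, List<Range> ranges, int maxGap)
            throws IOException {
        List<byte[]> results = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i++) {
            results.add(null);
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            for (Range block : coalesce(ranges, maxGap)) {
                ByteBuffer buffer = ByteBuffer.allocate(block.length);
                long position = block.offset;
                while (buffer.hasRemaining()) {
                    int n = channel.read(buffer, position);
                    if (n < 0) {
                        throw new EOFException("Range extends past end of file");
                    }
                    position += n;
                }
                byte[] data = buffer.array();
                // Slice the merged block back into the ranges the caller asked for.
                for (int i = 0; i < ranges.size(); i++) {
                    Range r = ranges.get(i);
                    if (r.offset >= block.offset && r.end() <= block.end()) {
                        int from = (int) (r.offset - block.offset);
                        results.set(i, Arrays.copyOfRange(data, from, from + r.length));
                    }
                }
            }
        }
        return results;
    }
}
```

Results come back in the order the ranges were requested, regardless of the order in which the merged blocks were read. For a columnar format such as Parquet, the requested ranges would correspond to the column chunks or pages a query needs, which is where the reduction in seeks would be expected to pay off.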


