Posted to jira@arrow.apache.org by "Steve M. Kim (Jira)" <ji...@apache.org> on 2021/01/09 03:51:00 UTC

[jira] [Comment Edited] (ARROW-6720) [JAVA][C++]Support Parquet Read and Write in Java

    [ https://issues.apache.org/jira/browse/ARROW-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261745#comment-17261745 ] 

Steve M. Kim edited comment on ARROW-6720 at 1/9/21, 3:50 AM:
--------------------------------------------------------------

I think that this proposed feature interacts with
 * ARROW-7272
 * ARROW-7808

Once we have the ability to view the same Arrow buffers as a RecordBatch/Table across both Java and C++, we just need to provide Java methods that invoke the Parquet reader and writer functionality in C++.


was (Author: chairmank):
I think that this proposed feature interacts with
 * ARROW-7272
 * ARROW-7808

Once we have the ability to view the same Arrow buffers as a RecordBatch/Table across both Java and C++, we just need to provide Java methods that invoke the Parquet reader and writer functionality in C++.

> [JAVA][C++]Support Parquet Read and Write in Java
> -------------------------------------------------
>
>                 Key: ARROW-6720
>                 URL: https://issues.apache.org/jira/browse/ARROW-6720
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Java
>    Affects Versions: 0.15.0
>            Reporter: Chendi.Xue
>            Assignee: Chendi.Xue
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 38.5h
>  Remaining Estimate: 0h
>
> We added a new Java interface to support reading and writing Parquet data from HDFS or local files.
> The motivation for this implementation is that, when loading and dumping Parquet data in Java, we can only use row-based put and get methods. Since Arrow already has a C++ implementation for loading and dumping Parquet, we wrapped that code as Java APIs.
> In testing, we found that on our workload performance improved by more than 2x compared with row-based load and dump, so we would like to contribute this code to Arrow.
> Since this is an entirely independent change, no existing Arrow code is modified. We added two folders: java/adapter/parquet and cpp/src/jni/parquet.
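A rough sketch of the shape such a JNI wrapper could take is below. It only illustrates the pattern of Java methods declared `native` and implemented in C++ on top of the Arrow Parquet reader/writer. Every name in it (ParquetJniSketch, openReader, readNextBatch, the library name arrow_parquet_jni_sketch, and so on) is hypothetical and is not the API actually contributed under java/adapter/parquet. It also assumes that record batches cross the Java/C++ boundary as memory addresses of Arrow C Data Interface structs (ArrowArray/ArrowSchema); that is one way to share the same buffers, but the issue does not say which mechanism is used.

```java
/**
 * Hypothetical sketch of a JNI bridge from Java to the Arrow C++ Parquet
 * reader/writer. Class, method, and library names are illustrative only.
 */
public final class ParquetJniSketch {
    // Hypothetical native library name; a real build would use its own.
    private static final String LIB_NAME = "arrow_parquet_jni_sketch";

    private ParquetJniSketch() {}

    /** Tries to load the native library; returns false if it is not on java.library.path. */
    public static boolean tryLoadNative() {
        try {
            System.loadLibrary(LIB_NAME);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Reader side (assumed): open a Parquet file (local path or HDFS URI) in C++
    // and return an opaque handle to the C++ reader object.
    static native long openReader(String uri);

    // Assumed: export the next record batch into caller-allocated C Data Interface
    // structs at the given addresses, so Java views the C++-owned buffers without
    // copying. Returns false once the file is exhausted.
    static native boolean readNextBatch(long readerHandle, long arrowArrayAddress, long arrowSchemaAddress);

    // Releases the C++ reader behind the handle.
    static native void closeReader(long readerHandle);

    // Writer side mirrors the reader: open with a schema, write batches, close.
    static native long openWriter(String uri, long arrowSchemaAddress);

    static native void writeBatch(long writerHandle, long arrowArrayAddress);

    static native void closeWriter(long writerHandle);

    public static void main(String[] args) {
        // Native methods are only linked when first called, so declaring them
        // is safe even when the C++ library is absent.
        System.out.println("native library loaded: " + tryLoadNative());
    }
}
```

Keeping the native surface to opaque handles plus buffer addresses means no Java objects are marshalled row by row, which is consistent with the goal of replacing row-based put and get methods with columnar transfers.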



--
This message was sent by Atlassian Jira
(v8.3.4#803005)