You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Hongze Zhang <no...@126.com> on 2021/01/29 06:06:59 UTC

Review request for ARROW-7808's PR (Dataset Java API)

Hi All,


Sorry to send a request to all but just would like to ask if anyone could be able to help finish the review for PR#7030[1].


As of now the PR contains following parts:


1. Base dataset API for Java language (which follows the shape of C++ API)
2. A JNI-based implementation of FileSystemDataset that routes calls to C++ side FileSystemDataset internally;
3. Parquet support only now.


Not in this PR yet:


1. Filter suppport. We used to have some discussions for this, for example[2];
2. Other data format support like CSV.


New changes since the last review:


1. Use namespace arrow::dataset::jni for members in jni_wrapper.cpp to avoid name collision;
2. Moved ReservationListenableMemoryPool from memory_pool.cc to jni_wrapper.cpp, as it's the only use case so far.


By the way I'll say many thanks to those who already gave review comments on it. It's not a very small PR and your suggestions helped me a lot on making things better.


Thanks in advance!


Best,
Hongze


[1] https://github.com/apache/arrow/pull/7030
[2] https://lists.apache.org/thread.html/rb6613e1e0bd205a9e1979e99ee87a4b8ef6d2dc2e0e417c7abdea8a6%40%3Cdev.arrow.apache.org%3E

Re: Review request for ARROW-7808's PR (Dataset Java API)

Posted by Micah Kornfield <em...@gmail.com>.
I did another review mostly focused on the new allocation code. Thank you,
I think this is close to mergeable once comments are addressed.  If other
Java committers/contributors want to have a look, please do so soon.

On Thu, Jan 28, 2021 at 10:14 PM Hongze Zhang <no...@126.com> wrote:

> Hi All,
>
>
> Sorry to send a request to all but just would like to ask if anyone could
> be able to help finish the review for PR#7030[1].
>
>
> As of now the PR contains following parts:
>
>
> 1. Base dataset API for Java language (which follows the shape of C++ API)
> 2. A JNI-based implementation of FileSystemDataset that routes calls to
> C++ side FileSystemDataset internally;
> 3. Parquet support only now.
>
>
> Not in this PR yet:
>
>
> 1. Filter suppport. We used to have some discussions for this, for
> example[2];
> 2. Other data format support like CSV.
>
>
> New changes since the last review:
>
>
> 1. Use namespace arrow::dataset::jni for members in jni_wrapper.cpp to
> avoid name collision;
> 2. Moved ReservationListenableMemoryPool from memory_pool.cc to
> jni_wrapper.cpp, as it's the only use case so far.
>
>
> By the way I'll say many thanks to those who already gave review comments
> on it. It's not a very small PR and your suggestions helped me a lot on
> making things better.
>
>
> Thanks in advance!
>
>
> Best,
> Hongze
>
>
> [1] https://github.com/apache/arrow/pull/7030
> [2]
> https://lists.apache.org/thread.html/rb6613e1e0bd205a9e1979e99ee87a4b8ef6d2dc2e0e417c7abdea8a6%40%3Cdev.arrow.apache.org%3E