Posted to dev@arrow.apache.org by Chang Chen <ba...@gmail.com> on 2019/11/14 02:04:52 UTC

Parquet cpp status

Hi

I am trying to find documentation on the current status of parquet-cpp. I
googled it, but I didn't find any useful information.

Here is what I am interested in:
#1 Column indexes (https://issues.apache.org/jira/browse/PARQUET-1201).
The corresponding Java implementation already supported them last year,
though it wasn't pushed to the repo.
#2 A vectorized column reader interface that can be integrated into Java.
#3 The feature illustrated here
(https://www.dremio.com/webinars/columnar-roadmap-apache-parquet-and-arrow/):
a better predicate pushdown algorithm.

Thanks

Re: Parquet cpp status

Posted by Francois Saint-Jacques <fs...@gmail.com>.
The Parquet C++ implementation has all the facilities to expose the
information required to implement predicate pushdown. The experimental
Dataset API makes use of this with Parquet. See [1] for an example
of the API, or [2] for real-life usage with the nyc-tlc taxi dataset.
The implementation that takes care of predicate pushdown is
found in [3].

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset_test.cc#L289-L409
[2] https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset-parquet-scan-example.cc
[3] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_parquet.cc
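
To make the idea of predicate pushdown concrete, here is a minimal sketch of pruning row groups by their min/max statistics, which Parquet stores per column chunk in the file footer. It is not the Arrow Dataset API or the code in [3]: the names RowGroupStats, GreaterThan, MayMatch and SelectRowGroups are hypothetical and exist only for illustration, and the sketch assumes a single int64 column and a "column > threshold" predicate.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the min/max statistics of one column in one
// row group. This is not an Arrow or parquet-cpp type.
struct RowGroupStats {
  int64_t min;
  int64_t max;
};

// Hypothetical predicate of the form "column > threshold".
struct GreaterThan {
  int64_t threshold;
};

// A row group can contain a matching row only if its maximum exceeds the
// threshold; otherwise every value in it fails the predicate.
bool MayMatch(const RowGroupStats& stats, const GreaterThan& pred) {
  return stats.max > pred.threshold;
}

// Returns the indices of the row groups that still have to be read.
// Row groups that are not returned can be skipped without decoding them.
std::vector<int> SelectRowGroups(const std::vector<RowGroupStats>& groups,
                                 const GreaterThan& pred) {
  std::vector<int> selected;
  for (int i = 0; i < static_cast<int>(groups.size()); ++i) {
    if (MayMatch(groups[i], pred)) selected.push_back(i);
  }
  return selected;
}
```

The statistics only rule row groups out. A selected row group may still contain rows that fail the predicate, so the filter has to be applied again to the rows that are actually read.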

On Fri, Nov 15, 2019 at 1:08 AM Micah Kornfield <em...@gmail.com> wrote:
>
> #1. If there isn't a JIRA, I would guess no one is working on it. (Note: I
> would expect at least the initial work to be in a Parquet JIRA item, and
> this is probably a discussion for that mailing list.)
> #2. There are some open PRs to expose the Parquet reader through JNI to Java
> [1].
> #3. It's possible Dremio has some code that does this. I'm not sure what
> the current status of predicate pushdown in the C++ code base is.
>
>
> [1] https://github.com/apache/arrow/pull/5719

Re: Parquet cpp status

Posted by Micah Kornfield <em...@gmail.com>.
#1. If there isn't a JIRA, I would guess no one is working on it. (Note: I
would expect at least the initial work to be in a Parquet JIRA item, and
this is probably a discussion for that mailing list.)
#2. There are some open PRs to expose the Parquet reader through JNI to Java
[1].
#3. It's possible Dremio has some code that does this. I'm not sure what
the current status of predicate pushdown in the C++ code base is.


[1] https://github.com/apache/arrow/pull/5719

