Posted to dev@orc.apache.org by Dongjoon Hyun <do...@gmail.com> on 2021/09/28 04:36:01 UTC

Apache ORC 1.7.0 Adoption Status

Hi, All.

The following is the Apache ORC 1.7.0 release and adoption status (as of
today).

2021-09-15: Apache ORC 1.7.0 is released
2021-09-20: Apache Spark (dongjoon, https://github.com/apache/spark/pull/34045)
2021-09-20: Apache Iceberg (william, https://github.com/apache/iceberg/pull/3160)
2021-09-21: Apache Arrow (william, https://github.com/apache/arrow/pull/11194)
2021-09-27: Apache Druid (william, https://github.com/apache/druid/pull/11726)
ON-GOING  : Apache Hive (william/pgaref, https://github.com/apache/hive/pull/2615)
FAILED    : Apache Flink (dongjoon, https://github.com/apache/flink/pull/16644)
            Flink has an old fork of `PhysicalWriterImpl` based on Apache ORC 1.5.6.

Thank you all for your efforts!

Dongjoon

Re: Apache ORC 1.7.0 Adoption Status

Posted by David <da...@gmail.com>.
Hello,

Thanks for the clarification, sorry for the confusion and SPAM.

I did indeed believe that these changes went out in ORC 1.7.0.

I'll check in again for ORC 1.8.0 :)

Thanks,
David

On Mon, Oct 4, 2021 at 10:42 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> It seems that you are confused about the Apache ORC 1.7.0 release.
>
> None of the above JIRAs (except ORC-848) are part of Apache ORC 1.7.0.
>
> The `main` branch is still under active development and any patch could be
> reverted before the Apache ORC 1.8.0 release.
>
> For ORC-848, it's a single patch. It would be appreciated if you could
> demonstrate your contribution with a micro-benchmark.
>
> Thanks,
> Dongjoon

Re: Apache ORC 1.7.0 Adoption Status

Posted by Dongjoon Hyun <do...@gmail.com>.
It seems that you are confused about the Apache ORC 1.7.0 release.

None of the above JIRAs (except ORC-848) are part of Apache ORC 1.7.0.

The `main` branch is still under active development and any patch could be
reverted before the Apache ORC 1.8.0 release.

For ORC-848, it's a single patch. It would be appreciated if you could
demonstrate your contribution with a micro-benchmark.

Thanks,
Dongjoon


On Mon, Oct 4, 2021 at 5:46 PM David <da...@gmail.com> wrote:

> Hello,
>
> I do not know that there is any one JIRA/PR that would show
> remarkable results, but perhaps the collection would:
>
> ORC-829, ORC-831, ORC-830, ORC-842, ORC-854, ORC-853, ORC-852, ORC-848,
> ORC-847, ORC-837, ORC-836, ORC-835, ORC-834.
>
> Thanks,
> David

Re: Apache ORC 1.7.0 Adoption Status

Posted by David <da...@gmail.com>.
Hello,

I do not know that there is any one JIRA/PR that would show
remarkable results, but perhaps the collection would:

ORC-829, ORC-831, ORC-830, ORC-842, ORC-854, ORC-853, ORC-852, ORC-848,
ORC-847, ORC-837, ORC-836, ORC-835, ORC-834.

Thanks,
David

On Mon, Oct 4, 2021 at 7:43 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Which JIRAs do you mean specifically in Apache ORC 1.7.0?
>
> Could you elaborate on your contributions and improvements specifically in
> Apache ORC 1.7?
>
> I believe you are the best person to share it with us. :)
>
> Best,
> Dongjoon.

Re: Apache ORC 1.7.0 Adoption Status

Posted by Dongjoon Hyun <do...@gmail.com>.
Which JIRAs do you mean specifically in Apache ORC 1.7.0?

Could you elaborate on your contributions and improvements specifically in
Apache ORC 1.7?

I believe you are the best person to share it with us. :)

Best,
Dongjoon.

On Mon, Oct 4, 2021 at 1:36 PM David <da...@gmail.com> wrote:

> Hello Dongjoon,
>
> It is I "belugabehr" !!! :) a.k.a dmollitor@apache.org
>
> I made quite a few optimizations that did not change the API.  Just curious
> to see if anyone, in the "real world," derived benefit from my work.
>
> Thanks!

Re: Apache ORC 1.7.0 Adoption Status

Posted by David <da...@gmail.com>.
Hello Dongjoon,

It is I "belugabehr" !!! :) a.k.a dmollitor@apache.org

I made quite a few optimizations that did not change the API.  Just curious
to see if anyone, in the "real world," derived benefit from my work.

Thanks!

On Mon, Oct 4, 2021 at 3:40 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Hi, David.
>
> It always depends on your use case.
> Did you try the following new features?
>
> https://orc.apache.org/docs/releases.html
>
> ORC-742 LazyIO of non-filter columns
> ORC-577 Support row-level filtering
> ORC-751 Implement Predicate Pushdown in C++ Reader
> ORC-780 Support LZ4 Compression in C++ Writer
>
> Best,
> Dongjoon.

Re: Apache ORC 1.7.0 Adoption Status

Posted by Dongjoon Hyun <do...@gmail.com>.
Hi, David.

It always depends on your use case.
Did you try the following new features?

https://orc.apache.org/docs/releases.html

ORC-742 LazyIO of non-filter columns
ORC-577 Support row-level filtering
ORC-751 Implement Predicate Pushdown in C++ Reader
ORC-780 Support LZ4 Compression in C++ Writer

Best,
Dongjoon.


On Mon, Oct 4, 2021 at 7:01 AM David <da...@gmail.com> wrote:

> Hello Gang,
>
> I have invested some time in squeezing out performance for ORC 1.7.
>
> Just curious if there are any measurable improvements out there.
>
>
> Thanks.

Re: Apache ORC 1.7.0 Adoption Status

Posted by David <da...@gmail.com>.
Hello Gang,

I have invested some time in squeezing out performance for ORC 1.7.

Just curious if there are any measurable improvements out there.


Thanks.

On Tue, Sep 28, 2021, 12:36 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Hi, All.
>
> The following is the Apache ORC 1.7.0 release and adoption status (as of
> today).
>
> 2021-09-15: Apache ORC 1.7.0 is released
> 2021-09-20: Apache Spark (dongjoon, https://github.com/apache/spark/pull/34045)
> 2021-09-20: Apache Iceberg (william, https://github.com/apache/iceberg/pull/3160)
> 2021-09-21: Apache Arrow (william, https://github.com/apache/arrow/pull/11194)
> 2021-09-27: Apache Druid (william, https://github.com/apache/druid/pull/11726)
> ON-GOING  : Apache Hive (william/pgaref, https://github.com/apache/hive/pull/2615)
> FAILED    : Apache Flink (dongjoon, https://github.com/apache/flink/pull/16644)
>             Flink has an old fork of `PhysicalWriterImpl` based on Apache ORC 1.5.6.
>
> Thank you all for your efforts!
>
> Dongjoon
>