Posted to dev@orc.apache.org by Purshotam Shah <pu...@verizonmedia.com.INVALID> on 2021/08/26 05:51:28 UTC

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Hi Dongjoon,

Thanks for your reply.

Yes, we are planning to build hive-1.2 with Apache ORC 1.5.12.
We realized that it's a lot of work as we have to merge multiple patches.
Hive-1.2 has been very stable for us. We are wondering if it's worth
building hive-1.2 with Apache ORC 1.5.12. We can't migrate to hive-2.x or
hive-3.x, as we have built some of our features on top of hive-1.2.

We looked at the orc commit logs and didn't find much information on
performance improvements. This is where we need some input.
Do you think that we will get some performance improvement? If yes, it would
be nice if you could share some details.

Yes, if we decide to proceed with hive-1.2 and ORC 1.5.12, we will run
some performance tests.

Thanks,


On Wed, Aug 25, 2021 at 11:08 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> What is the baseline for your comparison?
>
> FYI, the community status is like the following.
>
> - Apache Hive 1.2.2 is not using Apache ORC.
> - Apache Hive 2.3.9 is using Apache ORC 1.3.4.
> - Apache Hive 3.1.2 is using Apache ORC 1.5.6.
> - Apache Hive 4.0.0-SNAPSHOT is using Apache ORC 1.6.9.
>
> So, specifically, are you going to build from the Hive 1.2 source with
> Apache ORC 1.5.12 and compare it with Apache Hive 1.2.2?
>
> Dongjoon.
>
>
> On Tue, Aug 24, 2021 at 11:54 PM Purshotam Shah
> <pu...@verizonmedia.com.invalid> wrote:
>
> > Hi,
> >
> > We have been running hive 1.2 successfully for a few years. Hive-1.2 has
> > been very stable for us.
> >
> > We are planning to migrate to apache orc-1.5.12, thinking that we might get
> > better performance.
> > The plan is to keep hive-1.2 and replace orc with apache orc-1.5.12.
> >
> > We looked at the orc commit logs and didn't find much information on
> > performance improvement.
> >
> > Would you mind sharing some of the performance improvements we might get
> > after upgrading to orc-1.5.12 with hive-1.2?
> >
> > Thanks,
> >
>

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Posted by Purshotam Shah <pu...@verizonmedia.com.INVALID>.
Thank you, Dongjoon. This is very helpful.
For now, we will migrate to ORC 1.5 and then upgrade to the
latest stable version later on.
Do you know if there are any backward compatibility issues between Hive 1.2
ORC and Apache ORC 1.5.x?

We should be able to roll back to Hive 1.2 ORC if there is an issue with Apache
ORC.
Hive 1.2 ORC should be able to read the ORC files created by Apache ORC.

Thanks,


On Sun, Aug 29, 2021 at 10:03 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> You will not get any improvement if you stick to the old functionality
> only.
>
> Here is the presentation Owen and I gave in 2017.
> It's a very old one, but it seems to match your environment.
>
>     Performance Update: When Apache ORC Met Apache Spark
>
> https://www.slideshare.net/Hadoop_Summit/performance-update-when-apache-orc-met-apache-spark-81023199
>
> The slides show the performance improvement
> that Apache Spark 2.3 saw when migrating from Hive 1.2 ORC
> to Apache ORC. In addition, Apache Spark 3.0 migrated from
> Apache Hive 1.2 to 2.3 completely.
>
>     Use Apache Hive 2.3 dependency by default (SPARK-30034)
>
> Currently, Apache Spark provides 3 ORC readers.
>     - Apache-ORC-based native Vectorized Reader
>     - Apache-ORC-based native MR Reader
>     - Apache-Hive-2.3-based MR Reader
>
> So, the question is: are you using the latest Apache ORC functionality
> now?
>
> Dongjoon.
>
> PS. Your environment will be behind again
>        if you are using Apache ORC 1.5,
>        for example because of ORC-744 (LazyIO).

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Posted by Dongjoon Hyun <do...@gmail.com>.
You will not get any improvement if you stick to the old functionality only.

Here is the presentation Owen and I gave in 2017.
It's a very old one, but it seems to match your environment.

    Performance Update: When Apache ORC Met Apache Spark

https://www.slideshare.net/Hadoop_Summit/performance-update-when-apache-orc-met-apache-spark-81023199

The slides show the performance improvement
that Apache Spark 2.3 saw when migrating from Hive 1.2 ORC
to Apache ORC. In addition, Apache Spark 3.0 migrated from
Apache Hive 1.2 to 2.3 completely.

    Use Apache Hive 2.3 dependency by default (SPARK-30034)

Currently, Apache Spark provides 3 ORC readers.
    - Apache-ORC-based native Vectorized Reader
    - Apache-ORC-based native MR Reader
    - Apache-Hive-2.3-based MR Reader
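
As a rough sketch of how these three readers are usually selected, the
snippet below maps each one to Spark SQL settings. The keys
`spark.sql.orc.impl` and `spark.sql.orc.enableVectorizedReader` are existing
Spark options (introduced around Spark 2.3); the reader labels, the
`ORC_READER_CONFS` table, and both helper functions are hypothetical names
made up for this illustration, and the mapping should be checked against the
documentation of the Spark version in use.

```python
# Hypothetical lookup table: reader label -> Spark SQL settings assumed to
# select that reader. The labels are illustrative, not Spark terminology.
ORC_READER_CONFS = {
    "native-vectorized": {
        "spark.sql.orc.impl": "native",
        "spark.sql.orc.enableVectorizedReader": "true",
    },
    "native-mr": {
        "spark.sql.orc.impl": "native",
        "spark.sql.orc.enableVectorizedReader": "false",
    },
    "hive-mr": {
        "spark.sql.orc.impl": "hive",
    },
}


def confs_for(reader):
    """Return a copy of the settings for one reader label."""
    if reader not in ORC_READER_CONFS:
        raise ValueError("unknown ORC reader: %s" % reader)
    return dict(ORC_READER_CONFS[reader])


def as_spark_submit_args(reader):
    """Render the settings as `--conf key=value` arguments, sorted by key."""
    args = []
    for key, value in sorted(confs_for(reader).items()):
        args.extend(["--conf", "%s=%s" % (key, value)])
    return args
```

For instance, `as_spark_submit_args("native-mr")` produces the `--conf`
flags one would pass to spark-submit to try the non-vectorized native reader
when comparing the three on the same data.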

So, the question is: are you using the latest Apache ORC functionality now?

Dongjoon.

PS. Your environment will be behind again
       if you are using Apache ORC 1.5,
       for example because of ORC-744 (LazyIO).



On Fri, Aug 27, 2021 at 4:24 PM Purshotam Shah
<pu...@verizonmedia.com.invalid> wrote:

> Thank you, Owen and Dongjoon, for your reply.
>
> Owen, you are right about the work involved in supporting apache orc with
> hive-1.2.
> We did try merging patches to support apache orc, but it didn't work out
> as there were too many changes.
> We rewrote our code to migrate from hive orc to apache orc, which is 90%
> successful. Some work is still pending. It was a lot of work, and we
> haven't run the complete regression to check if it's breaking anything.
>
> Since it involves too much work and we have concerns about stability, we
> wonder if it's worth the move.
>
> We also believe that we should get good performance improvements with ORC-1.5,
> but we didn't find much information on performance improvements when we
> looked at the commits.
>
> We can't move to hive-2.x, as we built some of our features on hive-1.2.
>
> Thanks,

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Posted by Purshotam Shah <pu...@verizonmedia.com.INVALID>.
Thank you, Owen and Dongjoon, for your reply.

Owen, you are right about the work involved in supporting apache orc with
hive-1.2.
We did try merging patches to support apache orc, but it didn't work out
as there were too many changes.
We rewrote our code to migrate from hive orc to apache orc, which is 90%
successful. Some work is still pending. It was a lot of work, and we
haven't run the complete regression to check if it's breaking anything.

Since it involves too much work and we have concerns about stability, we
wonder if it's worth the move.

We also believe that we should get good performance improvements with ORC-1.5,
but we didn't find much information on performance improvements when we
looked at the commits.

We can't move to hive-2.x, as we built some of our features on hive-1.2.

Thanks,


On Thu, Aug 26, 2021 at 9:44 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> A correction: ORC-672 landed at branch-1.5 too.
>
> > Some issues like ORC-672 didn't land at branch-1.5 at all.

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Posted by Dongjoon Hyun <do...@gmail.com>.
A correction: ORC-672 landed at branch-1.5 too.

> Some issues like ORC-672 didn't land at branch-1.5 at all.


On Thu, Aug 26, 2021 at 8:31 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> I agree with Owen.
>
> BTW, Purshotam,
> Why not Apache ORC 1.6.10 (or 1.7.0) instead of ORC 1.5.12?
>
> Apache ORC 1.5.12 was released a year ago,
> and there are known bug fixes since then.
>
> Some issues like ORC-672 didn't land at branch-1.5 at all.
>
> In addition, Apache ORC 1.7.0 is coming soon.
>
> After the 1.7.0 release, we will mark 1.6.11 as `Stable`
> and 1.5.x as `Archived` in our release cycle.
>
>     https://orc.apache.org/docs/releases.html
>
> We may have a new release, 1.5.13, as an EOL release
> at that time, but we don't expect more future 1.5.x releases
> after that.
>
> Dongjoon.

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Posted by Dongjoon Hyun <do...@gmail.com>.
I agree with Owen.

BTW, Purshotam,
Why not Apache ORC 1.6.10 (or 1.7.0) instead of ORC 1.5.12?

Apache ORC 1.5.12 was released a year ago,
and there are known bug fixes since then.

Some issues like ORC-672 didn't land at branch-1.5 at all.

In addition, Apache ORC 1.7.0 is coming soon.

After the 1.7.0 release, we will mark 1.6.11 as `Stable`
and 1.5.x as `Archived` in our release cycle.

    https://orc.apache.org/docs/releases.html

We may have a new release, 1.5.13, as an EOL release
at that time, but we don't expect more future 1.5.x releases
after that.

Dongjoon.


On Thu, Aug 26, 2021 at 10:53 AM Owen O'Malley <ow...@gmail.com>
wrote:

> Upgrading the internal version of ORC that is bundled into Hive 1.2 will be
> a lot of work. To be honest, you should strongly consider moving to Hive
> 2.3 (or later), which uses the standalone ORC 1.3. Upgrading that to ORC
> 1.5 or 1.6 would be relatively straightforward.
>
> The short answer is that there have been a lot of performance improvements
> and bug fixes, but I've never run benchmarks between those particular
> versions.
>
> .. Owen

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Posted by Owen O'Malley <ow...@gmail.com>.
Upgrading the internal version of ORC that is bundled into Hive 1.2 will be
a lot of work. To be honest, you should strongly consider moving to Hive
2.3 (or later), which uses the standalone ORC 1.3. Upgrading that to ORC
1.5 or 1.6 would be relatively straightforward.

The short answer is that there have been a lot of performance improvements
and bug fixes, but I've never run benchmarks between those particular
versions.

.. Owen

On Thu, Aug 26, 2021 at 5:52 AM Purshotam Shah
<pu...@verizonmedia.com.invalid> wrote:

> Hi Dongjoon,
>
> Thanks for your reply.
>
> Yes, we are planning to build hive-1.2 with Apache ORC 1.5.12.
> We realized that it's a lot of work as we have to merge multiple patches.
> Hive-1.2 has been very stable for us. We are wondering if it's worth
> building hive-1.2 with Apache ORC 1.5.12. We can't migrate to hive-2.x or
> hive-3.x, as we have built some of our features on top of hive-1.2.
>
> We looked at the orc commit logs and didn't find much information on
> performance improvements. This is where we need some input.
> Do you think that we will get some performance improvement? If yes, it would
> be nice if you could share some details.
>
> Yes, if we decide to proceed with hive-1.2 and ORC 1.5.12, we will run
> some performance tests.
>
> Thanks,