Posted to dev@orc.apache.org by Purshotam Shah <pu...@verizonmedia.com.INVALID> on 2021/09/02 07:02:16 UTC

Re: [E] Re: Performance of hive 1.2 with ORC-1.5.x

Thank you, Dongjoon. This is very helpful.
For now, we will migrate to ORC 1.5 and then upgrade to the
latest stable version later on.
Do you know if there are any backward compatibility issues between Hive 1.2
ORC and Apache ORC 1.5.x?

We should be able to roll back to Hive 1.2 ORC if there is an issue with
Apache ORC. In other words, Hive 1.2 ORC should be able to read the ORC files
created by Apache ORC.

Thanks,


On Sun, Aug 29, 2021 at 10:03 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> You will not get any improvement if you stick to the old functionality
> only.
>
> Here is Owen's and my presentation from 2017.
> It's a very old one, but it seems to match your environment.
>
>     Performance Update: When Apache ORC Met Apache Spark
>
>
> https://www.slideshare.net/Hadoop_Summit/performance-update-when-apache-orc-met-apache-spark-81023199
>
> The slides show the performance improvement
> that Apache Spark 2.3 gained by migrating from Hive 1.2 ORC
> to Apache ORC. In addition, Apache Spark 3.0 fully migrated from
> Apache Hive 1.2 to 2.3.
>
>     Use Apache Hive 2.3 dependency by default (SPARK-30034)
>
> Currently, Apache Spark provides 3 ORC readers.
>     - Apache-ORC-based native Vectorized Reader
>     - Apache-ORC-based native MR Reader
>     - Apache-Hive-2.3-based MR Reader
>
> So, the question is: 'Are you using the latest Apache ORC functionality
> now?'
>
> Dongjoon.
>
> PS. Your environment will fall behind again
>        if you use Apache ORC 1.5.
>        For example, ORC-744 (LazyIO) is not available there.
>
>
>
> On Fri, Aug 27, 2021 at 4:24 PM Purshotam Shah
> <pu...@verizonmedia.com.invalid> wrote:
>
> > Thank you, Owen and Dongjoon, for your replies.
> >
> > Owen, you are right about the work involved in supporting Apache ORC with
> > Hive 1.2.
> > We did try merging patches to support Apache ORC, but it didn't work out
> > as there were too many changes.
> > We rewrote our code to migrate from Hive ORC to Apache ORC, and that is
> > about 90% done. Some work is still pending. It was a lot of work, and we
> > haven't run the complete regression suite to check whether it breaks anything.
> >
> > Since it involves a lot of work and we have concerns about stability, we
> > are wondering whether the move is worth it.
> >
> > We also believe that we should get good performance improvements with
> > ORC 1.5, but we didn't find much information on performance improvements
> > when we looked at the commits.
> >
> > We can't move to Hive 2.x, as we built some of our features on Hive 1.2.
> >
> > Thanks,
> >
> >
> > On Thu, Aug 26, 2021 at 9:44 PM Dongjoon Hyun <do...@gmail.com>
> > wrote:
> >
> > > A correction: ORC-672 landed in branch-1.5 too.
> > >
> > > > Some issues like ORC-672 didn't land at branch-1.5 at all.
> > >
> > >
> > > On Thu, Aug 26, 2021 at 8:31 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
> > > wrote:
> > >
> > > > I agree with Owen.
> > > >
> > > > BTW, Purshotam,
> > > > Why not Apache ORC 1.6.10 (or 1.7.0) instead of ORC 1.5.12?
> > > >
> > > > Apache ORC 1.5.12 was released a year ago,
> > > > and it has known bugs that have been fixed since then.
> > > >
> > > > Some issues like ORC-672 didn't land at branch-1.5 at all.
> > > >
> > > > In addition, Apache ORC 1.7.0 is coming soon.
> > > >
> > > > After the 1.7.0 release, we will mark 1.6.11 as `Stable`
> > > > and 1.5.x as `Archived` in our release cycle.
> > > >
> > > >
> > > > https://orc.apache.org/docs/releases.html
> > > >
> > > > We may have one more release, 1.5.13, as an EOL release
> > > > at that time, but we don't expect any further 1.5.x releases
> > > > after that.
> > > >
> > > > Dongjoon.
> > > >
> > > >
> > > > On Thu, Aug 26, 2021 at 10:53 AM Owen O'Malley <owen.omalley@gmail.com>
> > > > wrote:
> > > >
> > > >> Upgrading the internal version of ORC that is bundled into Hive 1.2 will
> > > >> be a lot of work. To be honest, you should strongly consider moving to
> > > >> Hive 2.3 (or later), which uses the standalone ORC 1.3. Upgrading that
> > > >> to ORC 1.5 or 1.6 would be relatively straightforward.
> > > >>
> > > >> The short answer is that there have been a lot of performance
> > > >> improvements and bug fixes, but I've never run benchmarks between those
> > > >> particular versions.
> > > >>
> > > >> .. Owen
> > > >>
> > > >> On Thu, Aug 26, 2021 at 5:52 AM Purshotam Shah
> > > >> <pu...@verizonmedia.com.invalid> wrote:
> > > >>
> > > >> > Hi Dongjoon,
> > > >> >
> > > >> > Thanks for your reply.
> > > >> >
> > > >> > Yes, we are planning to build Hive 1.2 with Apache ORC 1.5.12.
> > > >> > We realized that it's a lot of work, as we have to merge multiple
> > > >> > patches. Hive 1.2 has been very stable for us. We are wondering
> > > >> > whether it's worth building Hive 1.2 with Apache ORC 1.5.12. We can't
> > > >> > migrate to Hive 2.x or Hive 3.x, as we have built some of our features
> > > >> > on top of Hive 1.2.
> > > >> >
> > > >> > We looked at the ORC commit logs and didn't find much information on
> > > >> > performance improvements. This is where we need some input.
> > > >> > Do you think that we will get some performance improvement? If so, it
> > > >> > would be nice if you could share some details.
> > > >> >
> > > >> > Yes, if we decide to proceed with Hive 1.2 and ORC 1.5.12, we will
> > > >> > run some performance tests.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> >
> > > >> > On Wed, Aug 25, 2021 at 11:08 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > What is the baseline for your comparison?
> > > >> > >
> > > >> > > FYI, the community status is like the following.
> > > >> > >
> > > >> > > - Apache Hive 1.2.2 is not using Apache ORC.
> > > >> > > - Apache Hive 2.3.9 is using Apache ORC 1.3.4.
> > > >> > > - Apache Hive 3.1.2 is using Apache ORC 1.5.6.
> > > >> > > - Apache Hive 4.0.0-SNAPSHOT is using Apache ORC 1.6.9.
> > > >> > >
> > > >> > > So, specifically, are you going to build from the Hive 1.2 source
> > > >> > > with Apache ORC 1.5.12 and compare it with Apache Hive 1.2.2?
> > > >> > >
> > > >> > > Dongjoon.
> > > >> > >
> > > >> > >
> > > >> > > On Tue, Aug 24, 2021 at 11:54 PM Purshotam Shah
> > > >> > > <pu...@verizonmedia.com.invalid> wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > We have been running Hive 1.2 successfully for a few years.
> > > >> > > > Hive 1.2 has been very stable for us.
> > > >> > > >
> > > >> > > > We are planning to migrate to Apache ORC 1.5.12 in the hope of
> > > >> > > > getting better performance.
> > > >> > > > The plan is to keep Hive 1.2 and replace its ORC with Apache
> > > >> > > > ORC 1.5.12.
> > > >> > > >
> > > >> > > > We looked at the ORC commit logs and didn't find much information
> > > >> > > > on performance improvements.
> > > >> > > >
> > > >> > > > Would you mind sharing some of the performance improvements we
> > > >> > > > might get after upgrading to ORC 1.5.12 with Hive 1.2?
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>