Posted to dev@orc.apache.org by David <da...@gmail.com> on 2021/08/31 14:52:27 UTC

Performance Update

Hello Gang,

As you probably know, I've been working on a series of small performance
improvements in ORC-core and I just wanted to highlight the progress thus
far.

This is far from an exacting measurement, but I ran the ORC benchmark that
generates (writes) the NYC taxi data, comparing the current main branch with
the 1.6 branch.  The tests ran for roughly the same amount of time, and I
observed a performance improvement of roughly 25% for this particular workload.

I've attached the images.

Normally, I would do this work simply as a hobby, in the same way that many
people enjoy the mental stimulation of Sudoku, but now that businesses are
paying per unit of compute, time is literally money.  I hope this saves you
some time. And money.

Thanks for all the reviews; you made this possible.

More PRs to come.

Re: Performance Update

Posted by Dongjoon Hyun <do...@apache.org>.
Thanks!

On 2021/08/31 16:20:37, David <da...@gmail.com> wrote: 
> Hello,
> 
> It may be that the apache.org mail server stripped the attachments.
> 
> The short text version is:
> 
> Run time: ~1,500,000 ms of total CPU time per run
> 
> main: ORC - 90,500 ms (5.9%)
> 1.6: ORC - 121,100 ms (7.9%)
> 
> The percentage is the share of total CPU time; the rest of the CPU time
> went to processing the other formats included in the benchmark framework
> (JSON, Parquet, Avro, etc.).
> 
> Thanks.

Re: Performance Update

Posted by David <da...@gmail.com>.
Hello,

It may be that the apache.org mail server stripped the attachments.

The short text version is:

Run time: ~1,500,000 ms of total CPU time per run

main: ORC - 90,500 ms (5.9%)
1.6: ORC - 121,100 ms (7.9%)

The percentage is the share of total CPU time; the rest of the CPU time
went to processing the other formats included in the benchmark framework
(JSON, Parquet, Avro, etc.).
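
As a quick sanity check on the "roughly 25%" figure, the arithmetic from the
two ORC timings can be sketched in Python. The function and variable names
here are illustrative only and are not part of the ORC benchmark code; the
two timings are the ones quoted in this message.

```python
# CPU time attributed to ORC in each run, as quoted in this message.
MAIN_MS = 90_500     # current main branch
V1_6_MS = 121_100    # 1.6 branch


def relative_reduction(old_ms: float, new_ms: float) -> float:
    """Fraction of CPU time saved when going from old_ms to new_ms."""
    return (old_ms - new_ms) / old_ms


def speedup(old_ms: float, new_ms: float) -> float:
    """How many times faster the new timing is than the old one."""
    return old_ms / new_ms


reduction = relative_reduction(V1_6_MS, MAIN_MS)
factor = speedup(V1_6_MS, MAIN_MS)

print(f"CPU time reduction: {reduction:.1%}")  # 25.3%
print(f"Speedup factor:     {factor:.2f}x")    # 1.34x
```

Note that the 25% is the reduction in CPU time relative to the 1.6 branch;
expressed as a speedup, the same numbers come out to about 1.34x.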

Thanks.

On Tue, Aug 31, 2021 at 12:10 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you for sharing, but it seems the attachments are missing.
> I'm looking forward to seeing it. :)
>
> Dongjoon.

Re: Performance Update

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for sharing, but it seems the attachments are missing.
I'm looking forward to seeing it. :)

Dongjoon.
