Posted to dev@hive.apache.org by Sivaramakrishnan Narayanan <ta...@gmail.com> on 2015/05/07 15:43:55 UTC

Window function possible perf improvement

Hi,

I was reading through PTFOperator and the related code, and was wondering
whether there is an opportunity to optimize this function in
WindowingTableFunction.java:

  public void execute(PTFPartitionIterator<Object> pItr, PTFPartition outP) throws HiveException {

This method iterates over the input partition once to compute outputColumns,
which causes a full read of the input partition.

It then iterates over the input partition again to append the newly computed
values. This causes another read of the input partition and a write to the
output partition.

I was wondering whether it would be more efficient to append to the output
partition as soon as the window expressions have been computed. This would
avoid one scan of the input partition.
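
To make the idea concrete, here is a minimal, self-contained sketch of the two
access patterns. It is not Hive code: CountingPartition, twoPass and onePass are
hypothetical names invented for illustration, and a running sum stands in for a
window expression. It only shows that appending each row as soon as its value is
known halves the number of input reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowPassSketch {

    /** Hypothetical stand-in for an input partition; counts every row read. */
    static final class CountingPartition {
        private final List<Integer> rows;
        int reads = 0;

        CountingPartition(List<Integer> rows) {
            this.rows = rows;
        }

        int size() {
            return rows.size();
        }

        int get(int i) {
            reads++;
            return rows.get(i);
        }
    }

    /** Pass 1 computes the whole output column; pass 2 re-reads the input to append it. */
    static List<int[]> twoPass(CountingPartition in) {
        int n = in.size();
        int[] computed = new int[n];
        int running = 0;
        for (int i = 0; i < n; i++) {          // first full read of the input
            running += in.get(i);
            computed[i] = running;
        }
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {          // second full read of the input
            out.add(new int[] {in.get(i), computed[i]});
        }
        return out;
    }

    /** Appends each output row as soon as its window value has been computed. */
    static List<int[]> onePass(CountingPartition in) {
        List<int[]> out = new ArrayList<>();
        int running = 0;
        for (int i = 0; i < in.size(); i++) {  // single read of the input
            int value = in.get(i);
            running += value;
            out.add(new int[] {value, running});
        }
        return out;
    }

    public static void main(String[] args) {
        CountingPartition a = new CountingPartition(Arrays.asList(1, 2, 3));
        CountingPartition b = new CountingPartition(Arrays.asList(1, 2, 3));
        twoPass(a);
        onePass(b);
        System.out.println("two-pass reads: " + a.reads + ", one-pass reads: " + b.reads);
    }
}
```

The single pass works here because a running sum depends only on rows already
seen. A window that looks at following rows would have to delay its output until
those rows have been read, so this sketch covers only the simple case.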

FYI - I've mostly been looking at the Hive 0.13 code, but a glance at trunk
suggests the logic is the same there.

Thanks,

Siva

Re: Window function possible perf improvement

Posted by Sivaramakrishnan Narayanan <ta...@gmail.com>.
Thanks, I'll take a look at the latest changes in more detail. I'd only looked
at the specific function in trunk, and it seemed unchanged from 0.13.

On Thu, May 7, 2015 at 7:50 PM, Ashutosh Chauhan <ha...@apache.org>
wrote:

> Harish has done some good work for the popular use case of windowing in
> https://issues.apache.org/jira/browse/HIVE-7062, which is available from
> 0.14 onwards. Will that be useful in your scenario? Or are you targeting
> non-windowing PTFs?
>
> Thanks,
> Ashutosh

Re: Window function possible perf improvement

Posted by Ashutosh Chauhan <ha...@apache.org>.
Harish has done some good work for the popular use case of windowing in
https://issues.apache.org/jira/browse/HIVE-7062, which is available from
0.14 onwards. Will that be useful in your scenario? Or are you targeting
non-windowing PTFs?

Thanks,
Ashutosh