Posted to dev@phoenix.apache.org by "孟庆义(孟庆义)" <qi...@alibaba-inc.com> on 2014/08/29 05:15:41 UTC

HashJoin becomes slower or even fails due to use of ChunkedResultIterator

Dear all,

My use case is “select * from A inner join B on xx where xx”. A has
about 400 million rows, but the result contains only a few rows.

Problem 1: Before ChunkedResultIterator was introduced, the
SpoolingResultIterators ran in parallel when they were created in
ParallelIterators. Now they run serially.

In my case, the query is 5 times faster without ChunkedResultIterator, and a
chunked scan is unnecessary because so few rows are actually returned.

Problem 2: Because the scans run serially, the HashCache on some region
servers (RS) may expire, which causes the join to fail. This happened in my
case. I fixed it by increasing the timeout to 60s (the default is 30s), but I
think a worse case could trigger it again in the future. It is hard to
determine how long is enough.

My proposed solution is to add a config option to enable/disable
ChunkedResultIterator.

I look forward to your advice.

Daniel.Meng


Re: HashJoin becomes slower or even fails due to use of ChunkedResultIterator

Posted by James Taylor <ja...@apache.org>.
Hello,
This issue was fixed with PHOENIX-1188 which is in our 3.1/4.1
release. Would you mind trying on the newly released version?

There's a configuration parameter, phoenix.query.scanResultChunkSize,
that controls after how many rows chunking starts. The default value
is 2999, so if the total number of rows is less than this, no chunking
will occur.
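
As a rough sketch of how this threshold could be raised from a client: the
snippet below assumes the property is honored when passed as a JDBC
connection property (it may instead need to go into the client-side
hbase-site.xml). The class name, method names, and the value 1,000,000 are
illustrative only and not part of Phoenix.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper, not a Phoenix class.
final class ScanChunkConfig {
    // Property name as given in this message.
    static final String SCAN_RESULT_CHUNK_SIZE = "phoenix.query.scanResultChunkSize";

    // Builds client properties that carry the row threshold for chunking.
    static Properties withChunkSize(long rows) {
        Properties props = new Properties();
        props.setProperty(SCAN_RESULT_CHUNK_SIZE, Long.toString(rows));
        return props;
    }

    // Opens a connection with the override applied.
    // Assumption: the Phoenix JDBC driver is on the classpath and reads
    // this property from the connection properties.
    static Connection open(String jdbcUrl, long rows) throws SQLException {
        return DriverManager.getConnection(jdbcUrl, withChunkSize(rows));
    }

    public static void main(String[] args) {
        // Only builds and prints the properties; no cluster is contacted.
        Properties props = withChunkSize(1_000_000L);
        System.out.println(SCAN_RESULT_CHUNK_SIZE + "="
                + props.getProperty(SCAN_RESULT_CHUNK_SIZE));
    }
}
```

With a threshold above the number of rows a query returns, no chunking
should occur for that query, per the description above.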

Thanks,
James
