Posted to user@flink.apache.org by Rex Fenley <Re...@remind101.com> on 2020/11/06 18:28:48 UTC

Join Bottleneck

Hello,

I have a job that is a series of joins, group-bys, and aggregations, and it
is bottlenecked on one of the joins. That join has ~300 million rows on the
left and ~200 million rows on the right, all with unique keys. This is what
the plan shows for the bottlenecked join:

Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, group_id,
user_id, uuid, owner, id0, deleted_at], leftInputSpec=[HasUniqueKey],
rightInputSpec=[JoinKeyContainsUniqueKey])

The join condition is basically (left.user_id === right.id). So `id0` must
be right.id here.

My first question is, what is the difference between

leftInputSpec=[HasUniqueKey]

and

rightInputSpec=[JoinKeyContainsUniqueKey]

 ?

Is the left side being hashed on its primary key `id` rather than on the
join key, which would hurt performance?

Is there anything else about this that stands out?

Thanks!

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>  |  BLOG <http://blog.remind.com/>  |
FOLLOW US <https://twitter.com/remindhq>  |
LIKE US <https://www.facebook.com/remindhq>

Re: Join Bottleneck

Posted by Rex Fenley <Re...@remind101.com>.
Thank you for the clarification.


Re: Join Bottleneck

Posted by Till Rohrmann <tr...@apache.org>.
Hi Rex,

"HasUniqueKey" means that the left input has a unique key.
"JoinKeyContainsUniqueKey" means that the join key of the right side
contains the unique key of this relation. Hence, it looks normal to me.
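
To make the distinction concrete, here is a minimal sketch of how the two
labels could be derived for each input. It is not Flink's planner code: the
function name classify_input_spec and the set-based representation are
assumptions made for illustration, and the third label, NoUniqueKey, is
included on the assumption that it is what the plan prints for an input with
no unique key at all. As far as I understand, the label only describes how
the unique key relates to the join key; it does not mean that one side is
partitioned by its primary key instead of the join key.

```python
def classify_input_spec(join_key, unique_keys):
    """Label one join input, mirroring the names printed in the plan.

    join_key:    column names this input contributes to the join condition
    unique_keys: iterable of column-name sets, each a known unique key
    (Hypothetical helper for illustration, not a Flink API.)
    """
    join_key = frozenset(join_key)
    # Ignore empty entries so they cannot count as "contained" by accident.
    keys = [frozenset(k) for k in unique_keys if k]

    if not keys:
        # Nothing is known to be unique on this input.
        return "NoUniqueKey"
    if any(k <= join_key for k in keys):
        # Some unique key is fully covered by the join key, so each
        # join-key value matches at most one row on this side.
        return "JoinKeyContainsUniqueKey"
    # A unique key exists but is not covered by the join key, so many
    # rows on this side can share the same join-key value.
    return "HasUniqueKey"


# Left input: unique on `id`, joined on `user_id`.
left_spec = classify_input_spec({"user_id"}, [{"id"}])
# Right input: unique on `id`, joined on `id` (shown as `id0` in the plan).
right_spec = classify_input_spec({"id"}, [{"id"}])
print(left_spec, right_spec)
```

With the columns from the plan above, the left input is labelled
HasUniqueKey because its unique key `id` is not part of the join key
`user_id`, and the right input is labelled JoinKeyContainsUniqueKey because
its join key `id` is itself unique.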

Cheers,
Till
