Posted to dev@spark.apache.org by Enrico Minack <ma...@Enrico.Minack.dev> on 2019/11/06 14:50:31 UTC

[SPARK-29176][DISCUSS] Optimization should change join type to CROSS

Hi,

I would like to discuss issue SPARK-29176 to see if this is considered a 
bug and if so, to sketch out a fix.

In short, the issue is that a valid inner join with a condition gets 
optimized so that no condition is left, but the join type is still INNER. 
Then CheckCartesianProducts throws an exception. The join type should have 
changed to CROSS when the condition is optimized away.
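To make the failure mode concrete, here is a toy sketch in plain Python. 
It only imitates Catalyst by name: the Join class, optimize() and 
check_cartesian_products() below are simplified stand-ins, not Spark's 
actual classes or rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Join:
    """Toy stand-in for a Join plan node: a join type and an optional condition."""
    join_type: str              # "INNER" or "CROSS"
    condition: Optional[str]    # None once the condition has been optimized away

def optimize(plan: Join) -> Join:
    # Imitates a rule that moves the whole join condition elsewhere
    # (e.g. pushed down as a filter): the condition disappears,
    # but the join type is left untouched.
    return Join(join_type=plan.join_type, condition=None)

def check_cartesian_products(plan: Join, cross_join_enabled: bool = False) -> None:
    # Imitates CheckCartesianProducts: an INNER join without a condition
    # is treated as an implicit cartesian product and rejected.
    if plan.join_type == "INNER" and plan.condition is None and not cross_join_enabled:
        raise ValueError("Detected implicit cartesian product for INNER join")

# A valid inner join with a condition...
plan = optimize(Join(join_type="INNER", condition="a = 1"))
try:
    check_cartesian_products(plan)   # ...now fails after optimization
except ValueError as e:
    print(e)

# The fix discussed here: the rule that drops the condition should also
# rewrite the type, so the check passes.
check_cartesian_products(Join(join_type="CROSS", condition=None))  # no exception
```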

I understand that with spark.sql.crossJoin.enabled you can make Spark 
not throw this exception, but I think you should not need this 
workaround for a valid query.
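For reference, the workaround is a one-line config change. The setting 
name is the real spark.sql.crossJoin.enabled; the snippet assumes an 
existing SparkSession named spark:

```python
# Session-level workaround: allow implicit cartesian products.
spark.conf.set("spark.sql.crossJoin.enabled", "true")
# or equivalently in SQL:
spark.sql("SET spark.sql.crossJoin.enabled=true")
```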

Please let me know what you think about this issue and how I could fix 
it. It might affect more rules than the two given in the Jira ticket.

Thanks,
Enrico

Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

Posted by Enrico Minack <ma...@Enrico.Minack.dev>.
So you are saying the optimized inner join with no condition is still a 
valid query?

Then I agree the optimizer is not breaking the query, hence it is not a bug.

Enrico

Am 06.11.19 um 15:53 schrieb Sean Owen:
> You asked for an inner join but it turned into a cross join. This
> might be surprising, hence the error, which you can disable.
> The query is not invalid in any case. It's just stopping you from
> doing something you may not have meant to do, and which may be expensive.
> However, I think we've already changed the default to enable it in
> Spark 3 anyway.
>
> On Wed, Nov 6, 2019 at 8:50 AM Enrico Minack <ma...@enrico.minack.dev> wrote:
>> [...]
>


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

Posted by Sean Owen <sr...@gmail.com>.
You asked for an inner join but it turned into a cross join. This
might be surprising, hence the error, which you can disable.
The query is not invalid in any case. It's just stopping you from
doing something you may not have meant to do, and which may be expensive.
However, I think we've already changed the default to enable it in
Spark 3 anyway.
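The "may be expensive" point is easy to quantify: a join with no 
condition produces |left| × |right| rows. A small self-contained sketch 
(plain Python, not Spark):

```python
from itertools import product

# Two modest 1,000-row inputs.
left = [("l", i) for i in range(1_000)]
right = [("r", j) for j in range(1_000)]

# An inner join whose condition has been optimized away degenerates into
# a cartesian product: every left row pairs with every right row.
cross = list(product(left, right))
print(len(cross))  # 1,000 x 1,000 = 1,000,000 rows
```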

On Wed, Nov 6, 2019 at 8:50 AM Enrico Minack <ma...@enrico.minack.dev> wrote:
> [...]
