Posted to issues@hive.apache.org by "xiatch (Jira)" <ji...@apache.org> on 2022/05/23 09:51:00 UTC
[jira] [Updated] (HIVE-26257) Mapjoin with LateralViewJoin generates wrong plan in Spark
[ https://issues.apache.org/jira/browse/HIVE-26257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xiatch updated HIVE-26257:
--------------------------
Description:
Queries like these
{{SELECT}}
{{ *}}
{{FROM}}
{{ (}}
{{ SELECT}}
{{ C.lv_col,}}
{{ '1' AS match_col}}
{{ FROM}}
{{ (}}
{{ SELECT}}
{{ '1' AS a}}
{{ ) B LATERAL VIEW explode(split('abcd', ';')) C AS lv_col}}
{{ ) A}}
{{ LEFT JOIN}}
{{ (}}
{{ SELECT}}
{{ '1' AS match_col}}
{{ FROM}}
{{ (}}
{{ SELECT}}
{{ 'a' AS b}}
{{ ) E}}
{{ LEFT JOIN (}}
{{ SELECT}}
{{ 'a' AS c}}
{{ ) F ON E.b = F.c}}
{{ ) D ON A.match_col = D.match_col;}}
generate twice as many rows on Spark as on MR.
was:
Queries like these
# SELECT
# *
# FROM
# (
# SELECT
# C.lv_col,
# '1' AS match_col
# FROM
# (
# SELECT
# '1' AS a
# ) B LATERAL VIEW explode(split('abcd', ';')) C AS lv_col
# ) A
# LEFT JOIN
# (
# SELECT
# '1' AS match_col
# FROM
# (
# SELECT
# 'a' AS b
# ) E
# LEFT JOIN (
# SELECT
# 'a' AS c
# ) F ON E.b = F.c
# ) D ON A.match_col = D.match_col;
generate twice as many rows on Spark as on MR.
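
As a rough cross-check of the row count one would expect from the query above, the sketch below models it in plain Python under standard LEFT JOIN semantics. It is not Hive code and says nothing about how either engine plans the join; `left_join` and the row dictionaries are hypothetical names introduced only for illustration. Under this model the query yields a single row, so, assuming MR returns the semantically expected result, a doubled result would be two rows.

```python
# Simplified Python model of the query's expected result under standard
# LEFT JOIN semantics. Not Hive code; all helper names are hypothetical.

def left_join(left, right, on):
    """Keep every left row, paired with each matching right row, or with None."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

# Subquery B: a single constant row.
b_rows = [{"a": "1"}]

# LATERAL VIEW explode(split('abcd', ';')): 'abcd' contains no ';',
# so the split yields the single element 'abcd'.
exploded = "abcd".split(";")

# Subquery A: one row per (B row, exploded element) pair.
a_rows = [{"lv_col": v, "match_col": "1"} for _ in b_rows for v in exploded]

# Subquery D: E LEFT JOIN F ON E.b = F.c, projecting the constant match_col.
e_rows = [{"b": "a"}]
f_rows = [{"c": "a"}]
d_rows = [
    {"match_col": "1"}
    for _ in left_join(e_rows, f_rows, lambda e, f: e["b"] == f["c"])
]

# Outer query: A LEFT JOIN D ON A.match_col = D.match_col.
result = left_join(a_rows, d_rows, lambda a, d: a["match_col"] == d["match_col"])
print(len(result))
```

Each subquery contributes exactly one row and every join condition matches exactly once, which is why no step in this model can multiply the row count.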
> Mapjoin with LateralViewJoin generates wrong plan in Spark
> ----------------------------------------------------------
>
> Key: HIVE-26257
> URL: https://issues.apache.org/jira/browse/HIVE-26257
> Project: Hive
> Issue Type: Bug
> Reporter: xiatch
> Assignee: xiatch
> Priority: Major