You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/02/19 18:29:00 UTC

[jira] [Created] (IMPALA-8214) Bad plan in load_nested.py

Tim Armstrong created IMPALA-8214:
-------------------------------------

             Summary: Bad plan in load_nested.py
                 Key: IMPALA-8214
                 URL: https://issues.apache.org/jira/browse/IMPALA-8214
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 3.1.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


The plan for the below SQL, which is executed without stats, has the larger input on the build side of the join and does a broadcast join, which is very suboptimal. This causes high memory consumption when loading larger scale factors, and generally makes the loading process slower than necessary. We should flip the join and make it a shuffle join.

https://github.com/apache/impala/blob/d481cd4/testdata/bin/load_nested.py#L123
{code}

      tmp_customer_sql = r"""
          SELECT
            c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment,
            c_comment,
            GROUP_CONCAT(
              CONCAT(
                CAST(o_orderkey AS STRING), '\003',
                CAST(o_orderstatus AS STRING), '\003',
                CAST(o_totalprice AS STRING), '\003',
                CAST(o_orderdate AS STRING), '\003',
                CAST(o_orderpriority AS STRING), '\003',
                CAST(o_clerk AS STRING), '\003',
                CAST(o_shippriority AS STRING), '\003',
                CAST(o_comment AS STRING), '\003',
                CAST(lineitems_string AS STRING)
              ), '\002'
            ) orders_string
          FROM {source_db}.customer
          LEFT JOIN tmp_orders_string ON c_custkey = o_custkey
          WHERE c_custkey % {chunks} = {chunk_idx}
          GROUP BY 1, 2, 3, 4, 5, 6, 7, 8""".format(**sql_params)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)