Posted to dev@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/04/22 11:11:00 UTC

[jira] [Created] (HIVE-26168) EXPLAIN DDL command output is not deterministic

Stamatis Zampetakis created HIVE-26168:
------------------------------------------

             Summary: EXPLAIN DDL command output is not deterministic 
                 Key: HIVE-26168
                 URL: https://issues.apache.org/jira/browse/HIVE-26168
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
            Reporter: Stamatis Zampetakis


The EXPLAIN DDL command (HIVE-24596) can be used to recreate the schema for a given query in order to debug planner issues. This is achieved by fetching information from the metastore and outputting a series of DDL commands.

However, the output commands may appear in a different order from one run to the next, since there is no mechanism to enforce an explicit order.

Consider for instance the following scenario.

{code:sql}
CREATE TABLE customer
(
    `c_custkey` bigint,
    `c_name`    string,
    `c_address` string
);

INSERT INTO customer VALUES (1, 'Bob', '12 avenue Mansart'), (2, 'Alice', '24 avenue Mansart');

EXPLAIN DDL SELECT c_custkey FROM customer WHERE c_name = 'Bob'; 
{code}

+Result 1+

{noformat}
ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF 
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED 
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== 
{noformat}

+Result 2+

{noformat}
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF  
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg== 
{noformat}

The two results are equivalent, but the statements appear in a different order. This is not a big issue because the results remain correct, but it may lead to test flakiness, so it might be worth addressing.
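
To make the equivalence concrete, here is a minimal Python sketch of an order-insensitive comparison that a test harness could use until the output order is fixed. It is not Hive code: the helper name canonical_ddl is hypothetical, and it assumes that every "--" comment line belongs to the statement immediately preceding it, as in the two results above. The sample strings are excerpts of Result 1 and Result 2.

```python
def canonical_ddl(output: str) -> list[str]:
    """Return the EXPLAIN DDL output as a sorted list of statement groups.

    Assumption: a line starting with "--" is a comment that belongs to the
    statement right before it, so the two are kept together as one group.
    """
    groups = []
    for raw in output.splitlines():
        line = raw.strip()  # ignore trailing-whitespace differences
        if not line:
            continue
        if line.startswith("--") and groups:
            groups[-1].append(line)
        else:
            groups.append([line])
    # Sorting the groups gives an order that does not depend on the run.
    return sorted("\n".join(group) for group in groups)


# Excerpts of the two results above (table statistics and c_custkey only).
RESULT_1 = """
ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
"""

RESULT_2 = """
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
ALTER TABLE default.customer UPDATE STATISTICS SET('numRows'='2','rawDataSize'='48' );
"""

# The raw outputs differ, but their canonical forms match.
assert RESULT_1 != RESULT_2
assert canonical_ddl(RESULT_1) == canonical_ddl(RESULT_2)
```

A comparison like this only hides the symptom in tests. Emitting the statements in a fixed order on the Hive side (for example table-level statistics first, then columns sorted by name) would address the cause, but that ordering is a suggestion and not something the issue prescribes.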



--
This message was sent by Atlassian Jira
(v8.20.7#820007)