Posted to dev@hive.apache.org by "michaelli (Jira)" <ji...@apache.org> on 2022/05/31 08:01:00 UTC

[jira] [Created] (HIVE-26273) “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty

michaelli created HIVE-26273:
--------------------------------

             Summary: “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty
                 Key: HIVE-26273
                 URL: https://issues.apache.org/jira/browse/HIVE-26273
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 2.1.1
            Reporter: michaelli
         Attachments: execution plan for good run.txt, execution plan for issue run.txt, issue log.txt

*Issue summary:*

When inner joining tableA to tableB on the partition key of tableB, if dynamic partition pruning is enabled and tableA is empty, the query fails with the exception below:

Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/hive/fddbc5ac-3596-428d-8b42-cbc61952d182/hive_2022-05-30_14-03-17_139_1843975612196554546-15339/-mr-10003/2/1 does not exist. (state=42000,code=3).

I encountered this when using hive-2.1.1-cdh6.3.2, and I think it occurs in other versions too.

*Steps to reproduce the issue:*

1. prepare tables:

CREATE TABLE tableA (
  businsys_no decimal(10,0),
  acct_id string,
  prod_code string)
PARTITIONED BY (init_date int)
STORED AS orc;

CREATE TABLE tableB (
  client_id string,
  open_date decimal(10,0),
  client_status string,
  organ_flag string)
PARTITIONED BY (businsys_no decimal(10,0))
STORED AS orc;

2. prepare data for tables:

-- tableA should be empty;
-- prepare some data for tableB
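For example, tableB can be populated with a single row like the following (the values are placeholders; any data works, as long as tableB is non-empty and tableA stays empty):

INSERT INTO TABLE tableB PARTITION (businsys_no=1000)
VALUES ('client_001', 20220525, 'A', 'N');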

3. run the statements below to reproduce the issue:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;
select *
  from (select *
          from tableA fp
         where fp.init_date = 20220525) cfp
 inner join (select ic.client_id, ic.businsys_no
               from tableB ic) ici
    on cfp.businsys_no = ici.businsys_no
   and cfp.acct_id = ici.client_id;
4. as a workaround, we currently disable Spark dynamic partition pruning:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.spark.dynamic.partition.pruning.map.join.only=false;
select *
  from (select *
          from tableA fp
         where fp.init_date = 20220525) cfp
 inner join (select ic.client_id, ic.businsys_no
               from tableB ic) ici
    on cfp.businsys_no = ici.businsys_no
   and cfp.acct_id = ici.client_id;

 

*execution logs and execution plans:*

The execution logs and execution plans are attached.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)