Posted to dev@hive.apache.org by "michaelli (Jira)" <ji...@apache.org> on 2022/05/31 08:01:00 UTC
[jira] [Created] (HIVE-26273) “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty
michaelli created HIVE-26273:
--------------------------------
Summary: “file does not exist” exception occurred when using spark dynamic partition pruning and small table is empty
Key: HIVE-26273
URL: https://issues.apache.org/jira/browse/HIVE-26273
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 2.1.1
Reporter: michaelli
Attachments: execution plan for good run.txt, execution plan for issue run.txt, issue log.txt
*Issue summary:*
When tableA is inner-joined to tableB on the partition key of tableB, and dynamic partition pruning is enabled while tableA is empty, the query fails with the exception below:
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/hive/fddbc5ac-3596-428d-8b42-cbc61952d182/hive_2022-05-30_14-03-17_139_1843975612196554546-15339/-mr-10003/2/1 does not exist. (state=42000,code=3).
I encountered this with hive-2.1.1-cdh6.3.2, and I think other versions are affected too.
*Steps to reproduce the issue:*
1. Prepare the tables:
CREATE TABLE tableA (
businsys_no decimal(10,0),
acct_id string,
prod_code string)
PARTITIONED BY (init_date int)
stored as orc;
CREATE TABLE tableB (
client_id string,
open_date decimal(10,0),
client_status string,
organ_flag string)
PARTITIONED BY (businsys_no decimal(10,0))
stored as orc;
2. Prepare data for the tables:
-- tableA should be empty;
-- prepare some data for tableB
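The data preparation in step 2 could look like the following minimal sketch. The row values and the partition values (1000, 2000) are hypothetical and were not part of the original report; any rows in tableB should do, as long as tableA stays empty.

```sql
-- Hypothetical sample rows for tableB; the values are illustrative only.
INSERT INTO TABLE tableB PARTITION (businsys_no=1000)
VALUES ('c001', 20200101, 'A', '0');

INSERT INTO TABLE tableB PARTITION (businsys_no=2000)
VALUES ('c002', 20210315, 'A', '1');

-- tableA is intentionally left without any rows; this should return 0.
SELECT COUNT(*) FROM tableA;
```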
3. Run the statements below to reproduce the issue:
set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;
select *
from (select *
      from tableA fp
      where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no
            from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
4. As a workaround, we currently turn off Spark dynamic partition pruning:
set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.spark.dynamic.partition.pruning.map.join.only=false;
select *
from (select *
      from tableA fp
      where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no
            from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
*Execution logs and execution plans:*
The execution logs and execution plans are attached to the issue.
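The attachments are not reproduced in this message. As a sketch, a plan for either run can be regenerated by prefixing the query with EXPLAIN under the corresponding session settings. The example below assumes the failing configuration from step 3; setting the two pruning properties to false instead should give the plan for the good run.

```sql
-- Settings from the failing run (step 3).
set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;

-- EXPLAIN prints the execution plan without running the query.
explain
select *
from (select *
      from tableA fp
      where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no
            from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
```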
--
This message was sent by Atlassian Jira
(v8.20.7#820007)