Posted to dev@hive.apache.org by "Aditya Shah (JIRA)" <ji...@apache.org> on 2018/08/24 08:02:00 UTC
[jira] [Created] (HIVE-20456) Query fails with FNFException using MR with skewjoin enabled and auto convert join disabled

Aditya Shah created HIVE-20456:
----------------------------------

             Summary: Query fails with FNFException using MR with skewjoin enabled and auto convert join disabled
                 Key: HIVE-20456
                 URL: https://issues.apache.org/jira/browse/HIVE-20456
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 3.1.0, 2.1.1, 1.2.0
            Reporter: Aditya Shah
            Assignee: Aditya Shah


When skew join is enabled and auto convert join is disabled, a query run on the MR execution engine fails with a file not found exception. The following script reproduces the error:



set hive.optimize.skewjoin = true;
set hive.auto.convert.join = false;
set hive.groupby.orderby.position.alias = true;
set hive.on.master=true;
set hive.execution.engine=mr;

drop database if exists test cascade;
create database if not exists test;
use test;

CREATE EXTERNAL TABLE test_table1
( `a` int , `b` int, `c` int)
PARTITIONED BY (
`d` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

CREATE EXTERNAL TABLE test_table2
( `a` int , `b` int, `c` int)
PARTITIONED BY (
`d` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


CREATE EXTERNAL TABLE test_table3
( `a` int , `b` int, `c` int)
PARTITIONED BY (
`e` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\u0001',
'serialization.format'='\u0001')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


CREATE EXTERNAL TABLE test_table4 (`a` int , `b` int, `c` int)
PARTITIONED BY (
`e` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\u0001',
'serialization.format'='\u0001')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


with
temp1 as (
select
g.a,
n.b,
u.c
from
test_table2 g
inner join test_table4 u on g.a = u.a
inner join test_table3 n on u.b = n.b
),
temp2 as (
select * from test_table4 where a = 2
),
temp21 as (
select
g.b,
n.c,
u.a
from
temp2 g
inner join test_table3 u on g.b = u.b
inner join test_table2 n on u.c = n.c
group by g.b, n.c, u.a
),
stack as (
select * from temp1
union all
select * from temp21
)
select * from stack;



The query runs fine when Tez is used, or with any other combination of the skew join and auto convert join settings. On diagnosing the issue, the problem turned out to be in how a conditional task resolves its tasks: it puts each resolved task directly into the runnable state without checking the task's parent dependencies or whether the task is already queued.
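
To illustrate the diagnosis, here is a minimal sketch of the two missing checks. It is a simplified model and not Hive's actual code: SketchTask, SketchQueue, addUnchecked and addIfReady are hypothetical names made up for this example. It only shows the idea that a resolved task should be made runnable when all of its parents have finished and it has not been queued before.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model. These are NOT Hive's real task or driver classes.
class ConditionalTaskSketch {

    // Stand-in for a task with parent dependencies.
    static final class SketchTask {
        final String name;
        final List<SketchTask> parents = new ArrayList<>();
        boolean done = false;

        SketchTask(String name) {
            this.name = name;
        }

        // True only when every parent task has finished.
        boolean parentsDone() {
            for (SketchTask p : parents) {
                if (!p.done) {
                    return false;
                }
            }
            return true;
        }
    }

    // Stand-in for the queue of runnable tasks.
    static final class SketchQueue {
        private final Deque<SketchTask> runnable = new ArrayDeque<>();
        private final Set<SketchTask> queued = new HashSet<>();

        // Behaviour described in the report: enqueue with no checks at all.
        void addUnchecked(SketchTask t) {
            runnable.add(t);
        }

        // Guarded variant: skip tasks with unfinished parents or tasks already queued.
        boolean addIfReady(SketchTask t) {
            if (!t.parentsDone() || queued.contains(t)) {
                return false;
            }
            queued.add(t);
            runnable.add(t);
            return true;
        }

        int size() {
            return runnable.size();
        }
    }

    public static void main(String[] args) {
        SketchTask parent = new SketchTask("parent");
        SketchTask child = new SketchTask("child");
        child.parents.add(parent);

        // Unchecked: the child is queued twice even though its parent has not run.
        SketchQueue unchecked = new SketchQueue();
        unchecked.addUnchecked(child);
        unchecked.addUnchecked(child);
        System.out.println("unchecked queue size: " + unchecked.size());

        // Guarded: the child is rejected until the parent is done, then accepted once.
        SketchQueue guarded = new SketchQueue();
        guarded.addIfReady(child);
        parent.done = true;
        guarded.addIfReady(child);
        guarded.addIfReady(child);
        System.out.println("guarded queue size: " + guarded.size());
    }
}
```

In this model, a task that is made runnable before its parents finish would start without the output its parents are supposed to produce, which is one plausible way to end up with a file not found error. Whether the real fix takes this exact shape is an assumption.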



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)