You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "gaozhan ding (Jira)" <ji...@apache.org> on 2020/10/19 12:09:00 UTC

[jira] [Updated] (HIVE-23667) Incorrect output with option hive.auto.convert.join=fasle

     [ https://issues.apache.org/jira/browse/HIVE-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaozhan ding updated HIVE-23667:
--------------------------------
    Description: 
We use hive with version 3.1.0 with tez engine 0.9.1.3

I encountered an error when executing a hive SQL. This SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=true;
use xxx;

select count(*) from   A  join B on  B.b=A.a;{code}
with the output.
{code:java}
+----------+ | _c0 | +----------+ | 4954736 | +----------+
{code}
But when the hive.auto.convert.join option is set to false,the utput is not as expected。

The SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;  
set mapred.reduce.tasks=50;
set hive.auto.convert.join=false; //changed
use xxx;

select count(*) from A join B on B.b=A.a;{code}
with output:
{code:java}
+------+ | _c0 | +------+ | 0 | +------+
{code}
Beside,both tables participating in the join are partition tables.

Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql output expected results.

We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only occurred in the old hive table.

  was:
We use hive with version 3.1.0 with tez engine 0.9.1.3

I encountered an error when executing a hive SQL. This SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=true;
use xxx;

select count(*) from   230_dim_site  join dw_fact_inverter_detail on  dw_fact_inverter_detail.site=230_dim_site.id;{code}
with the output.
{code:java}
+----------+ | _c0 | +----------+ | 4954736 | +----------+
{code}
But when the hive.auto.convert.join option is set to false,the utput is not as expected。

The SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;  
set mapred.reduce.tasks=50;
set hive.auto.convert.join=false; //changed
use xxx;

select count(*) from   230_dim_site  join dw_fact_inverter_detail on  dw_fact_inverter_detail.site=230_dim_site.id;{code}
with output:
{code:java}
+------+ | _c0 | +------+ | 0 | +------+
{code}
Beside,both tables participating in the join are partition tables.

Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql output expected results.

We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only occurred in the old hive table.


> Incorrect output with option hive.auto.convert.join=fasle
> ---------------------------------------------------------
>
>                 Key: HIVE-23667
>                 URL: https://issues.apache.org/jira/browse/HIVE-23667
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: gaozhan ding
>            Priority: Critical
>
> We use hive with version 3.1.0 with tez engine 0.9.1.3
> I encountered an error when executing a hive SQL. This SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=10000;
> set hive.exec.max.dynamic.partitions=10000;
> set hive.fileformat.check=false;
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=true;
> use xxx;
> select count(*) from   A  join B on  B.b=A.a;{code}
> with the output.
> {code:java}
> +----------+ | _c0 | +----------+ | 4954736 | +----------+
> {code}
> But when the hive.auto.convert.join option is set to false,the utput is not as expected。
> The SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=10000;
> set hive.exec.max.dynamic.partitions=10000;
> set hive.fileformat.check=false;  
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=false; //changed
> use xxx;
> select count(*) from A join B on B.b=A.a;{code}
> with output:
> {code:java}
> +------+ | _c0 | +------+ | 0 | +------+
> {code}
> Beside,both tables participating in the join are partition tables.
> Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql output expected results.
> We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only occurred in the old hive table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)