Posted to dev@hive.apache.org by Gunther Hagleitner <gh...@hortonworks.com> on 2015/04/06 20:27:06 UTC

Re: Review Request 32549: HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32549/#review78961
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java
<https://reviews.apache.org/r/32549/#comment128080>

    This seems to patch the symptom rather than fix the cause. See below (line 286): that code is supposed to detect whether there are unions within the previous work and automatically set work to the union work (which later becomes the preceedingWork).
    
    Why isn't that part working?


- Gunther Hagleitner


On March 26, 2015, 8:25 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32549/
> -----------------------------------------------------------
> 
> (Updated March 26, 2015, 8:25 p.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Vikram Dixit Kumaraswamy.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In the q.test environment with the src table, execute the following queries:
> {code}
> CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
> 
> CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
> 
> FROM (SELECT 'tst1' AS key, CAST(COUNT(1) AS STRING) AS value FROM src s1
>       UNION ALL
>       SELECT s2.key AS key, s2.value AS value FROM src s2) unionsrc
> INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
> INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
> GROUP BY unionsrc.key, unionsrc.value;
> 
> select * from DEST1;
> select * from DEST2;
> {code}
> 
> DEST1 and DEST2 should both have 310 rows. However, DEST2 has only 1 row: "tst1    500     1".
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 288270e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java e67d98b 
>   ql/src/test/queries/clientpositive/tez_union_multiinsert.q PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/tez_union_multiinsert.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32549/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>