You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Chen Song <ch...@gmail.com> on 2012/12/19 17:55:57 UTC
Re: Hive producing difference outputs
resent as no one has replied yet :)
On Thu, Nov 15, 2012 at 11:18 AM, Chen Song <ch...@gmail.com> wrote:
> Hi Folks
>
> We are getting inconsistent output when running some semantically same
> Hive queries. We are using *CDH3u3* with *Hive 0.7.1.*
>
> The query I am running performs a map-side join of two subqueries (s1 and
> s2)--s1 is a multi-table join and s2 is a union of a table against itself.
> If we add in mapside join to s1 (which is necessary for performance) the
> query produces different (and incorrect) results.
>
> Basically, s1 returns one row and s2 returns 2 rows. By joining them, it
> should give us two rows. See below for output and queries (I abridged the
> queries a bit for readability).
>
> NOTE: I searched online and found anther thread reporting a similar issue
> as what I have seen,
> http://mail-archives.apache.org/mod_mbox/hive-user/201207.mbox/%3CCADejTpz5rSf-Sxi9HNoCTRHVazOk=F+bu6XPL=K7TBkNQJ92qg@mail.gmail.com%3E.
> Unfortunately, this thread never got answered. It looks like a bug in Hive.
> Greatly appreciate if anyone give thoughts on this issue.
>
>
> Q1 output (incorrect, missing one row):
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094
> 2012-08-15 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 1 0 NULL
>
> Q2 output (correct):
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094
> 2012-08-15 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 10 NULL 0
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094
> 2012-08-15 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 10 NULL 1
>
> Q1 and Q2 only differs in the way that s1 in Q1 has a mapside join.
>
> - Q1 (incorrect results, highlighted portion is the only difference
> from Q2):
>
>
> SELECT /*+MAPJOIN(table) */ *
> FROM
> (
> SELECT /*+ MAPJOIN(tableA,tableB) */
> id,
> ...
> FROM main_table ctg
> JOIN sample smp
> ON ctg.id = smp.id
> LEFT OUTER JOIN tableA
> ON smp.publisher_id = tableA.id
> LEFT OUTER JOIN tableB
> ON smp.inventory_source_id = tableB.id
> WHERE ctg.date_time >= '2012-08-15 00:00:00' and
> ctg.date_time <= '2012-08-15 23:59:59' and
> ctg.dd = '2012-08-15' and
> smp.date_time >= '2012-08-15 00:00:00' and
> smp.date_time <= '2012-08-15 23:59:59' and
> smp.dd = '2012-08-15'
> )sub
> JOIN
> (
> select
> id,
> parent_category_id,
> cast(0 as tinyint) as is_parent
> from table_category
> where is_system = 1
> union all
> select
> id,
> parent_category_id,
> cast(1 as tinyint) as is_parent
> from table_category
> where is_system = 1
> )table
> ON table.id = sub.id
>
>
> - Q2 (correct results):
>
> SELECT /*+MAPJOIN(table) */ *
> FROM
> (
> SELECT
> id,
> ...
> FROM main_table ctg
> JOIN sample smp
> ON ctg.id = smp.id
> LEFT OUTER JOIN tableA
> ON smp.publisher_id = tableA.id
> LEFT OUTER JOIN tableB
> ON smp.inventory_source_id = tableB.id
> WHERE ctg.date_time >= '2012-08-15 00:00:00' and
> ctg.date_time <= '2012-08-15 23:59:59' and
> ctg.dd = '2012-08-15' and
> smp.date_time >= '2012-08-15 00:00:00' and
> smp.date_time <= '2012-08-15 23:59:59' and
> smp.dd = '2012-08-15'
> )sub
> JOIN
> (
> select
> id,
> parent_category_id,
> cast(0 as tinyint) as is_parent
> from table_category
> where is_system = 1
> union all
> select
> id,
> parent_category_id,
> cast(1 as tinyint) as is_parent
> from table_category
> where is_system = 1
> )table
> ON table.id = sub.id
>
> --
> Chen Song
>
>
>
--
Chen Song