Posted to dev@hive.apache.org by Sukhendu Chakraborty <su...@gmail.com> on 2014/05/02 04:10:42 UTC

SMB join bug

I am seeing a very different number of rows in this query's output depending
on whether I enable SMB (sort-merge-bucket) join:

select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '9999-12-31'
  and trim(a.cntry_id) = 'USA'

With SMB join enabled the count is 60 (wrong), while the regular join returns
a count of more than 30 million (correct).
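
For anyone trying to reproduce the comparison, a minimal sketch follows. The
original post does not say which settings were used to turn SMB join on, so
the properties below are an assumption: they are the standard Hive options
that control conversion to a sort-merge-bucket join, and a given cluster may
need others as well (for example a bucketized input format).

```sql
-- Run 1: SMB join conversion disabled (assumed settings, not from the post)
set hive.auto.convert.sortmerge.join=false;
set hive.optimize.bucketmapjoin=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;

select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '9999-12-31'
  and trim(a.cntry_id) = 'USA';

-- Run 2: SMB join conversion enabled (same assumed settings, flipped)
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;

select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '9999-12-31'
  and trim(a.cntry_id) = 'USA';
```

Running both in the same session keeps everything except the join strategy
constant, so any difference in the two counts points at the join conversion.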

Is there a known issue/JIRA for this? We are using CDH5.0/hive-0.12.

-Sukhendu

Re: SMB join bug

Posted by Sukhendu Chakraborty <su...@gmail.com>.

Thanks. But that issue seems to affect partitioned bucketed tables queried
through subqueries, while my use case is a basic join of non-partitioned
bucketed tables. I will try the patch and let you know.
-Sukhendu
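
A quick way to check how the two tables are actually declared, as a hedged
sketch: the statements below only inspect metadata and the query plan. The
thread does not show the table definitions, so what the output contains for
these tables is unknown here.

```sql
-- Show table metadata, including the partition information, the number of
-- buckets, the bucket columns, and the sort columns.
describe formatted dss.hist_hshld_profl_mc;
describe formatted dss.hshld_summary_mc;

-- Show the plan Hive chooses for the query under the current session
-- settings, to see which join strategy is being used.
explain
select count(*)
from dss.hist_hshld_profl_mc a
join dss.hshld_summary_mc b
  on a.hh_key = b.hh_key
where ('2012-02-27' between a.hshld_profl_eff_dt and a.hshld_profl_exp_dt)
  and a.hshld_exp_dt = '9999-12-31'
  and trim(a.cntry_id) = 'USA';
```

An SMB join relies on both tables being bucketed and sorted on the join key
(hh_key here), so the bucket and sort columns reported for each table are the
first thing to compare.
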
On May 2, 2014 12:10 PM, "Thejas Nair" <th...@hortonworks.com> wrote:

> It is possible that you hit this issue  -
> https://issues.apache.org/jira/browse/HIVE-5973
> It is fixed in apache hive 0.13 release.

Re: SMB join bug

Posted by Thejas Nair <th...@hortonworks.com>.

It is possible that you hit this issue:
https://issues.apache.org/jira/browse/HIVE-5973
It is fixed in the Apache Hive 0.13 release.


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.