Posted to user@drill.apache.org by Carrot Hu -CIC <ca...@cicdata.com> on 2015/07/10 05:03:45 UTC

DRILL on HBASE: SELECT COUNT(*) issue

Hi,

I created a test table in HBase with 2 column families ['cf0', 'cf1'] and 3 columns in each, all holding the same value, 100,000 rows in total.

SELECT COUNT(*) does not return the correct row count (much less; a few thousand).
However, selecting a subset of those columns gives the right count.

If I reduce the number of columns to 2 in this case, SELECT COUNT(*) gives the right count.

Later I tried tables with only one column family; Drill returns the right count only when the number of columns is less than 6.

What could explain this?
Have I missed any Drill configuration?
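The setup described above can be scripted so that others can rebuild the same table. The following is a minimal Python sketch, not part of the original report: it generates an HBase shell script that creates a table with the given column families and puts the same value into every column of every row. The table name `drill_count_test`, the qualifiers `c0`, `c1`, ..., the row-key format, and the value `v` are hypothetical placeholders; the output is meant to be run with `hbase shell <file>`.

```python
# Hypothetical repro helper (not from the original report): builds an HBase
# shell script for the test table described above. Table name, qualifiers,
# row-key format, and cell value are placeholder assumptions.


def hbase_shell_script(table="drill_count_test", families=("cf0", "cf1"),
                       columns_per_family=3, rows=100_000, value="v"):
    """Return HBase shell commands that create `table` and put the same
    `value` into every column of every row.

    Assumes `table`, the family names, and `value` contain no single quotes,
    because they are embedded unescaped in single-quoted shell strings.
    """
    family_list = ", ".join(f"'{fam}'" for fam in families)
    lines = [f"create '{table}', {family_list}"]
    width = len(str(rows - 1))  # zero-pad keys so they sort in numeric order
    for i in range(rows):
        row_key = f"row{i:0{width}d}"
        for fam in families:
            for c in range(columns_per_family):
                lines.append(f"put '{table}', '{row_key}', '{fam}:c{c}', '{value}'")
    lines.append("exit")
    return "\n".join(lines) + "\n"


def write_script(path, **kwargs):
    """Write the generated commands to `path`, e.g. for `hbase shell <path>`."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(hbase_shell_script(**kwargs))
```

For example, `write_script("create_test_table.hb")` covers the two-family, three-column case, and `write_script("one_family.hb", table="drill_count_test_1cf", families=("cf0",), columns_per_family=6)` covers the single-family case at the reported threshold. Issuing one `put` per cell favors simplicity over load speed.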

Carrot Hu, 胡意仪
R&D Engineer
Tech BU
a Kantar Media Company

T: +86 (21) 5237 3860 ext. 8704 | F: +86 (21) 5237 3632
Email/MSN: carrot.hu@cicdata.com
A: Floor 4, Building 6, Fenglin Link, No.485 Feng Lin Road, Xuhui District, Shanghai, 200032, China
Corp. Website: www.ciccorporate.com | Service Platform: www.iwommaster.com | Corp. Blog: www.seeisee.com
Weibo: @seeisee <http://weibo.com/seeisee> | Wechat: seeiseeCHAT | Facebook: facebook.com/CICcorporate

Disclaimer: This letter is a non-public document that may contain confidential information. If you are not the intended recipient of this letter, please notify the sender immediately, and do not use, retain, copy, or distribute its contents, use it for any other purpose, or disclose it to anyone. Thank you for your cooperation.
DISCLAIMER: This e-mail and any file transmitted with it are confidential and may contain legally privileged information. If you are not the intended recipient, you are hereby notified that any use, disclosure, copying, dissemination of, or taking any action in reliance on this e-mail is strictly prohibited. If you have received this e-mail in error, please immediately inform us by returning e-mail, and thereafter, proceed to permanently delete the entire e-mail sent in error. Thank you.


Re: DRILL on HBASE: SELECT COUNT(*) issue

Posted by Carrot Hu -CIC <ca...@cicdata.com>.
Hi,

Thanks, Steven, and thanks, George, for sharing your experience.

George, I understand that you have a lot of data, but have you tried counting all of it?

The problem occurs only when all columns are selected; selecting a subset of them gives the correct count.
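The comparison described above can be made explicit as a set of count queries over the same table. The Python sketch below is illustrative only: it assumes the HBase storage plugin is registered in Drill under its default name `hbase`, uses Drill's `table.family.qualifier` notation to address individual columns, and uses the hypothetical table name `drill_count_test` with qualifiers `c0`, `c1`, .... Because every row in the test table has every column, all of these queries should return the same number; a lower `COUNT(*)` on its own would match the behavior reported in this thread.

```python
# Illustrative only: builds Drill SQL that counts the same HBase table in
# several ways. Assumes the storage plugin is named "hbase"; the table name
# and the c0, c1, ... qualifiers are hypothetical placeholders.


def drill_count_queries(table="drill_count_test", families=("cf0", "cf1"),
                        columns_per_family=3):
    """Return (label, sql) pairs; on a fully populated table each query
    should report the same row count."""
    source = f"hbase.`{table}`"
    queries = [
        ("all columns", f"SELECT COUNT(*) FROM {source}"),
        ("row key only", f"SELECT COUNT(row_key) FROM {source}"),
    ]
    # COUNT(col) skips NULLs, so each query counts rows where that cell exists.
    for fam in families:
        for c in range(columns_per_family):
            queries.append(
                (f"{fam}:c{c}", f"SELECT COUNT(t.{fam}.c{c}) FROM {source} t")
            )
    return queries


if __name__ == "__main__":
    # Print the queries with SQL comments so they can be pasted into sqlline.
    for label, sql in drill_count_queries():
        print(f"-- {label}\n{sql};")
```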



Re: DRILL on HBASE: SELECT COUNT(*) issue

Posted by George Lu <lu...@gmail.com>.
In my job, I use Drill to query HBase all the time; our HBase contains
billions of rows and over 20 columns, and I have never noticed any count errors.

On Sat, Jul 11, 2015 at 3:57 AM, Steven Phillips <sp...@maprtech.com>
wrote:

> This looks like a bug to me. You should file a jira, and include as much
> information as possible for us to reproduce the issue.

Re: DRILL on HBASE: SELECT COUNT(*) issue

Posted by Steven Phillips <sp...@maprtech.com>.
This looks like a bug to me. You should file a jira, and include as much
information as possible for us to reproduce the issue.

-- 
 Steven Phillips
 Software Engineer

 mapr.com