Posted to user@drill.apache.org by Kevin Verhoeven <Ke...@ds-iq.com> on 2015/12/21 19:36:37 UTC

Drill query does not return all results from HBase

We have a problem where a Drill query against HBase does not return all results. The following query should return over 100,000 rows, but we only get about 1,030 back.

SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800

If we scan directly using the hbase shell we see over 100,000 rows, but the same Drill query returns only a small fraction of the expected results. We have also run a count against the table and Drill returns the same 1,030 number, which is far less than expected. What could be going wrong?

We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase 1.0). We run HBase on six RegionServers, and the table has about 1.3 billion rows.

Thanks,

Kevin


RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Thank you for your help, Aditya!  We were able to work around this.

We replaced the HBase client jars included in the Drill distribution tarball to work around the bug in https://issues.apache.org/jira/browse/HBASE-13262. The client-side fix is in versions >= 0.98.12. We initially tested with 0.98.17 (the latest artifact available in the public Maven repository), but the Drill code encountered a deadlock when using it. We went ahead with the 0.98.13 jars instead to minimize changes in the interaction with Drill.
Files affected:
/opt/apache-drill/jars/3rdparty/
                hbase-annotations-0.98.13-hadoop2.jar
                hbase-client-0.98.13-hadoop2.jar
                hbase-common-0.98.13-hadoop2.jar
                hbase-protocol-0.98.13-hadoop2.jar
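
A minimal Python sketch for checking whether a bundled jar name falls in the 0.98.x range that carries the client-side fix (>= 0.98.12, as stated above). The helper names and the file-name pattern are assumptions made for illustration; the sketch only judges the 0.98 line and says nothing about 1.x clients.

```python
import re

# Lowest 0.98.x patch level with the client-side HBASE-13262 fix, per the message above.
MIN_FIXED_PATCH = 12

# Assumed naming scheme, matching the jar names listed above.
JAR_PATTERN = re.compile(r"^hbase-[a-z]+-(\d+)\.(\d+)\.(\d+)-hadoop2\.jar$")


def jar_version(jar_name):
    """Return (major, minor, patch) parsed from an HBase jar name, or None."""
    match = JAR_PATTERN.match(jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_client_fix(jar_name):
    """True only for 0.98.x jars at or above the fixed patch level."""
    version = jar_version(jar_name)
    if version is None or version[:2] != (0, 98):
        return False
    return version[2] >= MIN_FIXED_PATCH
```

For example, `has_client_fix("hbase-client-0.98.7-hadoop2.jar")` is false for the jar originally bundled with Drill, while the 0.98.13 replacement passes.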
Another workaround for the bug is to configure the client scanners with a max batch size <= the server's default batch size (which varies between 0.98 and 1.x).
Add this line to the hbase storage plugin through the Drill UI:
"hbase.client.scanner.max.result.size": "1"
It should be within the config dictionary that also holds the hbase.zookeeper.quorum property.
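
As a rough illustration of where the property goes, the Python sketch below adds it to a storage plugin definition and prints the resulting JSON. The surrounding plugin fields, the ZooKeeper host names, and the port are placeholders assumed for the example; only the scanner property and its placement next to hbase.zookeeper.quorum come from the message above.

```python
import json

SCANNER_PROPERTY = "hbase.client.scanner.max.result.size"


def add_scanner_limit(plugin_definition, max_result_size="1"):
    """Return a copy of the plugin definition with the scanner property set."""
    updated = dict(plugin_definition)
    inner = dict(updated.get("config", {}))
    inner[SCANNER_PROPERTY] = max_result_size
    updated["config"] = inner
    return updated


# Hypothetical plugin definition; replace hosts and port with your own values.
example = {
    "type": "hbase",
    "config": {
        "hbase.zookeeper.quorum": "zk1.example.com,zk2.example.com",
        "hbase.zookeeper.property.clientPort": "2181",
    },
    "enabled": True,
}

print(json.dumps(add_scanner_limit(example), indent=2))
```

The printed JSON can be compared against what is shown in the Drill UI before saving the plugin.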

-Kumiko

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Monday, March 21, 2016 1:08 PM
To: Kevin Verhoeven <Ke...@ds-iq.com>
Cc: user@drill.apache.org; Kumiko Yada <Ku...@ds-iq.com>; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

I did not see any issue when running with the HBase 0.98.7 client bundled with Drill against HBase 1.1 servers.
I have just assigned DRILL-4199[1] to myself to evaluate moving to HBase 1.1 in the next Drill release.

[1] https://issues.apache.org/jira/browse/DRILL-4199

On Mon, Mar 21, 2016 at 12:13 PM, Kevin Verhoeven <Ke...@ds-iq.com>> wrote:
Aditya,

Looking into the bug we read that the behavior will still occur if the hbase-client version does not include the fix (between a 0.98 client and 1.0 server). The hbase-client used by Drill under jars/3rdparty is hbase-client-0.98.7-hadoop2.jar which does not include the fix. I updated the hbase-client jar with hbase-client-0.98.17-hadoop2.jar, but I receive a java.lang.NoClassDefFoundError error. Are you able to test Drill with an updated hbase-client jar against CDH? Here is the error I received:

2016-03-21 18:58:36,874 [USER-rpc-event-queue] ERROR o.a.d.exec.server.rest.QueryWrapper - Query Failed
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Failure while loading table test6c in database hbase.
Message:  com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge

        at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) [drill-rpc-1.4.0.jar:1.4.0]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
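
One way to narrow down a NoClassDefFoundError like the one above is to check whether any jar on the Drill classpath actually contains the missing class. The Python sketch below does that with the standard zipfile module; the function name is hypothetical, and the directory to scan is whatever jar directory your installation uses.

```python
import zipfile
from pathlib import Path


def jars_containing(jar_dir, class_name):
    """List jar file names under jar_dir that contain the given class.

    class_name may use dots or slashes, e.g. com.yammer.metrics.core.Gauge.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits


# Example call (the path is taken from the workaround message in this thread):
# jars_containing("/opt/apache-drill/jars/3rdparty", "com.yammer.metrics.core.Gauge")
```

An empty result would suggest that the swapped-in client depends on a library that is not shipped in that directory.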

Thanks,

Kevin

-----Original Message-----
From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>]
Sent: Monday, March 21, 2016 11:33 AM
To: adityakishore@gmail.com<ma...@gmail.com>
Cc: Kumiko Yada <Ku...@ds-iq.com>>; user@drill.apache.org<ma...@drill.apache.org>; dev@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>
Subject: RE: Drill query does not return all results from HBase

Thanks Aditya,

I also see that the fix for the bug was backported to CDH 5.4.3: https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html. I tested Drill on CDH versions 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the same behavior.

Kevin

From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>]
Sent: Monday, March 21, 2016 10:26 AM
To: Kevin Verhoeven <Ke...@ds-iq.com>>
Cc: Kumiko Yada <Ku...@ds-iq.com>>; user@drill.apache.org<ma...@drill.apache.org>; dev@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>
Subject: Re: Drill query does not return all results from HBase

Since I suspected that it was a bug in HBase, I tried it with the original version you reported in the first post in this thread, i.e. CDH 5.4.3.
If it was back-ported to 5.4.7, upgrading should fix this issue.

On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <Ke...@ds-iq.com>>> wrote:
Aditya,

Thank you for your help. What version of CDH are you running? I contacted Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.

Thanks,

Kevin

From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>>]
Sent: Sunday, March 20, 2016 10:45 PM

To: Kumiko Yada <Ku...@ds-iq.com>>>
Cc: user@drill.apache.org<ma...@drill.apache.org>>; dev@drill.apache.org<ma...@drill.apache.org>>; altekrusejason@gmail.com<ma...@gmail.com>>; Ki Kang <Ki...@ds-iq.com>>>; Kevin Verhoeven <Ke...@ds-iq.com>>>
Subject: Re: Drill query does not return all results from HBase

Finally managed to reproduce it with the CDH distribution (so far I was testing with HBase 1.1 distributed with MapR, which does not have this bug).
This is essentially an HBase bug, HBASE-13262[1], which has been fixed in 1.0.1, 1.1.0.
Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com>>> wrote:
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue in the case where the table is split into multiple regions and the first region returned to the client did not have any rows.  I think this issue is related to the one that I’m seeing.  Have you opened a JIRA for this issue?  Have you investigated/fixed it?

Thanks
Kumiko

From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>>]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Ku...@ds-iq.com>>>
Cc: user@drill.apache.org<ma...@drill.apache.org>>; dev@drill.apache.org<ma...@drill.apache.org>>; altekrusejason@gmail.com<ma...@gmail.com>>; Ki Kang <Ki...@ds-iq.com>>>; Kevin Verhoeven <Ke...@ds-iq.com>>>

Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with an Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>>> wrote:
Hello,

I provided all information that was requested; however, I haven't heard back anything since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>>]
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>>>
Cc: altekrusejason@gmail.com<ma...@gmail.com>>; Ki Kang <Ki...@ds-iq.com>>>; Kevin Verhoeven <Ke...@ds-iq.com>>>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>>> wrote:

> Hello,
>
> Does anyone have any update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Are there any plans
> to investigate/fix it?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada
> [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>>]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org<ma...@drill.apache.org>>;
> altekrusejason@gmail.com<ma...@gmail.com>>
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>>]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>>>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading
> the HBase metadata during our optimization to return row counts
> without reading any data. This should be really fast, and noticeably
> different than running a complete query, even with a small dataset as
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is
> occurring, I will hopefully have time soon to get this fixed but I'm
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada
> <Ku...@ds-iq.com>>>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse
> <al...@gmail.com>>>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the one
> column count, the row count was correct.  Here are the additional testing results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this issue!
>
> Hbase table (1 column family, 5 columns, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *Hbase table (1 column family, 6 columns, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6724 rows)*
>
> 1 column count - row count is correct
>
> *Hbase table (2 column families, 6 columns in each column family, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 3362 rows)*
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 2 columns in each column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *Hbase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6723 rows)*
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse'
> <al...@gmail.com>>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit to try
> to force a full scan, but the result was the same, just 10000 rows
> selected.  With the same table (contains 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse'
> <al...@gmail.com>>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>> <
> altekrusejason@gmail.com<ma...@gmail.com>>>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada
> <Ku...@ds-iq.com>>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing
> something correctly.
> Can you try to run a 'sum', or other aggregate query on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada
> <Ku...@ds-iq.com>>>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>>]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>>>
>
> Cc: Aditya Kishore
> <ad...@gmail.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada
> <Ku...@ds-iq.com>>>
> wrote:
>
> > I opened a bug on this.  Drill returns the correct rows when the
> > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada
> > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>>]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>>
> > Cc: Aditya Kishore
> > <ad...@gmail.com>>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family with 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family with 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found any workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish
> > [mailto:abhishek.girish@gmail.com<ma...@gmail.com>>]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>>>
> > Cc: Aditya Kishore
> > <ad...@gmail.com>>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change if I remember it right, hence
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > <al...@gmail.com>>
> > >
> > wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > abhishek.girish@gmail.com<ma...@gmail.com>>
> > > > wrote:
> > >
> > > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when querying from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > altekrusejason@gmail.com<ma...@gmail.com>>
> > > > >
> > > > wrote:
> > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > <Ku...@ds-iq.com>>>
> > > > wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven
> > > > > > > [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org<ma...@drill.apache.org>>
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a small
> > > > > > > fraction of the expected results. We have also run a count
> > > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > > which is far less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers, and the table
> > > > > > > has about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>




RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Thank you for your help, Aditya!  We were able to workaround this.

We replaced the included hbase client jar in the Drill distributable tarball to workaround the bug in https://issues.apache.org/jira/browse/HBASE-13262. The fix on the client side is in >= 0.98.12. We initially tested with 0.98.17 (latest available artifact in public mvn repository) but that the drill code encountered a deadlock when using it. Went ahead with 0.98.13.jar instead to minimize changes in interaction with Drill.
Files affected:
/opt/apache-drill/jars/3rdparty/
                hbase-annotations-0.98.13-hadoop2.jar
                hbase-client-0.98.13-hadoop2.jar
                hbase-common-0.98.13-hadoop2.jar
                hbase-protocol-0.98.13-hadoop2.jar
Another workaround for the bug is to configure the client scanners to have a max batch size of <= the server's default batch size (which varies between 98 and 1.x).
Add this line in the hbase storage plugin through the Drill UI:
"hbase.client.scanner.max.result.size": "1"
It should be within the config dictionary that also holds the hbase.zookeeper.quorum property.

-Kumiko

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Monday, March 21, 2016 1:08 PM
To: Kevin Verhoeven <Ke...@ds-iq.com>
Cc: user@drill.apache.org; Kumiko Yada <Ku...@ds-iq.com>; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

I did not see any issue when running with HBase 0.98.7 client bundled with Drill against HBase 1.1 servers.
I have just assigned DRILL-4199[1] to myself to evaluate moving to HBase 1.1 in next Drill release.

[1] https://issues.apache.org/jira/browse/DRILL-4199

On Mon, Mar 21, 2016 at 12:13 PM, Kevin Verhoeven <Ke...@ds-iq.com>> wrote:
Aditya,

Looking into the bug we read that the behavior will still occur if the hbase-client version does not include the fix (between a 0.98 client and 1.0 server). The hbase-client used by Drill under jars/3rdparty is hbase-client-0.98.7-hadoop2.jar which does not include the fix. I updated the hbase-client jar with hbase-client-0.98.17-hadoop2.jar, but I receive a java.lang.NoClassDefFoundError error. Are you able to test Drill with an updated hbase-client jar against CDH? Here is the error I received:

2016-03-21 18:58:36,874 [USER-rpc-event-queue] ERROR o.a.d.exec.server.rest.QueryWrapper - Query Failed
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Failure while loading table test6c in database hbase.
Message:  com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge

        at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) [drill-rpc-1.4.0.jar:1.4.0]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]

Thanks,

Kevin

-----Original Message-----
From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>]
Sent: Monday, March 21, 2016 11:33 AM
To: adityakishore@gmail.com<ma...@gmail.com>
Cc: Kumiko Yada <Ku...@ds-iq.com>>; user@drill.apache.org<ma...@drill.apache.org>; dev@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>
Subject: RE: Drill query does not return all results from HBase

Thanks Aditya,

I also see that the bug was backported in CDH 5.4.3: https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html. I tested Drill on CDH version 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the same behavior.

Kevin

From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>]
Sent: Monday, March 21, 2016 10:26 AM
To: Kevin Verhoeven <Ke...@ds-iq.com>>
Cc: Kumiko Yada <Ku...@ds-iq.com>>; user@drill.apache.org<ma...@drill.apache.org>; dev@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>
Subject: Re: Drill query does not return all results from HBase

Since I suspected that it was a bug in HBase, I tried it with the original version you reported in the first post in this thread, i.e. CDH 5.4.3.
If it was back-ported to 5.4.7, upgrading should fix this issue.

On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <Ke...@ds-iq.com>>> wrote:
Aditya,

Thank you for your help. What version of CDH are you running? I contacted Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.

Thanks,

Kevin

From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>>]
Sent: Sunday, March 20, 2016 10:45 PM

To: Kumiko Yada <Ku...@ds-iq.com>>>
Cc: user@drill.apache.org<ma...@drill.apache.org>>; dev@drill.apache.org<ma...@drill.apache.org>>; altekrusejason@gmail.com<ma...@gmail.com>>; Ki Kang <Ki...@ds-iq.com>>>; Kevin Verhoeven <Ke...@ds-iq.com>>>
Subject: Re: Drill query does not return all results from HBase

Finally managed to reproduce it with CDH distribution (So far I was testing with HBase 1.1 distributed with MapR, which does not have this bug).
This is essentially an HBase bug, HBASE-13262[1], which has been fixed in 1.0.1, 1.1.0.
Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com>>> wrote:
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue, in the case where the table is split into multiple regions and the first region returned to the client did not have any rows.  I think this issue is related to the one that I’m seeing.  Have you opened a JIRA for this issue?  Have you investigated or fixed it?

Thanks
Kumiko

From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>>]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Ku...@ds-iq.com>>>
Cc: user@drill.apache.org<ma...@drill.apache.org>>; dev@drill.apache.org<ma...@drill.apache.org>>; altekrusejason@gmail.com<ma...@gmail.com>>; Ki Kang <Ki...@ds-iq.com>>>; Kevin Verhoeven <Ke...@ds-iq.com>>>

Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with an Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>>> wrote:
Hello,

I provided all the information that was requested; however, I haven't heard anything back since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>>]
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>>>
Cc: altekrusejason@gmail.com<ma...@gmail.com>>; Ki Kang <Ki...@ds-iq.com>>>; Kevin Verhoeven <Ke...@ds-iq.com>>>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>>> wrote:

> Hello,
>
> Does anyone have any update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> for this to be investigated/fixed?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada
> [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>>]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org<ma...@drill.apache.org>>;
> altekrusejason@gmail.com<ma...@gmail.com>>
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>>]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>>>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading
> the HBase metadata during our optimization to return row counts
> without reading any data. This should be really fast, and noticeably
> different than running a complete query, even with a small dataset as
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is
> occurring, I will hopefully have time soon to get this fixed but I'm
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada
> <Ku...@ds-iq.com>>>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse
> <al...@gmail.com>>>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the one
> column count, the row count was correct.  Here are the additional testing results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this issue!
>
> Hbase table (1 column family, 5 columns, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *Hbase table (1 column family, 6 columns, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6724 rows)*
>
> 1 column count - row count is correct
>
> *Hbase table (2 column families, 6 columns in each column family,
> 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 3362 rows)*
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 2 columns in each column family,
> 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *Hbase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6723 rows)*
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse'
> <al...@gmail.com>>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit to try to
> force a full scan, but the result was the same, just 10000 rows
> selected.  With the same table (containing 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse'
> <al...@gmail.com>>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>> <
> altekrusejason@gmail.com<ma...@gmail.com>>>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada
> <Ku...@ds-iq.com>>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing something correctly.
> Can you try to run a 'sum', or other aggregate query on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada
> <Ku...@ds-iq.com>>>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>>]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>>>
>
> Cc: Aditya Kishore
> <ad...@gmail.com>>>; Kevin
> Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada
> <Ku...@ds-iq.com>>>
> wrote:
>
> > I opened a bug on this.  Drill is returning the correct rows
> > when the HBase table contains 5 or fewer columns, but not 6 or more columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada
> > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>>]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>>
> > Cc: Aditya Kishore
> > <ad...@gmail.com>>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family with 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family with 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish
> > [mailto:abhishek.girish@gmail.com<ma...@gmail.com>>]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>>>
> > Cc: Aditya Kishore
> > <ad...@gmail.com>>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change, if I remember right, hence I
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > <al...@gmail.com>>
> > >
> > wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > abhishek.girish@gmail.com<ma...@gmail.com>>
> > > > wrote:
> > >
> > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > to fetch all records, whereas I was only able to see a small
> > > > subset of records
> > > when
> > > > queried from Drill. Each time I inserted 1000 records, only
> > > > about
> > > > 50 of those would show up.
> > > >
> > > > Although I could repro the problem consistently, it was
> > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > an HBase bug which got resolved. Strange as it seems, it
> > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > altekrusejason@gmail.com<ma...@gmail.com>>
> > > > >
> > > > wrote:
> > > >
> > > > > I'm not sure why this is happening, we have tests in our
> > > > > automated
> > > suite
> > > > > that I believe run some pretty large queries against Hbase and
> > > > > verify
> > > the
> > > > > results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > <Ku...@ds-iq.com>>>
> > > > wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kevin Verhoeven
> > > > > > [mailto:Kevin.Verhoeven@ds-iq.com<ma...@ds->
> > > > > > iq.com<http://iq.com>>]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org<ma...@drill.apache.org>>
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > We have a problem where a Drill query against HBase does not
> > > > > > return
> > > all
> > > > > > results. The following query should return over 100,000
> > > > > > rows, but we
> > > > only
> > > > > > get about 1,030 back.
> > > > > >
> > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > customer_number =
> > > > 800
> > > > > >
> > > > > > If we scan directly using the hbase shell we see over
> > > > > > 100,000 rows,
> > > but
> > > > > > the same Drill query does not return a fraction of the
> > > > > > expected
> > > > results.
> > > > > We
> > > > > > have also run a count against the table and Drill returns
> > > > > > the same
> > > > 1,030
> > > > > > > number, which is far less than expected. What could be going wrong?
> > > > > >
> > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > (HBase
> > > 1.0).
> > > > > We
> > > > > > run HBase on six RegionServers, the table has about 1.3
> > > > > > billion
> > rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
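
As a side note on the test matrix quoted above: the six reported configurations can be tabulated to check which "total column count" threshold is consistent with the observed COUNT(*) results. The sketch below only restates the figures from the thread. It shows consistency with six data points and does not establish a cause; the thread later attributes the behavior to HBASE-13262.

```python
# (columns per column family, was COUNT(*) correct?) as reported in the thread.
REPORTED = [
    ((5,), True),
    ((6,), False),
    ((6, 6), False),
    ((2, 2), True),
    ((4, 2), False),
    ((1, 3), True),
]


def threshold_explains(results, threshold):
    """True if 'COUNT(*) is wrong exactly when total columns >= threshold'
    matches every reported result."""
    return all((sum(cols) < threshold) == ok for cols, ok in results)


# Try every plausible threshold and keep the ones that fit all six cases.
fitting = [t for t in range(1, 14) if threshold_explains(REPORTED, t)]
print(fitting)  # prints [6]
```

Only a threshold of 6 total columns fits every case, which matches the "5 or fewer columns works, 6 or more does not" summary in DRILL-4271.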




Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
I did not see any issue when running with HBase 0.98.7 client bundled with
Drill against HBase 1.1 servers.

I have just assigned DRILL-4199[1] to myself to evaluate moving to HBase
1.1 in next Drill release.

[1] https://issues.apache.org/jira/browse/DRILL-4199
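
For readers checking which hbase-client jar sits under jars/3rdparty, the version thresholds mentioned in this thread (client-side fix in 0.98.12 and later on the 0.98 line, 1.0.1 on the 1.0 line, and 1.1.0 onward) can be expressed as a small helper. This is a hedged sketch, not part of Drill or HBase: the function names are made up for illustration, and vendor builds such as CDH may backport the fix without changing the base version number, so a False result for those is not conclusive.

```python
import re

# Minimum patch level carrying the HBASE-13262 fix, per release line,
# as reported in this thread. Releases from 1.1.0 onward count as fixed.
FIXED_IN = {(0, 98): 12, (1, 0): 1}


def parse_version(jar_name):
    """Extract (major, minor, patch) from a name such as
    hbase-client-0.98.7-hadoop2.jar (first x.y.z match wins)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", jar_name)
    if match is None:
        raise ValueError("no x.y.z version found in %r" % jar_name)
    return tuple(int(part) for part in match.groups())


def has_scanner_fix(jar_name):
    """Hypothetical helper: does this upstream version include the fix?"""
    major, minor, patch = parse_version(jar_name)
    if (major, minor) >= (1, 1):
        return True
    minimum = FIXED_IN.get((major, minor))
    # Release lines not mentioned in the thread are reported as unfixed.
    return minimum is not None and patch >= minimum


print(has_scanner_fix("hbase-client-0.98.7-hadoop2.jar"))   # False
print(has_scanner_fix("hbase-client-0.98.13-hadoop2.jar"))  # True
```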

On Mon, Mar 21, 2016 at 12:13 PM, Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com
> wrote:

> Aditya,
>
> Looking into the bug we read that the behavior will still occur if the
> hbase-client version does not include the fix (between a 0.98 client and
> 1.0 server). The hbase-client used by Drill under jars/3rdparty is
> hbase-client-0.98.7-hadoop2.jar which does not include the fix. I updated
> the hbase-client jar with hbase-client-0.98.17-hadoop2.jar, but I receive a
> java.lang.NoClassDefFoundError error. Are you able to test Drill with an
> updated hbase-client jar against CDH? Here is the error I received:
>
> 2016-03-21 18:58:36,874 [USER-rpc-event-queue] ERROR
> o.a.d.exec.server.rest.QueryWrapper - Query Failed
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR:
> Failure while loading table test6c in database hbase.
> Message:  com.google.protobuf.ServiceException:
> java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
>
>         at
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
> [drill-java-exec-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
> [drill-java-exec-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
> [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
> [netty-handler-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at
> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>
> Thanks,
>
> Kevin
>
> -----Original Message-----
> From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> Sent: Monday, March 21, 2016 11:33 AM
> To: adityakishore@gmail.com
> Cc: Kumiko Yada <Ku...@ds-iq.com>; user@drill.apache.org;
> dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com
> >
> Subject: RE: Drill query does not return all results from HBase
>
> Thanks Aditya,
>
> I also see that the bug was backported in CDH 5.4.3:
> https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html.
> I tested Drill on CDH version 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the
> same behavior.
>
> Kevin
>
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Monday, March 21, 2016 10:26 AM
> To: Kevin Verhoeven <Ke...@ds-iq.com>
> Cc: Kumiko Yada <Ku...@ds-iq.com>; user@drill.apache.org;
> dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com
> >
> Subject: Re: Drill query does not return all results from HBase
>
> Since I suspected that it was a bug in HBase, I tried it with the original
> version you reported in the first post in this thread, i.e. CDH 5.4.3.
> If it was back-ported to 5.4.7, upgrading should fix this issue.
>
> On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>> wrote:
> Aditya,
>
> Thank you for your help. What version of CDH are you running? I contacted
> Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.
>
> Thanks,
>
> Kevin
>
> From: Aditya [mailto:adityakishore@gmail.com<mailto:
> adityakishore@gmail.com>]
> Sent: Sunday, March 20, 2016 10:45 PM
>
> To: Kumiko Yada <Ku...@ds-iq.com>>
> Cc: user@drill.apache.org<ma...@drill.apache.org>;
> dev@drill.apache.org<ma...@drill.apache.org>;
> altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <
> Ki.Kang@ds-iq.com<ma...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Finally managed to reproduce it with CDH distribution (So far I was
> testing with HBase 1.1 distributed with MapR, which does not have this bug).
> This is essentially an HBase bug, HBASE-13262[1], which has been fixed in
> 1.0.1, 1.1.0.
> Please update your HBase distribution.
>
> [1] https://issues.apache.org/jira/browse/HBASE-13262
>
> On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com
> <ma...@ds-iq.com>> wrote:
> Aditya,
>
> When we were exchanging emails, you mentioned that you had
> discovered another issue, in the case where the table is split into multiple
> regions and the first region returned to the client did not have any rows.
> I think this issue is related to the one that I’m seeing.  Have you
> opened a JIRA for this issue?  Have you investigated or fixed it?
>
> Thanks
> Kumiko
>
> From: Aditya [mailto:adityakishore@gmail.com<mailto:
> adityakishore@gmail.com>]
> Sent: Thursday, March 17, 2016 3:02 PM
> To: Kumiko Yada <Ku...@ds-iq.com>>
> Cc: user@drill.apache.org<ma...@drill.apache.org>;
> dev@drill.apache.org<ma...@drill.apache.org>;
> altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <
> Ki.Kang@ds-iq.com<ma...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with an Apache 1.x release but have
> failed so far.
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
> It is probably unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com
> <ma...@ds-iq.com>> wrote:
> Hello,
>
> I provided all the information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com<mailto:
> adityakishore@gmail.com>]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>>
> Cc: altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <
> Ki.Kang@ds-iq.com<ma...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com
> <ma...@ds-iq.com>> wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> > for this to be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada
> > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>;
> > altekrusejason@gmail.com<ma...@gmail.com>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse
> > [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada
> > <Ku...@ds-iq.com>>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse
> > <al...@gmail.com>>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the one
> > column count, the row count was correct.  Here are the additional testing
> > results.
> >
> >
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed.  Let me know if I can do anything to help get
> > this issue fixed.  I really appreciate that you are looking
> > into this issue!
> >
> > Hbase table (1 column family, 5 columns, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (1 column family, 6 columns, 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 6724 rows)*
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (2 column families, 6 columns in each column family,
> > 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 3362 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 2 columns in each column family,
> > 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (2 column families, 4 columns in one column family and 2
> > columns in the other column family, 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 6723 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 1 column in one column family and 3
> > columns in the other column family, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse'
> > <al...@gmail.com>>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same, just 10000 rows
> > selected.  With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse'
> > <al...@gmail.com>>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse
> > [mailto:altekrusejason@gmail.com<ma...@gmail.com> <
> > altekrusejason@gmail.com<ma...@gmail.com>>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada
> > <Ku...@ds-iq.com>>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada
> > <Ku...@ds-iq.com>>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse
> > [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <us...@drill.apache.org>>
> >
> > Cc: Aditya Kishore
> > <ad...@gmail.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada
> > <Ku...@ds-iq.com>>
> > wrote:
> >
> > > I opened a bug for this.  Drill returns the correct rows when the
> > > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada
> > > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > Cc: Aditya Kishore
> > > <ad...@gmail.com>>; Kevin
> > > Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family with 1 column, but an incorrect row count is returned when the
> > > HBase table contains 1 column family with 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish
> > > [mailto:abhishek.girish@gmail.com<ma...@gmail.com>]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <us...@drill.apache.org>>
> > > Cc: Aditya Kishore
> > > <ad...@gmail.com>>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <al...@gmail.com>
> > > >
> > > wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > > abhishek.girish@gmail.com<ma...@gmail.com>
> > > > > wrote:
> > > >
> > > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com<ma...@gmail.com>
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Ku...@ds-iq.com>>
> > > > > wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven
> > > > > > > [mailto:Kevin.Verhoeven@ds-iq.com<mailto:Kevin.Verhoeven@ds-
> > > > > > > iq.com>]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over 100,000
> > > > > > > rows, but the same Drill query returns only a fraction of the
> > > > > > > expected results. We have also run a count against the table
> > > > > > > and Drill returns the same 1,030 number, which is far less
> > > > > > > than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin

Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
I did not see any issue when running the HBase 0.98.7 client bundled with
Drill against HBase 1.1 servers.

I have just assigned DRILL-4199[1] to myself to evaluate moving to HBase
1.1 in the next Drill release.

[1] https://issues.apache.org/jira/browse/DRILL-4199
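
As a rough aid for checking which client jar a Drill install bundles, the
sketch below parses an hbase-client jar file name and compares it against the
fix versions reported in this thread for HBASE-13262 (0.98.12 on the 0.98
client line; 1.0.1 and 1.1.0 otherwise). The function names and the file-name
pattern are illustrative assumptions, not part of Drill or HBase, and the
thresholds should be confirmed against the JIRA before relying on them.

```python
import re

# Fix versions as reported in this thread for HBASE-13262 (assumptions to verify).
FIXED_IN_098 = (0, 98, 12)
FIXED_IN_10 = (1, 0, 1)
FIXED_IN_11 = (1, 1, 0)

# Hypothetical pattern for names such as hbase-client-0.98.7-hadoop2.jar.
JAR_PATTERN = re.compile(r"^hbase-client-(\d+)\.(\d+)\.(\d+)(?:-[\w.]+)?\.jar$")


def parse_client_version(jar_name):
    """Return (major, minor, patch) from an hbase-client jar name, or None."""
    match = JAR_PATTERN.match(jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def client_has_fix(jar_name):
    """Guess from the version number alone whether the client includes the fix.

    Vendor backports (for example in CDH builds) are not visible in the
    upstream version number, so a False result is only a hint.
    """
    version = parse_client_version(jar_name)
    if version is None:
        raise ValueError(f"not an hbase-client jar name: {jar_name}")
    if version[:2] == (0, 98):
        return version >= FIXED_IN_098
    if version[:2] == (1, 0):
        return version >= FIXED_IN_10
    return version >= FIXED_IN_11
```

For the jar named in this thread, client_has_fix("hbase-client-0.98.7-hadoop2.jar")
evaluates to False, while the 0.98.13 and 0.98.17 jars evaluate to True.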

On Mon, Mar 21, 2016 at 12:13 PM, Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com
> wrote:

> Aditya,
>
> Looking into the bug we read that the behavior will still occur if the
> hbase-client version does not include the fix (between a 0.98 client and
> 1.0 server). The hbase-client used by Drill under jars/3rdparty is
> hbase-client-0.98.7-hadoop2.jar which does not include the fix. I updated
> the hbase-client jar with hbase-client-0.98.17-hadoop2.jar, but I receive a
> java.lang.NoClassDefFoundError error. Are you able to test Drill with an
> updated hbase-client jar against CDH? Here is the error I received:
>
> 2016-03-21 18:58:36,874 [USER-rpc-event-queue] ERROR
> o.a.d.exec.server.rest.QueryWrapper - Query Failed
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR:
> Failure while loading table test6c in database hbase.
> Message:  com.google.protobuf.ServiceException:
> java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
>
>         at
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
> [drill-java-exec-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
> [drill-java-exec-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269)
> [drill-rpc-1.4.0.jar:1.4.0]
>         at
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
> [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
> [netty-handler-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at
> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>
> Thanks,
>
> Kevin
>
> -----Original Message-----
> From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> Sent: Monday, March 21, 2016 11:33 AM
> To: adityakishore@gmail.com
> Cc: Kumiko Yada <Ku...@ds-iq.com>; user@drill.apache.org;
> dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com
> >
> Subject: RE: Drill query does not return all results from HBase
>
> Thanks Aditya,
>
> I also see that the bug was backported in CDH 5.4.3:
> https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html.
> I tested Drill on CDH version 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the
> same behavior.
>
> Kevin
>
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Monday, March 21, 2016 10:26 AM
> To: Kevin Verhoeven <Ke...@ds-iq.com>
> Cc: Kumiko Yada <Ku...@ds-iq.com>; user@drill.apache.org;
> dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com
> >
> Subject: Re: Drill query does not return all results from HBase
>
> Since I suspected that it was a bug in HBase, I tried it with the original
> version you reported in the first post in this thread, i.e. CDH 5.4.3.
> If it was back-ported to 5.4.7, upgrading should fix this issue.
>
> On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>> wrote:
> Aditya,
>
> Thank you for your help. What version of CDH are you running? I contacted
> Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.
>
> Thanks,
>
> Kevin
>
> From: Aditya [mailto:adityakishore@gmail.com<mailto:
> adityakishore@gmail.com>]
> Sent: Sunday, March 20, 2016 10:45 PM
>
> To: Kumiko Yada <Ku...@ds-iq.com>>
> Cc: user@drill.apache.org<ma...@drill.apache.org>;
> dev@drill.apache.org<ma...@drill.apache.org>;
> altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <
> Ki.Kang@ds-iq.com<ma...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Finally managed to reproduce it with CDH distribution (So far I was
> testing with HBase 1.1 distributed with MapR, which does not have this bug).
> This is essentially an HBase bug, HBASE-13262[1], which has been fixed in
> 1.0.1, 1.1.0.
> Please update your HBase distribution.
>
> [1] https://issues.apache.org/jira/browse/HBASE-13262
>
> On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com
> <ma...@ds-iq.com>> wrote:
> Aditya,
>
> When we were exchanging emails, you mentioned that you had
> discovered another issue in the case where the table is split into multiple
> regions and the first region returned to the client did not have any rows.
> I think this issue is related to the issue that I’m seeing.  Have you
> opened the JIRA for this issue?  Have you investigated/fixed this issue?
>
> Thanks
> Kumiko
>
> From: Aditya [mailto:adityakishore@gmail.com<mailto:
> adityakishore@gmail.com>]
> Sent: Thursday, March 17, 2016 3:02 PM
> To: Kumiko Yada <Ku...@ds-iq.com>>
> Cc: user@drill.apache.org<ma...@drill.apache.org>;
> dev@drill.apache.org<ma...@drill.apache.org>;
> altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <
> Ki.Kang@ds-iq.com<ma...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with Apache 1.x release but have
> failed so far.
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
> Probably it is unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com
> <ma...@ds-iq.com>> wrote:
> Hello,
>
> I provided all information that was requested; however, I haven't heard
> back anything since February 24.
>
> Is anyone taking look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com<mailto:
> adityakishore@gmail.com>]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>>
> Cc: altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <
> Ki.Kang@ds-iq.com<ma...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com
> <ma...@ds-iq.com>> wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Are there any plan
> > that this would be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada
> > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>;
> > altekrusejason@gmail.com<ma...@gmail.com>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse
> > [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada
> > <Ku...@ds-iq.com>>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse
> > <al...@gmail.com>>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the one
> > column count, the row count was correct.  Here are the additional testing
> > results.
> >
> >
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed.  Let me know if I can do anything to help get
> > this issue fixed.  I really appreciate that you are looking
> > into the issue!
> >
> > Hbase table (1 column family, 5 columns, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (1 column family, 6 columns,  10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (**returned 6724 rows)*
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (2 column family, 6 columns in each columns family,
> > 10000000
> > rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 3362 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbase table (2 column family, 2 columns in each columns family,
> > 10000000
> > rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbasetable (2 column family, 4 columns in one column family and 2
> > columns in other column family, 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 6723 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbasetable (2 column family, 1 column in one column family and 3
> > columns in other column family, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse'
> > <al...@gmail.com>>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same: just 10000 rows
> > selected.  With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse'
> > <al...@gmail.com>>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse
> > [mailto:altekrusejason@gmail.com<ma...@gmail.com> <
> > altekrusejason@gmail.com<ma...@gmail.com>>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada
> > <Ku...@ds-iq.com>>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada
> > <Ku...@ds-iq.com>>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse
> > [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <us...@drill.apache.org>>
> >
> > Cc: Aditya Kishore
> > <ad...@gmail.com>>; Kevin
> > Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada
> > <Ku...@ds-iq.com>>
> > wrote:
> >
> > > I opened a bug for this.  Drill returns the correct rows when the
> > > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada
> > > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > Cc: Aditya Kishore
> > > <ad...@gmail.com>>; Kevin
> > > Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family with 1 column, but an incorrect row count is returned when the
> > > HBase table contains 1 column family with 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish
> > > [mailto:abhishek.girish@gmail.com<ma...@gmail.com>]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <us...@drill.apache.org>>
> > > Cc: Aditya Kishore
> > > <ad...@gmail.com>>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <al...@gmail.com>
> > > >
> > > wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > > abhishek.girish@gmail.com<ma...@gmail.com>
> > > > > wrote:
> > > >
> > > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com<ma...@gmail.com>
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Ku...@ds-iq.com>>
> > > > > wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven
> > > > > > > [mailto:Kevin.Verhoeven@ds-iq.com<mailto:Kevin.Verhoeven@ds-
> > > > > > > iq.com>]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over 100,000
> > > > > > > rows, but the same Drill query returns only a fraction of the
> > > > > > > expected results. We have also run a count against the table
> > > > > > > and Drill returns the same 1,030 number, which is far less
> > > > > > > than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin

RE: Drill query does not return all results from HBase

Posted by Kevin Verhoeven <Ke...@ds-iq.com>.
Aditya,

Looking into the bug, we read that the behavior still occurs if the hbase-client version does not include the fix (that is, between a 0.98 client and a 1.0 server). The hbase-client that Drill ships under jars/3rdparty is hbase-client-0.98.7-hadoop2.jar, which does not include the fix. I replaced it with hbase-client-0.98.17-hadoop2.jar, but now I receive a java.lang.NoClassDefFoundError. Are you able to test Drill with an updated hbase-client jar against CDH? Here is the error I received:

2016-03-21 18:58:36,874 [USER-rpc-event-queue] ERROR o.a.d.exec.server.rest.QueryWrapper - Query Failed
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Failure while loading table test6c in database hbase.
Message:  com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge

        at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) [drill-rpc-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) [drill-rpc-1.4.0.jar:1.4.0]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]

Thanks,

Kevin

-----Original Message-----
From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com] 
Sent: Monday, March 21, 2016 11:33 AM
To: adityakishore@gmail.com
Cc: Kumiko Yada <Ku...@ds-iq.com>; user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>
Subject: RE: Drill query does not return all results from HBase

Thanks Aditya,

I also see that the fix was backported in CDH 5.4.3: https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html. I tested Drill on CDH versions 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the same behavior on all of them.

Kevin

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Monday, March 21, 2016 10:26 AM
To: Kevin Verhoeven <Ke...@ds-iq.com>
Cc: Kumiko Yada <Ku...@ds-iq.com>; user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Since I suspected that it was a bug in HBase, I tried it with the original version you reported in the first post in this thread, i.e. CDH 5.4.3.
If it was back-ported to 5.4.7, upgrading should fix this issue.

On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <Ke...@ds-iq.com> wrote:
Aditya,

Thank you for your help. What version of CDH are you running? I contacted Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.

Thanks,

Kevin

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Sunday, March 20, 2016 10:45 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

I finally managed to reproduce it with the CDH distribution (until now I had been testing with HBase 1.1 as distributed with MapR, which does not have this bug).
This is essentially an HBase bug, HBASE-13262[1], which has been fixed in 1.0.1 and 1.1.0.
Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262
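
The fix versions mentioned in this thread can be turned into a quick check. What follows is only a minimal sketch, not anything taken from Drill or HBase: the function names are hypothetical, the table holds just the releases named in the thread (1.0.1 and 1.1.0 here, plus the 0.98.12 client-side fix from the workaround message), and it looks at the upstream version number only. For a vendor build such as 1.0.0-cdh5.4.3 the upstream number says nothing about backports, so treat the answer as a hint.

```python
import re

# First release on each line that carries the HBASE-13262 fix, as stated in
# this thread. Assumption: any release line not listed here is "unknown".
FIRST_FIXED = {
    (0, 98): (0, 98, 12),  # client-side fix, per the workaround message
    (1, 0): (1, 0, 1),
    (1, 1): (1, 1, 0),
}


def parse_version(text):
    """Return the leading major.minor.patch of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_hbase_13262_fix(version_text):
    """True or False for the release lines above, None when nothing is known.

    Vendor suffixes (for example "-cdh5.4.3") are ignored, so a backported
    fix in a vendor build is not detected.
    """
    version = parse_version(version_text)
    first_fixed = FIRST_FIXED.get(version[:2])
    if first_fixed is None:
        return None
    return version >= first_fixed
```

For example, `has_hbase_13262_fix("1.0.0")` is False and `has_hbase_13262_fix("1.1.1")` is True, which matches the setups reported in this thread: the issue shows up on HBase 1.0.0 and not on 1.1.1.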

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue, in the case where the table is split into multiple regions and the first region returned to the client does not have any rows.  I think that issue is related to the one I’m seeing.  Have you opened a JIRA for it?  Have you investigated or fixed it?

Thanks
Kumiko

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with an Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:
Hello,

I provided all the information that was requested; however, I haven't heard anything back since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com]
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>
Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here, it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Hello,
>
> Does anyone have an update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> to investigate or fix it?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada 
> [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org<ma...@drill.apache.org>; 
> altekrusejason@gmail.com<ma...@gmail.com>
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse 
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading 
> the HBase metadata during our optimization to return row counts 
> without reading any data. This should be really fast, and noticeably 
> different than running a complete query, even with a small dataset as 
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is 
> occurring, I will hopefully have time soon to get this fixed but I'm 
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada 
> <Ku...@ds-iq.com>>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse 
> <al...@gmail.com>>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing last night was incorrect.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the
> one-column count, the row count was correct.  Here are the additional test results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this issue!
>
> HBase table (1 column family, 5 columns, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *HBase table (1 column family, 6 columns, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6724 rows)*
>
> 1 column count - row count is correct
>
> *HBase table (2 column families, 6 columns in each column family, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 3362 rows)*
>
> 1 column count - row count is correct
>
> HBase table (2 column families, 2 columns in each column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *HBase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6723 rows)*
>
> 1 column count - row count is correct
>
> HBase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse' 
> <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin 
> Verhoeven < 
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit, to try to
> force a full scan, but the result was the same: just 10000 rows
> selected.  With the same table (which contains 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse' 
> <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin 
> Verhoeven < 
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse 
> [mailto:altekrusejason@gmail.com<ma...@gmail.com> < 
> altekrusejason@gmail.com<ma...@gmail.com>>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada 
> <Ku...@ds-iq.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin 
> Verhoeven < 
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know that in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing
> something correctly.
> Can you try to run a 'sum', or another aggregate query, on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada 
> <Ku...@ds-iq.com>>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on 
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse 
> [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>>
>
> Cc: Aditya Kishore 
> <ad...@gmail.com>>; Kevin 
> Verhoeven < 
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada 
> <Ku...@ds-iq.com>>
> wrote:
>
> > I opened a bug for this.  Drill returns the correct rows
> > when the HBase table contains 5 or fewer columns, but not with 6 or more columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada 
> > [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>
> > Cc: Aditya Kishore 
> > <ad...@gmail.com>>; Kevin 
> > Verhoeven < 
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family with 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family with 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish 
> > [mailto:abhishek.girish@gmail.com<ma...@gmail.com>]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>>
> > Cc: Aditya Kishore 
> > <ad...@gmail.com>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change, if I remember right, hence I
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might 
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse 
> > <al...@gmail.com>
> > >
> > wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> > > Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish < 
> > > abhishek.girish@gmail.com<ma...@gmail.com>
> > > > wrote:
> > >
> > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > to fetch all records, whereas I was only able to see a small
> > > > subset of records when querying from Drill. Each time I inserted
> > > > 1000 records, only about 50 of those would show up.
> > > >
> > > > Although I could repro the problem consistently, it was
> > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > an HBase bug which got resolved. Strange as it seems, it
> > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > altekrusejason@gmail.com<ma...@gmail.com>
> > > > >
> > > > wrote:
> > > >
> > > > > I'm not sure why this is happening; we have tests in our
> > > > > automated suite that I believe run some pretty large queries
> > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce 
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada 
> > > > > <Ku...@ds-iq.com>>
> > > > wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kevin Verhoeven 
> > > > > > [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > We have a problem where a Drill query against HBase does not
> > > > > > return all results. The following query should return over
> > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > >
> > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > customer_number = 800
> > > > > >
> > > > > > If we scan directly using the hbase shell we see over
> > > > > > 100,000 rows, but the same Drill query returns only a
> > > > > > fraction of the expected results. We have also run a count
> > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > which is far less than expected. What could be going wrong?
> > > > > >
> > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > (HBase 1.0). We run HBase on six RegionServers; the table
> > > > > > has about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
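
The COUNT(*) results quoted above (Kumiko's message of January 14) can be checked against the "5 or fewer columns is correct, 6 or more is not" observation from the bug report. The snippet below is a small sketch that only re-encodes the numbers reported in the thread; the names are hypothetical, and `None` stands for the cases where the thread says the count was correct without giving a figure.

```python
# COUNT(*) outcomes reported in the thread for tables of 10,000,000 rows.
# Each entry: (columns per column family, rows returned by COUNT(*)),
# with None where the thread only says the row count was correct.
REPORTED = [
    ((5,), None),
    ((6,), 6724),
    ((6, 6), 3362),
    ((2, 2), None),
    ((4, 2), 6723),
    ((1, 3), None),
]

COLUMN_THRESHOLD = 6  # fewer than 6 columns in total: correct; 6 or more: not


def fits_threshold(layout, returned_rows):
    """True when the reported outcome agrees with the six-column observation."""
    count_was_correct = returned_rows is None
    return count_was_correct == (sum(layout) < COLUMN_THRESHOLD)


def cells_returned(layout, returned_rows):
    """Rows returned multiplied by the total number of columns in the table."""
    return returned_rows * sum(layout)


all_fit = all(fits_threshold(layout, rows) for layout, rows in REPORTED)
cell_counts = [
    cells_returned(layout, rows) for layout, rows in REPORTED if rows is not None
]
print(all_fit)      # True
print(cell_counts)  # [40344, 40344, 40338]
```

All six reported cases fit the threshold on the total column count across families. The product of returned rows and total columns is also nearly the same in the three failing cases, which would be consistent with a scan that stops after a roughly fixed number of cells; that reading is an inference from the reported numbers, not something stated in the thread.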




RE: Drill query does not return all results from HBase

Posted by Kevin Verhoeven <Ke...@ds-iq.com>.
Thanks Aditya,

I also see that the fix was backported in CDH 5.4.3: https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html. I tested Drill on CDH versions 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the same behavior on all of them.

Kevin





Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
Since I suspected that it was a bug in HBase, I tried it with the original
version you reported in the first post in this thread, i.e. CDH 5.4.3.

If it was back-ported to 5.4.7, upgrading should fix this issue.
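
For anyone comparing the numbers quoted below: a quick arithmetic check on the truncated COUNT(*) results Kumiko reported suggests the scans stopped after roughly the same number of cells, not the same number of rows. This is only a sketch. It assumes every row has a value in every column (the thread does not say so), and the variable and function names are made up for illustration.

```python
# Truncated COUNT(*) results reported in this thread:
# (table description, total columns per row, rows returned by Drill)
truncated_cases = [
    ("1 column family, 6 columns", 6, 6724),
    ("2 column families, 6 columns each", 12, 3362),
    ("2 column families, 4 + 2 columns", 6, 6723),
]


def cells_returned(columns_per_row, rows_returned):
    """Cells read before the scan stopped, assuming fully populated rows."""
    return columns_per_row * rows_returned


# Map each reported case to the number of cells it implies.
cell_counts = {
    description: cells_returned(columns, rows)
    for description, columns, rows in truncated_cases
}

for description, cells in cell_counts.items():
    print(f"{description}: {cells} cells")
```

All three cases come out at about 40,000 cells (40344, 40344 and 40338). That would be consistent with a size-based cutoff in the scanner, though the arithmetic alone does not prove it.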

On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com
> wrote:

> Aditya,
>
>
>
> Thank you for your help. What version of CDH are you running? I contacted
> Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.
>
>
>
> Thanks,
>
>
>
> Kevin
>
>
>
> *From:* Aditya [mailto:adityakishore@gmail.com]
> *Sent:* Sunday, March 20, 2016 10:45 PM
>
> *To:* Kumiko Yada <Ku...@ds-iq.com>
> *Cc:* user@drill.apache.org; dev@drill.apache.org;
> altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> Finally managed to reproduce it with CDH distribution (So far I was
> testing with HBase 1.1 distributed with MapR, which does not have this bug).
>
> This is essentially an HBase bug, HBASE-13262[1], which has been fixed in
> 1.0.1, 1.1.0.
>
> Please update your HBase distribution.
>
>
> [1] https://issues.apache.org/jira/browse/HBASE-13262
>
>
>
> On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> Aditya,
>
>
>
> When we were exchanging emails, you mentioned that you had discovered
> another issue in the case where the table is split into multiple regions
> and the first region returned to the client did not have any rows.
> I think that issue is related to the one I’m seeing.  Have you opened a
> JIRA for it?  Have you investigated or fixed it?
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Aditya [mailto:adityakishore@gmail.com]
> *Sent:* Thursday, March 17, 2016 3:02 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>
> *Cc:* user@drill.apache.org; dev@drill.apache.org;
> altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with the Apache 1.x release but have
> failed so far.
>
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
>
> It is probably unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
>
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> Hello,
>
> I provided all information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>
>
> Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin
> Verhoeven <Ke...@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have an update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> > for it to be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada <Ku...@ds-iq.com>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse <al...@gmail.com>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the
> > one-column count, the row count was correct.  Here are the additional
> > testing results.
> >
> >
> >
> > My company has invested in using Drill, and it’s very important for us
> > that this is fixed.  Let me know if I can do anything to help get this
> > issue fixed.  I really appreciate that you are looking into the issue!
> >
> > Hbase table (1 column family, 5 columns, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (1 column family, 6 columns,  10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (**returned 6724 rows)*
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (2 column family, 6 columns in each columns family,
> > 10000000
> > rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 3362 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbase table (2 column family, 2 columns in each columns family,
> > 10000000
> > rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbasetable (2 column family, 4 columns in one column family and 2
> > columns in other column family, 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 6723 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbasetable (2 column family, 1 column in one column family and 3
> > columns in other column family, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same, just 10000 rows
> > selected.  With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse [mailto:altekrusejason@gmail.com <
> > altekrusejason@gmail.com>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada <Ku...@ds-iq.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <us...@drill.apache.org>
> >
> > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > > I opened a bug for this.  Drill returns the correct rows when the
> > > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family with 1 column, but an incorrect row count is returned when the
> > > HBase table contains 1 column family with 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <us...@drill.apache.org>
> > > Cc: Aditya Kishore <ad...@gmail.com>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <altekrusejason@gmail.com
> > > >
> > > wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > > abhishek.girish@gmail.com
> > > > > wrote:
> > > >
> > > > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > > > to fetch all records, whereas I was only able to see a small
> > > > > > subset of records when queried from Drill. Each time I inserted
> > > > > > 1000 records, only about 50 of those would show up.
> > > > > >
> > > > > > Although I could repro the problem consistently, it was
> > > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I'm not sure why this is happening, we have tests in our
> > > > > > automated
> > > > suite
> > > > > > that I believe run some pretty large queries against Hbase and
> > > > > > verify
> > > > the
> > > > > > results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Ku...@ds-iq.com>
> > > > > wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return
> > > > all
> > > > > > > results. The following query should return over 100,000
> > > > > > > rows, but we
> > > > > only
> > > > > > > get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number =
> > > > > 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows,
> > > > but
> > > > > > > the same Drill query does not return a fraction of the
> > > > > > > expected
> > > > > results.
> > > > > > We
> > > > > > > have also run a count against the table and Drill returns
> > > > > > > the same
> > > > > 1,030
> > > > > > > number, which is far less than expect. What could be going
> wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase
> > > > 1.0).
> > > > > > We
> > > > > > > run HBase on six RegionServers, the table has about 1.3
> > > > > > > billion
> > > rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
>
>

RE: Drill query does not return all results from HBase

Posted by Kevin Verhoeven <Ke...@ds-iq.com>.
Aditya,

Thank you for your help. What version of CDH are you running? I contacted Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.

Thanks,

Kevin
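
For reference, the upstream fix versions mentioned in this thread (HBase 1.0.1 and 1.1.0, plus 0.98.12 for the client-side fix on the 0.98 line) can be turned into a quick check. This is a minimal sketch with made-up helper names, not anything from HBase or Drill. It only compares upstream Apache version numbers, so it cannot tell whether a vendor build has the patch back-ported (CDH 5.4.7, for example, is reported above to carry the backport).

```python
def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1); drops any '-suffix' such as '-hadoop2'."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


# First upstream release on each older line that this thread says has the fix.
FIRST_FIXED = {
    (0, 98): (0, 98, 12),
    (1, 0): (1, 0, 1),
}


def has_hbase_13262_fix(version_text):
    """Best-effort guess for upstream Apache HBase versions only."""
    version = parse_version(version_text)
    if version >= (1, 1, 0):
        return True
    first_fixed = FIRST_FIXED.get(version[:2])
    # Release lines not mentioned in the thread are treated as unfixed.
    return first_fixed is not None and version >= first_fixed


for candidate in ("1.0.0", "1.0.1", "0.98.13-hadoop2"):
    print(candidate, has_hbase_13262_fix(candidate))
```

Release lines that the thread does not mention are treated as not fixed, which is a conservative assumption and not a statement about those releases.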

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Sunday, March 20, 2016 10:45 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Finally managed to reproduce it with CDH distribution (So far I was testing with HBase 1.1 distributed with MapR, which does not have this bug).
This is essentially an HBase bug, HBASE-13262[1], which has been fixed in 1.0.1, 1.1.0.
Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue in the case where the table is split into multiple regions and the first region returned to the client did not have any rows.  I think that issue is related to the one I’m seeing.  Have you opened a JIRA for it?  Have you investigated or fixed it?

Thanks
Kumiko

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>

Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with the Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:
Hello,

I provided all information that was requested; however, I haven't heard anything back since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>]
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>>
Cc: altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <Ke...@ds-iq.com>>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>> wrote:

> Hello,
>
> Does anyone have an update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> for it to be investigated/fixed?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading
> the HBase metadata during our optimization to return row counts
> without reading any data. This should be really fast, and noticeably
> different than running a complete query, even with a small dataset as
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is
> occurring, I will hopefully have time soon to get this fixed but I'm
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada <Ku...@ds-iq.com>>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse <al...@gmail.com>>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the one
> column count, the row count was correct.  Here are the additional test results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this issue!
>
> Hbase table (1 column family, 5 columns, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *Hbase table (1 column family, 6 columns,  10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6724 rows)*
>
> 1 column count - row count is correct
>
> *Hbase table (2 column families, 6 columns in each column family,
> 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 3362 rows)*
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 2 columns in each column family,
> 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> *Hbase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)*
>
> *COUNT(*) - row count is incorrect (returned 6723 rows)*
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit to try
> to force a full scan, but the result was the same, just 10000 rows
> selected.  With the same table (contains 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com> <
> altekrusejason@gmail.com<ma...@gmail.com>>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing
> something correctly.
> Can you try to run a 'sum', or other aggregate query on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help
> reproduce the problem and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>>
>
> Cc: Aditya Kishore <ad...@gmail.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>>
> wrote:
>
> > I opened a bug for this.  Drill returns the correct rows when the
> > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>
> > Cc: Aditya Kishore <ad...@gmail.com>>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family and 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family and 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish [mailto:abhishek.girish@gmail.com<ma...@gmail.com>]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>>
> > Cc: Aditya Kishore <ad...@gmail.com>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change if I remember it right, hence I
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > <al...@gmail.com> wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> > > Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish
> > > <abhishek.girish@gmail.com> wrote:
> > >
> > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > to fetch all records, whereas I was only able to see a small
> > > > subset of records when queried from Drill. Each time I inserted
> > > > 1000 records, only about 50 of those would show up.
> > > >
> > > > Although I could repro the problem consistently, it was
> > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > an HBase bug which got resolved. Strange as it seems, it
> > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse
> > > > <altekrusejason@gmail.com> wrote:
> > > >
> > > > > I'm not sure why this is happening; we have tests in our
> > > > > automated suite that I believe run some pretty large queries
> > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > <Ku...@ds-iq.com> wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query does not return a
> > > > > > > fraction of the expected results. We have also run a count
> > > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > > which is far less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
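
The check discussed in the quoted thread above (compare COUNT(*) with a count over a single column) can be sketched roughly as follows. This is only an illustration, not code from Drill: the helper names, the column family "cf", and the column name passed in the usage note are assumptions. It relies on one fact that holds for any correct result: COUNT(*) can never be smaller than COUNT(column) on the same table.

```python
def diagnostic_queries(table, column_family, column, plugin="hbase"):
    """Build the two count queries compared in this thread.

    All arguments are placeholders; substitute your own table,
    column family, column, and storage plugin names.
    """
    source = "`%s`.`%s`" % (plugin, table)
    return {
        # Row count as Drill reports it for the whole table.
        "count_star": "SELECT COUNT(*) FROM %s" % source,
        # Count over one column, addressed as alias.family.column.
        "count_column": "SELECT COUNT(t.`%s`.`%s`) FROM %s t"
        % (column_family, column, source),
    }


def count_star_is_suspect(count_star, count_column):
    """Return True when COUNT(*) is smaller than COUNT(column).

    COUNT(column) skips NULLs, so it can only be less than or equal to
    COUNT(*). A smaller COUNT(*) means rows went missing somewhere.
    """
    return count_star < count_column
```

For example, diagnostic_queries("customer_staged", "cf", "customer_number") yields the two statements to run, and count_star_is_suspect(6724, 10000000) returns True for the 6-column result reported above.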



RE: Drill query does not return all results from HBase

Posted by Kevin Verhoeven <Ke...@ds-iq.com>.
Aditya,

Thank you for your help. What version of CDH are you running? I contacted Cloudera and they stated that the fix for HBASE-13262 was backported into CDH 5.4.7.

Thanks,

Kevin

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Sunday, March 20, 2016 10:45 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Finally managed to reproduce it with CDH distribution (So far I was testing with HBase 1.1 distributed with MapR, which does not have this bug).
This is essentially an HBase bug, HBASE-13262[1], which has been fixed in 1.0.1, 1.1.0.
Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262




Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
Finally managed to reproduce it with CDH distribution (So far I was testing
with HBase 1.1 distributed with MapR, which does not have this bug).

This is essentially an HBase bug, HBASE-13262[1], which has been fixed in
1.0.1 and 1.1.0.

Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262
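
As a rough aid for checking a cluster against the versions named above, here is a minimal sketch. The helper names are hypothetical, and the thresholds are taken only from this thread: 1.0.1 and 1.1.0 from the message above, and 0.98.12 from the client-side note earlier in the thread. It looks only at upstream Apache version numbers, so it cannot see vendor backports such as the CDH one mentioned in this thread.

```python
import re

# First fixed patch release per release line, as reported in this thread.
FIXED_IN = {(0, 98): (0, 98, 12), (1, 0): (1, 0, 1)}


def parse_version(version):
    """Return the leading major.minor.patch of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version: %r" % version)
    return tuple(int(part) for part in match.groups())


def has_hbase_13262_fix(version):
    """Hypothetical helper: True if an upstream version carries the fix."""
    parsed = parse_version(version)
    line = parsed[:2]
    if line in FIXED_IN:
        return parsed >= FIXED_IN[line]
    # Release lines from 1.1 onward are treated as fixed; any other
    # older line is conservatively treated as unfixed.
    return parsed >= (1, 1, 0)
```

A suffix such as "-hadoop2" or "-cdh5.4.3" is ignored by the parser, so has_hbase_13262_fix("1.0.0-cdh5.4.3") reports False even if the vendor build has the patch applied. Treat a False result on a vendor distribution as a prompt to ask the vendor, not as proof.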

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Aditya,
>
>
>
> When we were exchanging emails, you mentioned that you had
> discovered another issue in the case where the table is split into multiple
> regions and the first region returned to the client did not have any rows.
> I think that issue is related to the one I’m seeing.  Have you
> opened a JIRA for it?  Have you investigated or fixed it?
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Aditya [mailto:adityakishore@gmail.com]
> *Sent:* Thursday, March 17, 2016 3:02 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>
> *Cc:* user@drill.apache.org; dev@drill.apache.org;
> altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with an Apache 1.x release but have
> failed so far.
>
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
>
> It is probably unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
>
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> Hello,
>
> I provided all the information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>
>
> Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin
> Verhoeven <Ke...@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have an update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> > for it to be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada <Ku...@ds-iq.com>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse <al...@gmail.com>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the one
> > column count, the row count was correct.  Here are the additional test
> > results.
> >
> >
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed.  Let me know if I can do anything to help get
> > this issue fixed.  I really appreciate that you are looking into this
> > issue!
> >
> > Hbase table (1 column family, 5 columns, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (1 column family, 6 columns,  10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 6724 rows)*
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (2 column families, 6 columns in each column family,
> > 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 3362 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 2 columns in each column family,
> > 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> > *Hbase table (2 column families, 4 columns in one column family and 2
> > columns in the other column family, 10000000 rows)*
> >
> > *COUNT(*) - row count is incorrect (returned 6723 rows)*
> >
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 1 column in one column family and 3
> > columns in the other column family, 10000000 rows)
> >
> > COUNT(*) - row count is correct
> >
> > 1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit, to try
> > to force a full scan, but the result was the same: just 10000 rows
> > selected.  With the same table (which contains 6 columns), I ran the
> > query to display the row_key, and it displayed all 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse [mailto:altekrusejason@gmail.com <
> > altekrusejason@gmail.com>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada <Ku...@ds-iq.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <us...@drill.apache.org>
> >
> > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > > I opened a bug for this.  Drill returns the correct rows when the
> > > HBase table contains 5 or fewer columns, but not with 6 or more.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family with 1 column, but an incorrect row count is returned when
> > > the HBase table contains 1 column family with 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <us...@drill.apache.org>
> > > Cc: Aditya Kishore <ad...@gmail.com>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <altekrusejason@gmail.com
> > > >
> > > wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > > abhishek.girish@gmail.com
> > > > > wrote:
> > > >
> > > > > I hit a very similar issue recently. Via the HBase shell, I was
> > > > > able to fetch all records, whereas I was only able to see a small
> > > > > subset of records when I queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Ku...@ds-iq.com>
> > > > > wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a
> > > > > > > fraction of the expected results. We have also run a count
> > > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > > which is far less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>

Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
Finally managed to reproduce it with the CDH distribution (so far I had been
testing with HBase 1.1 as distributed with MapR, which does not have this bug).

This is essentially an HBase bug, HBASE-13262 [1], which has been fixed in
1.0.1 and 1.1.0.

Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262
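
As a rough illustration of the version guidance above, the following Python
sketch checks whether an upstream HBase version string falls at or after the
releases named in this thread as carrying the HBASE-13262 fix (1.0.1 and 1.1.0
from this message, 0.98.12 for the client-side fix mentioned earlier in the
thread). The function name and the parsing rules are hypothetical, and the
check looks only at upstream version numbers: it assumes a vendor build such
as 1.0.0-cdh5.4.3 carries no backport, which should be verified against the
vendor's release notes.

```python
import re

# Minimum patch release carrying the HBASE-13262 fix, per release line.
# 1.0.1 comes from the message above; 0.98.12 is the client-side fix
# version quoted earlier in this thread.
FIXED_IN = {(0, 98): 12, (1, 0): 1}


def has_hbase_13262_fix(version: str) -> bool:
    """Return True if an upstream HBase version should include the fix.

    Hypothetical helper: only the leading "major.minor.patch" is parsed,
    so suffixes such as "-hadoop2" or "-cdh5.4.3" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized HBase version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    # Assumption: every line from 1.1.0 onward has the fix, and any line
    # not listed above that is older than 1.1 does not.
    return (major, minor) >= (1, 1)
```

With these rules, has_hbase_13262_fix("1.0.0") evaluates to False and
has_hbase_13262_fix("1.1.1") evaluates to True, which matches the CDH 5.4
(HBase 1.0.0) and MapR (HBase 1.1.1) observations reported in this thread.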

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Aditya,
>
>
>
> When we were exchanging emails, you mentioned that you had
> discovered another issue in the case where the table is split into multiple
> regions and the first region returned to the client does not have any rows.
> I think that issue is related to the one I’m seeing.  Have you
> opened a JIRA for it?  Have you investigated or fixed it?
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Aditya [mailto:adityakishore@gmail.com]
> *Sent:* Thursday, March 17, 2016 3:02 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>
> *Cc:* user@drill.apache.org; dev@drill.apache.org;
> altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with an Apache 1.x release but have
> failed so far.
>
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
>
> It is probably unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
>
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> Hello,
>
> I provided all the information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>
>
> Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin
> Verhoeven <Ke...@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> > to investigate or fix it?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >

RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue in the case where the table is split into multiple regions and the first region returned to the client does not have any rows.  I think that issue is related to the one I’m seeing.  Have you opened a JIRA for it?  Have you investigated or fixed it?

Thanks
Kumiko

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with an Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>> wrote:
Hello,

I provided all the information that was requested; however, I haven't heard anything back since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>]
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>>
Cc: altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <Ke...@ds-iq.com>>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>> wrote:

> Hello,
>
> Does anyone have any update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> to investigate or fix it?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading
> the HBase metadata during our optimization to return row counts
> without reading any data. This should be really fast, and noticeably
> different than running a complete query, even with a small dataset as
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is
> occurring, I will hopefully have time soon to get this fixed but I'm
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada <Ku...@ds-iq.com>>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse <al...@gmail.com>>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the
> one-column count, the row count was correct.  Here are the additional
> test results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help
> get this issue fixed.  I really appreciate that you are looking into
> the issue!
>
> HBase table (1 column family, 5 columns, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> HBase table (1 column family, 6 columns, 10000000 rows)
>
> COUNT(*) - row count is incorrect (returned 6724 rows)
>
> 1 column count - row count is correct
>
> HBase table (2 column families, 6 columns in each column family,
> 10000000 rows)
>
> COUNT(*) - row count is incorrect (returned 3362 rows)
>
> 1 column count - row count is correct
>
> HBase table (2 column families, 2 columns in each column family,
> 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> HBase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is incorrect (returned 6723 rows)
>
> 1 column count - row count is correct
>
> HBase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit, to try
> to force a full scan, but the result was the same: just 10000 rows
> selected.  With the same table (which contains 6 columns), I ran the
> query to display the row_key, and it displayed all 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com> <
> altekrusejason@gmail.com<ma...@gmail.com>>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing something correctly.
> Can you try to run a 'sum', or other aggregate query on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>>
>
> Cc: Aditya Kishore <ad...@gmail.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>>
> wrote:
>
> > I opened a bug for this.  Drill returns the correct rows when the
> > HBase table contains 5 or fewer columns, but not with 6 or more.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>
> > Cc: Aditya Kishore <ad...@gmail.com>>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family with 1 column, but an incorrect row count is returned when
> > the HBase table contains 1 column family with 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish [mailto:abhishek.girish@gmail.com<ma...@gmail.com>]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>>
> > Cc: Aditya Kishore <ad...@gmail.com>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change if I remember it right, hence I
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > <al...@gmail.com>
> > >
> > wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> > > Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > abhishek.girish@gmail.com<ma...@gmail.com>
> > > > wrote:
> > >
> > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > to fetch all records, whereas I was only able to see a small
> > > > subset of records when queried from Drill. Each time I inserted
> > > > 1000 records, only about 50 of those would show up.
> > > >
> > > > Although I could repro the problem consistently, it was
> > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > an HBase bug which got resolved. Strange as it seems, it
> > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > altekrusejason@gmail.com<ma...@gmail.com>
> > > > >
> > > > wrote:
> > > >
> > > > > I'm not sure why this is happening; we have tests in our
> > > > > automated suite that I believe run some pretty large queries
> > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > <Ku...@ds-iq.com>>
> > > > wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > We have a problem where a Drill query against HBase does not
> > > > > > return all results. The following query should return over
> > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > >
> > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > customer_number = 800
> > > > > >
> > > > > > If we scan directly using the hbase shell we see over
> > > > > > 100,000 rows, but the same Drill query returns only a fraction
> > > > > > of the expected results. We have also run a count against the
> > > > > > table and Drill returns the same 1,030 number, which is far
> > > > > > less than expected. What could be going wrong?
> > > > > >
> > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue, in the case where the table is split into multiple regions and the first region returned to the client does not have any rows.  I think that issue is related to the one I’m seeing.  Have you opened a JIRA for it?  Have you investigated or fixed it?

Thanks
Kumiko
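
The multi-region case described above can be pictured with a small sketch. This is a purely hypothetical model, assuming each region is just a list of row keys; it is not Drill's or HBase's actual scanner code, and the function names are made up for illustration. It only shows how a reader loop that treats an empty first region as "end of table" would silently drop rows.

```python
from typing import List

# Hypothetical model: a table is a list of regions, each a list of row keys.
Regions = List[List[str]]


def scan_stop_on_empty(regions: Regions) -> List[str]:
    """Buggy loop: treats the first empty region as the end of the table."""
    rows: List[str] = []
    for region in regions:
        if not region:
            break  # wrong: an empty region does not mean the scan is finished
        rows.extend(region)
    return rows


def scan_all_regions(regions: Regions) -> List[str]:
    """Correct loop: visits every region, whether or not it holds rows."""
    rows: List[str] = []
    for region in regions:
        rows.extend(region)
    return rows


# The first region is empty; the remaining regions hold three rows in total.
regions = [[], ["row1", "row2"], ["row3"]]
print(len(scan_stop_on_empty(regions)))  # 0
print(len(scan_all_regions(regions)))    # 3
```

Whether the real issue has this shape is exactly the open question in this thread.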

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Ku...@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with the Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com>> wrote:
Hello,

I provided all the information that was requested; however, I haven't heard anything back since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com<ma...@gmail.com>]
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>>
Cc: altekrusejason@gmail.com<ma...@gmail.com>; Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <Ke...@ds-iq.com>>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...
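
For collecting the two profiles requested above, a small helper can generate the pair of queries to run. This is only a sketch: it assumes the "1 column count" test reported in this thread was a COUNT over a single column, and the table, column family and qualifier names are hypothetical placeholders.

```python
def count_queries(table: str, family: str, qualifier: str) -> dict:
    """Build the two COUNT queries whose results are compared in this thread."""
    source = f"`hbase`.`{table}` t"
    return {
        # Row count reported as wrong on tables with 6 or more columns.
        "count_star": f"SELECT COUNT(*) FROM {source}",
        # Count over a single column, reported as returning the right number.
        "count_one_column": f"SELECT COUNT(t.`{family}`.`{qualifier}`) FROM {source}",
    }


# Hypothetical table, column family and qualifier names.
queries = count_queries("test_table", "cf1", "c1")
for name, sql in queries.items():
    print(f"{name}: {sql}")
```

Note that COUNT over a column only counts rows where that column is non-null, so the two numbers are comparable only if every row has the chosen column.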

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>> wrote:

> Hello,
>
> Does anyone have an update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> to investigate or fix it?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org<ma...@drill.apache.org>; altekrusejason@gmail.com<ma...@gmail.com>
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading
> the HBase metadata during our optimization to return row counts
> without reading any data. This should be really fast, and noticeably
> different than running a complete query, even with a small dataset as
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is
> occurring, I will hopefully have time soon to get this fixed but I'm
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada <Ku...@ds-iq.com>>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse <al...@gmail.com>>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the one
> column count, the row count was correct.  Here are the additional test results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this issue!
>
> HBase table (1 column family, 5 columns, 10,000,000 rows)
>   COUNT(*) - row count is correct
>   1 column count - row count is correct
>
> HBase table (1 column family, 6 columns, 10,000,000 rows)
>   COUNT(*) - row count is incorrect (returned 6724 rows)
>   1 column count - row count is correct
>
> HBase table (2 column families, 6 columns in each column family,
> 10,000,000 rows)
>   COUNT(*) - row count is incorrect (returned 3362 rows)
>   1 column count - row count is correct
>
> HBase table (2 column families, 2 columns in each column family,
> 10,000,000 rows)
>   COUNT(*) - row count is correct
>   1 column count - row count is correct
>
> HBase table (2 column families, 4 columns in one column family and 2
> columns in the other, 10,000,000 rows)
>   COUNT(*) - row count is incorrect (returned 6723 rows)
>   1 column count - row count is correct
>
> HBase table (2 column families, 1 column in one column family and 3
> columns in the other, 10,000,000 rows)
>   COUNT(*) - row count is correct
>   1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit to try to
> force a full scan, but the result was the same, just 10000 rows
> selected.  With the same table (containing 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com> <
> altekrusejason@gmail.com<ma...@gmail.com>>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>>
> *Cc:* Ki Kang <Ki...@ds-iq.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing
> something correctly. Can you try to run a 'sum', or other aggregate
> query, on one of the columns to see if a full scan of the data is
> operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com<ma...@gmail.com>]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>>
>
> Cc: Aditya Kishore <ad...@gmail.com>>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>>
> wrote:
>
> > I opened a bug for this.  Drill returns the correct rows when the
> > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com<ma...@ds-iq.com>]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org<ma...@drill.apache.org>
> > Cc: Aditya Kishore <ad...@gmail.com>>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct row
> > count is returned when the HBase table contains only 1 column
> > family with 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family with 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish [mailto:abhishek.girish@gmail.com<ma...@gmail.com>]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>>
> > Cc: Aditya Kishore <ad...@gmail.com>>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change if I remember right, hence I
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > <al...@gmail.com>
> > >
> > wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> > > Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > abhishek.girish@gmail.com<ma...@gmail.com>
> > > > wrote:
> > >
> > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > to fetch all records, whereas I was only able to see a small
> > > > subset of records when queried from Drill. Each time I inserted
> > > > 1000 records, only about 50 of those would show up.
> > > >
> > > > Although I could repro the problem consistently, it was
> > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > an HBase bug which got resolved. Strange as it seems, it
> > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > altekrusejason@gmail.com<ma...@gmail.com>
> > > > >
> > > > wrote:
> > > >
> > > > > I'm not sure why this is happening; we have tests in our
> > > > > automated suite that I believe run some pretty large queries
> > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > <Ku...@ds-iq.com>>
> > > > wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com<ma...@ds-iq.com>]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org<ma...@drill.apache.org>
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > We have a problem where a Drill query against HBase does not
> > > > > > return all results. The following query should return over
> > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > >
> > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > customer_number = 800
> > > > > >
> > > > > > If we scan directly using the hbase shell we see over
> > > > > > 100,000 rows, but the same Drill query returns only a fraction
> > > > > > of the expected results. We have also run a count against the
> > > > > > table and Drill returns the same 1,030 number, which is far
> > > > > > less than expected. What could be going wrong?
> > > > > >
> > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
Hi Kumiko,

I have tried to reproduce this locally with the Apache 1.x release but have
failed so far.

From my mail exchange with Kevin on another thread, it appears that the
HBase scanner stops returning rows after a while, which seems odd.

It is probably unique to the CDH distribution. I am planning to set up a
single-node CDH cluster to see if I can reproduce it there.
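
For anyone else trying to reproduce this, the COUNT(*) results Kumiko reported on January 14 (quoted further down) can be tabulated as follows. This is only a sketch: the numbers are taken from that report, but summing the columns across column families and using 6 as the cut-off are my reading of the data, not a confirmed diagnosis.

```python
# (table layout, total columns across all column families, COUNT(*) correct?)
# Values are taken from the test report quoted in this thread.
results = [
    ("1 CF, 5 columns", 5, True),
    ("1 CF, 6 columns", 6, False),
    ("2 CFs, 6 + 6 columns", 12, False),
    ("2 CFs, 2 + 2 columns", 4, True),
    ("2 CFs, 4 + 2 columns", 6, False),
    ("2 CFs, 1 + 3 columns", 4, True),
]


def count_star_expected_ok(total_columns: int, threshold: int = 6) -> bool:
    """Assumed pattern: COUNT(*) is right only below `threshold` total columns."""
    return total_columns < threshold


# Layouts whose reported outcome does not fit the assumed pattern.
mismatches = [
    label for label, total, ok in results
    if count_star_expected_ok(total) != ok
]
print(mismatches)  # []
```

Every reported case fits the pattern, which suggests a reproduction attempt should use a table with at least 6 columns in total.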

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Hello,
>
> I provided all the information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>
> Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin
> Verhoeven <Ke...@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have an update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> > to investigate or fix it?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada <Ku...@ds-iq.com>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse <al...@gmail.com>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the one
> > column count, the row count was correct.  Here are the additional test
> > results.
> >
> >
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed.  Let me know if I can do anything to help get
> > this issue fixed.  I really appreciate that you are looking into this
> > issue!
> >
> > HBase table (1 column family, 5 columns, 10,000,000 rows)
> >   COUNT(*) - row count is correct
> >   1 column count - row count is correct
> >
> > HBase table (1 column family, 6 columns, 10,000,000 rows)
> >   COUNT(*) - row count is incorrect (returned 6724 rows)
> >   1 column count - row count is correct
> >
> > HBase table (2 column families, 6 columns in each column family,
> > 10,000,000 rows)
> >   COUNT(*) - row count is incorrect (returned 3362 rows)
> >   1 column count - row count is correct
> >
> > HBase table (2 column families, 2 columns in each column family,
> > 10,000,000 rows)
> >   COUNT(*) - row count is correct
> >   1 column count - row count is correct
> >
> > HBase table (2 column families, 4 columns in one column family and 2
> > columns in the other, 10,000,000 rows)
> >   COUNT(*) - row count is incorrect (returned 6723 rows)
> >   1 column count - row count is correct
> >
> > HBase table (2 column families, 1 column in one column family and 3
> > columns in the other, 10,000,000 rows)
> >   COUNT(*) - row count is correct
> >   1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same, just 10000 rows
> > selected.  With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse [mailto:altekrusejason@gmail.com <
> > altekrusejason@gmail.com>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada <Ku...@ds-iq.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly. Can you try to run a 'sum', or other aggregate
> > query, on one of the columns to see if a full scan of the data is
> > operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <us...@drill.apache.org>
> >
> > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > > I opened a bug for this.  Drill returns the correct rows when the
> > > HBase table contains 5 or fewer columns, but not with 6 or more columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct row
> > > count is returned when the HBase table contains only 1 column
> > > family with 1 column, but an incorrect row count is returned when the
> > > HBase table contains 1 column family with 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <us...@drill.apache.org>
> > > Cc: Aditya Kishore <ad...@gmail.com>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <altekrusejason@gmail.com
> > > >
> > > wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > > abhishek.girish@gmail.com
> > > > > wrote:
> > > >
> > > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Ku...@ds-iq.com>
> > > > > wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a fraction
> > > > > > > of the expected results. We have also run a count against the
> > > > > > > table and Drill returns the same 1,030 number, which is far
> > > > > > > less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
Hi Kumiko,

I have tried to reproduce this locally with Apache 1.x release but have
failed so far.

>From my mail exchange with Kevin on another thread, it appears that the
HBase scanner stops returning rows after a while which seem odd.

Probably it is unique to CDH distribution. I am planning to setup a single
node CDH cluster to see if it I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Hello,
>
> I provided all information that was requested; however, I haven't heard
> back anything since February 24.
>
> Is anyone taking look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <us...@drill.apache.org>
> Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin
> Verhoeven <Ke...@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologies for not chiming in until now, considering that if there is a
> bug here it is most probably put in by me :)
>
> I've assigned the JIRA to myself and going to take a l look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles form both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Are there any plan
> > that this would be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <us...@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada <Ku...@ds-iq.com>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse <al...@gmail.com>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the one
> > column count, the row count was correct.  Here are the additional
> > testing results.
> >
> >
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed.  Let me know if I can do anything to help get
> > this issue fixed.  I really appreciate that you are looking into this
> > issue!
> >
> > HBase table (1 column family, 5 columns, 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > HBase table (1 column family, 6 columns, 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 6724 rows)
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 6 columns in each column family,
> > 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 3362 rows)
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 2 columns in each column family,
> > 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 4 columns in one column family and 2
> > columns in the other column family, 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 6723 rows)
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 1 column in one column family and 3
> > columns in the other column family, 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same, just 10000 rows
> > selected.  With the same table (contains 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse' <al...@gmail.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse [mailto:altekrusejason@gmail.com <
> > altekrusejason@gmail.com>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada <Ku...@ds-iq.com>
> > *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <us...@drill.apache.org>
> >
> > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> >
> > > I opened a bug for this.  Drill is returning the correct rows when
> > > the HBase table contains 5 or fewer columns, but not with 6 or more
> > > columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family and 1 column, but an incorrect row count is returned when the
> > > HBase table contains 1 column family and 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <us...@drill.apache.org>
> > > Cc: Aditya Kishore <ad...@gmail.com>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <altekrusejason@gmail.com> wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish
> > > > <abhishek.girish@gmail.com> wrote:
> > > >
> > > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse
> > > > > <altekrusejason@gmail.com> wrote:
> > > > >
> > > > > > I'm not sure why this is happening, we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Ku...@ds-iq.com> wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a
> > > > > > > fraction of the expected results. We have also run a count
> > > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > > which is far less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table
> > > > > > > has about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Hello,

I provided all the information that was requested; however, I haven't heard anything back since February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com] 
Sent: Friday, February 19, 2016 12:48 PM
To: user <us...@drill.apache.org>
Cc: altekrusejason@gmail.com; Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Hello,
>
> Does anyone have any update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> for this to be investigated/fixed?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org; altekrusejason@gmail.com
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <us...@drill.apache.org>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading 
> the HBase metadata during our optimization to return row counts 
> without reading any data. This should be really fast, and noticeably 
> different than running a complete query, even with a small dataset as 
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is 
> occurring, I will hopefully have time soon to get this fixed but I'm 
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada <Ku...@ds-iq.com>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse <al...@gmail.com>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the one
> column count, the row count was correct.  Here are the additional
> testing results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this
> issue!
>
> HBase table (1 column family, 5 columns, 10000000 rows)
> COUNT(*) - row count is correct
> 1 column count - row count is correct
>
> HBase table (1 column family, 6 columns, 10000000 rows)
> COUNT(*) - row count is incorrect (returned 6724 rows)
> 1 column count - row count is correct
>
> HBase table (2 column families, 6 columns in each column family,
> 10000000 rows)
> COUNT(*) - row count is incorrect (returned 3362 rows)
> 1 column count - row count is correct
>
> HBase table (2 column families, 2 columns in each column family,
> 10000000 rows)
> COUNT(*) - row count is correct
> 1 column count - row count is correct
>
> HBase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)
> COUNT(*) - row count is incorrect (returned 6723 rows)
> 1 column count - row count is correct
>
> HBase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
> COUNT(*) - row count is correct
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>
> *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven < 
> Kevin.Verhoeven@ds-iq.com>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit to try to
> force a full scan, but the result was the same, just 10000 rows
> selected.  With the same table (contains 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse' <al...@gmail.com>
> *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven < 
> Kevin.Verhoeven@ds-iq.com>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse [mailto:altekrusejason@gmail.com < 
> altekrusejason@gmail.com>]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada <Ku...@ds-iq.com>
> *Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven < 
> Kevin.Verhoeven@ds-iq.com>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing
> something correctly.
> Can you try to run a 'sum', or other aggregate query on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on 
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <us...@drill.apache.org>
>
> Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven < 
> Kevin.Verhoeven@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
>
> > I opened a bug for this.  Drill is returning the correct rows when
> > the HBase table contains 5 or fewer columns, but not with 6 or more
> > columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org
> > Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven < 
> > Kevin.Verhoeven@ds-iq.com>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family and 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family and 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <us...@drill.apache.org>
> > Cc: Aditya Kishore <ad...@gmail.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change if I remember it right, hence I
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might 
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > <altekrusejason@gmail.com> wrote:
> >
> > > Abhishek,
> > >
> > > What version of HBase did you have the problem with, and what
> > > version did you upgrade to that solved the problem? I assume this
> > > would be useful information to compare your setup with Kevin's and
> > > Kumiko's.
> > >
> > > - Jason
> > >
> > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish
> > > <abhishek.girish@gmail.com> wrote:
> > >
> > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > to fetch all records, whereas I was only able to see a small
> > > > subset of records when queried from Drill. Each time I inserted
> > > > 1000 records, only about 50 of those would show up.
> > > >
> > > > Although I could repro the problem consistently, it was
> > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > an HBase bug which got resolved. Strange as it seems, it
> > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse
> > > > <altekrusejason@gmail.com> wrote:
> > > >
> > > > > I'm not sure why this is happening, we have tests in our
> > > > > automated suite that I believe run some pretty large queries
> > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce 
> > > > > this and diagnose the problem?
> > > > >
> > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > <Ku...@ds-iq.com> wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > To: user@drill.apache.org
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > We have a problem where a Drill query against HBase does not
> > > > > > return all results. The following query should return over
> > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > >
> > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > customer_number = 800
> > > > > >
> > > > > > If we scan directly using the hbase shell we see over
> > > > > > 100,000 rows, but the same Drill query returns only a
> > > > > > fraction of the expected results. We have also run a count
> > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > which is far less than expected. What could be going wrong?
> > > > > >
> > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > (HBase 1.0). We run HBase on six RegionServers; the table
> > > > > > has about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
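
The test results quoted in this message suggest a simple pattern: COUNT(*) was reported correct only for tables with 5 or fewer columns in total, across all column families. The following is a minimal Python sketch that tabulates the reported outcomes and checks them against that hypothesis. The function names and the threshold parameter are illustrative assumptions, not anything taken from Drill or HBase, and the sketch says nothing about the underlying cause.

```python
# COUNT(*) outcomes reported in the thread for 10,000,000-row HBase tables.
# Each entry: (columns per column family, whether COUNT(*) was correct).
REPORTED = [
    ((5,), True),
    ((6,), False),    # COUNT(*) returned 6724
    ((6, 6), False),  # COUNT(*) returned 3362
    ((2, 2), True),
    ((4, 2), False),  # COUNT(*) returned 6723
    ((1, 3), True),
]


def predicted_correct(columns_per_family, threshold=5):
    """Hypothesis from the thread: COUNT(*) is right when the table has
    `threshold` or fewer columns in total, across all column families."""
    return sum(columns_per_family) <= threshold


def mismatches(reported=REPORTED):
    """Return the layouts whose reported outcome contradicts the hypothesis."""
    return [layout for layout, ok in reported if predicted_correct(layout) != ok]


if __name__ == "__main__":
    for layout, ok in REPORTED:
        status = "correct" if ok else "incorrect"
        print(f"families={layout} total_columns={sum(layout)} COUNT(*) {status}")
    print("layouts contradicting the hypothesis:", mismatches())
```

None of the six reported layouts contradicts the total-column-count hypothesis, which is consistent with the summary given when DRILL-4271 was filed.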

Re: Drill query does not return all results from HBase

Posted by Aditya <ad...@gmail.com>.
Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a
bug here it was most probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me
privately the Drill query profiles from both the correct and the incorrect
executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> Hello,
>
> Does anyone have any update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Are there any plans
> to investigate or fix it?
>
> Thanks
> Kumiko

RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Hello,

Does anyone have any update on this issue, https://issues.apache.org/jira/browse/DRILL-4271?  Are there any plans to investigate or fix it?

Thanks
Kumiko


RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
The query time was very short on the one with the incorrect result.

Thanks
Kumiko


Fwd: Drill query does not return all results from HBase

Posted by Jason Altekruse <al...@gmail.com>.
Thanks for the update, I'm forwarding your message back to the list.

Just to confirm, was the query time longer on the one with the
incorrect result? In the incorrect case I think we are just misreading the
HBase metadata during our optimization to return row counts without reading
any data. That optimized path should be really fast, and noticeably different
from running a complete query, where even with a small dataset we have to
read in your table and run an aggregation over it.
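
One way to make that comparison is to run the plain row count next to a count that should have to read the data, then compare both the results and the elapsed times. The queries below are only a sketch: the table name is borrowed from the original report, while the alias `t`, the column family `cf1` and the column `col1` are made-up placeholders.

```sql
-- Hypothetical check; cf1 and col1 are placeholder names, not from the real table.

-- 1) Plain row count. If Drill answers this from metadata, as suspected above,
--    it should come back very quickly.
SELECT COUNT(*) AS star_count
FROM hbase.`customer_staged`;

-- 2) Count over a single column, which should need to read that column.
--    COUNT(column) skips rows where the column is missing, so the two numbers
--    are only comparable if every row has a value for cf1.col1.
SELECT COUNT(t.cf1.col1) AS col_count
FROM hbase.`customer_staged` t;
```

If the first query returns much faster and with a much smaller number than the second, that would point at the row-count shortcut and not at the scan itself.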

This would just be a final confirmation of where the issue is occurring, I
will hopefully have time soon to get this fixed but I'm wrapping up some
other things right now.
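
For anyone who wants to repeat this comparison, here is a minimal Python sketch
that builds the two statements being compared: a COUNT(*) that may be answered
from metadata, and a COUNT over a single column that should force a scan. The
table name comes from this thread; the column family `cf` and column `c1` are
hypothetical placeholders, and the REST endpoint (`/query.json` on port 8047)
is an assumption about the Drill setup rather than something confirmed here.
Note that a single-column COUNT only counts rows where that column is present.

```python
import json
import urllib.request


def build_count_queries(table, family, column):
    """Return (COUNT(*) query, single-column COUNT query) for one HBase table."""
    source = "`hbase`.`{0}` t".format(table)
    count_star = "SELECT COUNT(*) FROM {0}".format(source)
    # Referencing a concrete column should make Drill read the data
    # instead of answering from table metadata.
    count_column = "SELECT COUNT(t.`{0}`.`{1}`) FROM {2}".format(
        family, column, source
    )
    return count_star, count_column


def run_query(sql, base_url="http://localhost:8047"):
    """Submit one SQL statement to Drill over REST and return the parsed JSON.

    The host and port are assumptions; adjust them for your Drillbit.
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# "cf" and "c1" are hypothetical; substitute a family and column from your table.
count_star_sql, count_column_sql = build_count_queries("customer_staged", "cf", "c1")
print(count_star_sql)
print(count_column_sql)
```

Passing each statement to run_query and timing the calls would show whether
the COUNT(*) case returns much faster than the single-column count, which is
the confirmation asked for above.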


---------- Forwarded message ----------
From: Kumiko Yada <Ku...@ds-iq.com>
Date: Thu, Jan 14, 2016 at 12:53 PM
Subject: RE: Drill query does not return all results from HBase
To: Jason Altekruse <al...@gmail.com>


Jason,



I’m sorry.  My testing was incorrect last night.  I’m not sure what I did
differently; however, your guess was correct.  When I did the one-column
count, the row count was correct.  Here are the additional test results.



My company has invested in using Drill, and it’s very important for us that
this is fixed.  Let me know if I can do anything to help get this issue
fixed.  I really appreciate that you are looking into this issue!

HBase table (1 column family, 5 columns, 10000000 rows)
  COUNT(*) - row count is correct
  1 column count - row count is correct

HBase table (1 column family, 6 columns, 10000000 rows)
  COUNT(*) - row count is incorrect (returned 6724 rows)
  1 column count - row count is correct

HBase table (2 column families, 6 columns in each column family, 10000000 rows)
  COUNT(*) - row count is incorrect (returned 3362 rows)
  1 column count - row count is correct

HBase table (2 column families, 2 columns in each column family, 10000000 rows)
  COUNT(*) - row count is correct
  1 column count - row count is correct

HBase table (2 column families, 4 columns in one column family and 2 columns
in the other column family, 10000000 rows)
  COUNT(*) - row count is incorrect (returned 6723 rows)
  1 column count - row count is correct

HBase table (2 column families, 1 column in one column family and 3 columns
in the other column family, 10000000 rows)
  COUNT(*) - row count is correct
  1 column count - row count is correct
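
As a quick sanity check on these results, the short Python sketch below
encodes the six test cases listed above and tests one hypothesis against them:
that COUNT(*) is correct only when the table has 5 or fewer columns in total
across all column families. The threshold of 5 is an assumption drawn from
these six cases only, not a confirmed property of Drill or HBase.

```python
# Each entry: (columns per column family, whether COUNT(*) was correct),
# transcribed from the test results listed above.
results = [
    ((5,), True),
    ((6,), False),
    ((6, 6), False),
    ((2, 2), True),
    ((4, 2), False),
    ((1, 3), True),
]

# Assumed threshold: the largest total column count that still gave a
# correct COUNT(*) in these tests.
THRESHOLD = 5


def total_columns(families):
    """Total number of columns across all column families."""
    return sum(families)


def predicted_correct(families):
    """Hypothesis: COUNT(*) is correct when total columns <= THRESHOLD."""
    return total_columns(families) <= THRESHOLD


# Cases where the hypothesis disagrees with the observed outcome.
mismatches = [f for f, ok in results if predicted_correct(f) != ok]

for families, ok in results:
    print(families, total_columns(families), "correct" if ok else "incorrect")
print("cases not explained by the hypothesis:", mismatches)
```

All six cases are consistent with the hypothesis, which suggests the total
number of columns matters here rather than the number of column families.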



Thanks

Kumiko



*From:* Kumiko Yada
*Sent:* Wednesday, January 13, 2016 7:28 PM
*To:* 'Jason Altekruse' <al...@gmail.com>
*Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
Kevin.Verhoeven@ds-iq.com>
*Subject:* RE: Drill query does not return all results from HBase



I also ran the query displaying only 1 column with no limit to try to force a
full scan, but the result was the same: just 10000 rows selected.  With the
same table (which contains 6 columns), I ran the query displaying the row_key,
and it displayed all records, 10,000,000 rows.



-Kumiko



*From:* Kumiko Yada
*Sent:* Wednesday, January 13, 2016 7:24 PM
*To:* 'Jason Altekruse' <al...@gmail.com>
*Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
Kevin.Verhoeven@ds-iq.com>
*Subject:* RE: Drill query does not return all results from HBase



Jason



I ran the query to display only 1 column for 100000 rows, and it only
returned 10000 rows.



-Kumiko



*From:* Jason Altekruse [mailto:altekrusejason@gmail.com
<al...@gmail.com>]
*Sent:* Wednesday, January 13, 2016 6:39 PM
*To:* Kumiko Yada <Ku...@ds-iq.com>
*Cc:* Ki Kang <Ki...@ds-iq.com>; Kevin Verhoeven <
Kevin.Verhoeven@ds-iq.com>

*Subject:* Re: Drill query does not return all results from HBase



I know in a number of cases we have special optimizer rules that try to
skip reading the dataset altogether if we have metadata for the number of
rows and all that is requested is a count(*). I assume that this is the
case with HBase, and this may be where we aren't doing something correctly.
Can you try to run a 'sum', or other aggregate query on one of the columns
to see if a full scan of the data is operating correctly?



On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

Thank you, Jason!

Let me know if you need any help on this. I will be glad to help reproduce
the issue and/or test the fix.

Thanks
Kumiko

-----Original Message-----
From: Jason Altekruse [mailto:altekrusejason@gmail.com]
Sent: Wednesday, January 13, 2016 6:24 PM
To: user <us...@drill.apache.org>

Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
Kevin.Verhoeven@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Thanks for filing the issue. I haven't worked much with HBase, but this is
a critical wrong-results issue, so I will be taking a look at this soon if
no one else raises their hand.

On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> I opened the bug on this.  The drill is returning the correct rows
> when the hbase contains 5 or less columns, but not 6 or more columns.
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> Sent: Wednesday, January 13, 2016 4:52 PM
> To: user@drill.apache.org
> Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
> Subject: RE: Drill query does not return all results from HBase
>
> We are using the HBase 1.0.0. & CDH 5.4.  I found out the correct row
> count returned when the Hbase table contains only 1 column family, 1
> column, but the incorrect row count is returned for the Hbase table
> contains 1 column family, 6 columns.
>
> This looks like the Drill issue.  Has anyone found any workaround?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> Sent: Tuesday, January 12, 2016 6:51 PM
> To: user <us...@drill.apache.org>
> Cc: Aditya Kishore <ad...@gmail.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Well, the major version din't change if I remember it right, hence did
> not share the info in my previous mail. I'm on HBase 1.1.1 right now
> and don't see the issue. Also, I am on a MapR setup, which might not
> be comparable with their CDH setups.
>
> On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> <altekrusejason@gmail.com
> >
> wrote:
>
> > Abhishek,
> >
> > What version of HBase did you have the problem with, and what
> > version did you upgrade to that solved the problem? I assume this
> > would be useful information to compare your setup with Kevin's and
Kumiko's.
> >
> > - Jason
> >
> > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > abhishek.girish@gmail.com
> > > wrote:
> >
> > > I hit a very similar issue recently. Via HBase shell, i was able
> > > to fetch all records, whereas I was only able to see a small
> > > subset of records
> > when
> > > queried from Drill. Each time I inserted 1000 records, only about
> > > 50 of those would show up.
> > >
> > > Although I could repro' the problem consistently, it was resolved
> > > once i updated my Hadoop setup. My guess is that it was a HBase
> > > bug which got resolved. Although strange as it seems, it might not
> > > have to do with
> > Drill
> > > itself.
> > >
> > > -Abhishek
> > >
> > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > altekrusejason@gmail.com
> > > >
> > > wrote:
> > >
> > > > I'm not sure why this is happening, we have tests in our
> > > > automated
> > suite
> > > > that I believe run some pretty large queries against Hbase and
> > > > verify
> > the
> > > > results.
> > > >
> > > > Aditya, do you have some time available to try to reproduce this
> > > > and diagnose the problem?
> > > >
> > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > <Ku...@ds-iq.com>
> > > wrote:
> > > >
> > > > > I'm having the same issue.  Is there any workaround for this?
> > > > >
> > > > > Thanks
> > > > > Kumiko
> > > > >
> > > > > -----Original Message-----
> > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Drill query does not return all results from HBase
> > > > >
> > > > > We have a problem where a Drill query against HBase does not
> > > > > return
> > all
> > > > > results. The following query should return over 100,000 rows,
> > > > > but we
> > > only
> > > > > get about 1,030 back.
> > > > >
> > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > customer_number =
> > > 800
> > > > >
> > > > > If we scan directly using the hbase shell we see over 100,000
> > > > > rows,
> > but
> > > > > the same Drill query does not return a fraction of the
> > > > > expected
> > > results.
> > > > We
> > > > > have also run a count against the table and Drill returns the
> > > > > same
> > > 1,030
> > > > > number, which is far less than expect. What could be going wrong?
> > > > >
> > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > (HBase
> > 1.0).
> > > > We
> > > > > run HBase on six RegionServers, the table has about 1.3
> > > > > billion
> rows.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Kevin
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Drill query does not return all results from HBase

Posted by Jason Altekruse <al...@gmail.com>.
Thanks for filing the issue. I haven't worked much with HBase, but this is
a critical wrong-results issue, so I will be taking a look at this soon if
no one else raises their hand.

On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> I opened the bug on this.  The drill is returning the correct rows when
> the hbase contains 5 or less columns, but not 6 or more columns.
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> Sent: Wednesday, January 13, 2016 4:52 PM
> To: user@drill.apache.org
> Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
> Subject: RE: Drill query does not return all results from HBase
>
> We are using the HBase 1.0.0. & CDH 5.4.  I found out the correct row
> count returned when the Hbase table contains only 1 column family, 1
> column, but the incorrect row count is returned for the Hbase table
> contains 1 column family, 6 columns.
>
> This looks like the Drill issue.  Has anyone found any workaround?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> Sent: Tuesday, January 12, 2016 6:51 PM
> To: user <us...@drill.apache.org>
> Cc: Aditya Kishore <ad...@gmail.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Well, the major version din't change if I remember it right, hence did not
> share the info in my previous mail. I'm on HBase 1.1.1 right now and don't
> see the issue. Also, I am on a MapR setup, which might not be comparable
> with their CDH setups.
>
> On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse <altekrusejason@gmail.com
> >
> wrote:
>
> > Abhishek,
> >
> > What version of HBase did you have the problem with, and what version
> > did you upgrade to that solved the problem? I assume this would be
> > useful information to compare your setup with Kevin's and Kumiko's.
> >
> > - Jason
> >
> > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > abhishek.girish@gmail.com
> > > wrote:
> >
> > > I hit a very similar issue recently. Via HBase shell, i was able to
> > > fetch all records, whereas I was only able to see a small subset of
> > > records
> > when
> > > queried from Drill. Each time I inserted 1000 records, only about 50
> > > of those would show up.
> > >
> > > Although I could repro' the problem consistently, it was resolved
> > > once i updated my Hadoop setup. My guess is that it was a HBase bug
> > > which got resolved. Although strange as it seems, it might not have
> > > to do with
> > Drill
> > > itself.
> > >
> > > -Abhishek
> > >
> > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > altekrusejason@gmail.com
> > > >
> > > wrote:
> > >
> > > > I'm not sure why this is happening, we have tests in our automated
> > suite
> > > > that I believe run some pretty large queries against Hbase and
> > > > verify
> > the
> > > > results.
> > > >
> > > > Aditya, do you have some time available to try to reproduce this
> > > > and diagnose the problem?
> > > >
> > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > <Ku...@ds-iq.com>
> > > wrote:
> > > >
> > > > > I'm having the same issue.  Is there any workaround for this?
> > > > >
> > > > > Thanks
> > > > > Kumiko
> > > > >
> > > > > -----Original Message-----
> > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Drill query does not return all results from HBase
> > > > >
> > > > > We have a problem where a Drill query against HBase does not
> > > > > return
> > all
> > > > > results. The following query should return over 100,000 rows,
> > > > > but we
> > > only
> > > > > get about 1,030 back.
> > > > >
> > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > customer_number =
> > > 800
> > > > >
> > > > > If we scan directly using the hbase shell we see over 100,000
> > > > > rows,
> > but
> > > > > the same Drill query does not return a fraction of the expected
> > > results.
> > > > We
> > > > > have also run a count against the table and Drill returns the
> > > > > same
> > > 1,030
> > > > > number, which is far less than expect. What could be going wrong?
> > > > >
> > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > (HBase
> > 1.0).
> > > > We
> > > > > run HBase on six RegionServers, the table has about 1.3 billion
> rows.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Kevin
> > > > >
> > > > >
> > > >
> > >
> >
>

RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
I opened a bug on this.  Drill returns the correct rows when the HBase table contains 5 or fewer columns, but not when it contains 6 or more columns.

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com] 
Sent: Wednesday, January 13, 2016 4:52 PM
To: user@drill.apache.org
Cc: Aditya Kishore <ad...@gmail.com>; Kevin Verhoeven <Ke...@ds-iq.com>
Subject: RE: Drill query does not return all results from HBase

We are using the HBase 1.0.0. & CDH 5.4.  I found out the correct row count returned when the Hbase table contains only 1 column family, 1 column, but the incorrect row count is returned for the Hbase table contains 1 column family, 6 columns.

This looks like the Drill issue.  Has anyone found any workaround?

Thanks
Kumiko

-----Original Message-----
From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
Sent: Tuesday, January 12, 2016 6:51 PM
To: user <us...@drill.apache.org>
Cc: Aditya Kishore <ad...@gmail.com>
Subject: Re: Drill query does not return all results from HBase

Well, the major version din't change if I remember it right, hence did not share the info in my previous mail. I'm on HBase 1.1.1 right now and don't see the issue. Also, I am on a MapR setup, which might not be comparable with their CDH setups.

On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse <al...@gmail.com>
wrote:

> Abhishek,
>
> What version of HBase did you have the problem with, and what version 
> did you upgrade to that solved the problem? I assume this would be 
> useful information to compare your setup with Kevin's and Kumiko's.
>
> - Jason
>
> On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish < 
> abhishek.girish@gmail.com
> > wrote:
>
> > I hit a very similar issue recently. Via HBase shell, i was able to 
> > fetch all records, whereas I was only able to see a small subset of 
> > records
> when
> > queried from Drill. Each time I inserted 1000 records, only about 50 
> > of those would show up.
> >
> > Although I could repro' the problem consistently, it was resolved 
> > once i updated my Hadoop setup. My guess is that it was a HBase bug 
> > which got resolved. Although strange as it seems, it might not have 
> > to do with
> Drill
> > itself.
> >
> > -Abhishek
> >
> > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> altekrusejason@gmail.com
> > >
> > wrote:
> >
> > > I'm not sure why this is happening, we have tests in our automated
> suite
> > > that I believe run some pretty large queries against Hbase and 
> > > verify
> the
> > > results.
> > >
> > > Aditya, do you have some time available to try to reproduce this 
> > > and diagnose the problem?
> > >
> > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada 
> > > <Ku...@ds-iq.com>
> > wrote:
> > >
> > > > I'm having the same issue.  Is there any workaround for this?
> > > >
> > > > Thanks
> > > > Kumiko
> > > >
> > > > -----Original Message-----
> > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > To: user@drill.apache.org
> > > > Subject: Drill query does not return all results from HBase
> > > >
> > > > We have a problem where a Drill query against HBase does not 
> > > > return
> all
> > > > results. The following query should return over 100,000 rows, 
> > > > but we
> > only
> > > > get about 1,030 back.
> > > >
> > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE 
> > > > customer_number =
> > 800
> > > >
> > > > If we scan directly using the hbase shell we see over 100,000 
> > > > rows,
> but
> > > > the same Drill query does not return a fraction of the expected
> > results.
> > > We
> > > > have also run a count against the table and Drill returns the 
> > > > same
> > 1,030
> > > > number, which is far less than expect. What could be going wrong?
> > > >
> > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 
> > > > (HBase
> 1.0).
> > > We
> > > > run HBase on six RegionServers, the table has about 1.3 billion rows.
> > > >
> > > > Thanks,
> > > >
> > > > Kevin
> > > >
> > > >
> > >
> >
>

RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
We are using HBase 1.0.0 and CDH 5.4.  I found that the correct row count is returned when the HBase table contains only 1 column family with 1 column, but an incorrect row count is returned when the HBase table contains 1 column family with 6 columns.

This looks like a Drill issue.  Has anyone found a workaround?

Thanks
Kumiko

-----Original Message-----
From: Abhishek Girish [mailto:abhishek.girish@gmail.com] 
Sent: Tuesday, January 12, 2016 6:51 PM
To: user <us...@drill.apache.org>
Cc: Aditya Kishore <ad...@gmail.com>
Subject: Re: Drill query does not return all results from HBase

Well, the major version din't change if I remember it right, hence did not share the info in my previous mail. I'm on HBase 1.1.1 right now and don't see the issue. Also, I am on a MapR setup, which might not be comparable with their CDH setups.

On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse <al...@gmail.com>
wrote:

> Abhishek,
>
> What version of HBase did you have the problem with, and what version 
> did you upgrade to that solved the problem? I assume this would be 
> useful information to compare your setup with Kevin's and Kumiko's.
>
> - Jason
>
> On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish < 
> abhishek.girish@gmail.com
> > wrote:
>
> > I hit a very similar issue recently. Via HBase shell, i was able to 
> > fetch all records, whereas I was only able to see a small subset of 
> > records
> when
> > queried from Drill. Each time I inserted 1000 records, only about 50 
> > of those would show up.
> >
> > Although I could repro' the problem consistently, it was resolved 
> > once i updated my Hadoop setup. My guess is that it was a HBase bug 
> > which got resolved. Although strange as it seems, it might not have 
> > to do with
> Drill
> > itself.
> >
> > -Abhishek
> >
> > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> altekrusejason@gmail.com
> > >
> > wrote:
> >
> > > I'm not sure why this is happening, we have tests in our automated
> suite
> > > that I believe run some pretty large queries against Hbase and 
> > > verify
> the
> > > results.
> > >
> > > Aditya, do you have some time available to try to reproduce this 
> > > and diagnose the problem?
> > >
> > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada 
> > > <Ku...@ds-iq.com>
> > wrote:
> > >
> > > > I'm having the same issue.  Is there any workaround for this?
> > > >
> > > > Thanks
> > > > Kumiko
> > > >
> > > > -----Original Message-----
> > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > To: user@drill.apache.org
> > > > Subject: Drill query does not return all results from HBase
> > > >
> > > > We have a problem where a Drill query against HBase does not 
> > > > return
> all
> > > > results. The following query should return over 100,000 rows, 
> > > > but we
> > only
> > > > get about 1,030 back.
> > > >
> > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE 
> > > > customer_number =
> > 800
> > > >
> > > > If we scan directly using the hbase shell we see over 100,000 
> > > > rows,
> but
> > > > the same Drill query does not return a fraction of the expected
> > results.
> > > We
> > > > have also run a count against the table and Drill returns the 
> > > > same
> > 1,030
> > > > number, which is far less than expect. What could be going wrong?
> > > >
> > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 
> > > > (HBase
> 1.0).
> > > We
> > > > run HBase on six RegionServers, the table has about 1.3 billion rows.
> > > >
> > > > Thanks,
> > > >
> > > > Kevin
> > > >
> > > >
> > >
> >
>

Re: Drill query does not return all results from HBase

Posted by Abhishek Girish <ab...@gmail.com>.
Well, the major version didn't change if I remember it right, hence I did not
share the info in my previous mail. I'm on HBase 1.1.1 right now and don't
see the issue. Also, I am on a MapR setup, which might not be comparable
with their CDH setups.

On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse <al...@gmail.com>
wrote:

> Abhishek,
>
> What version of HBase did you have the problem with, and what version did
> you upgrade to that solved the problem? I assume this would be useful
> information to compare your setup with Kevin's and Kumiko's.
>
> - Jason
>
> On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> abhishek.girish@gmail.com
> > wrote:
>
> > I hit a very similar issue recently. Via HBase shell, i was able to fetch
> > all records, whereas I was only able to see a small subset of records
> when
> > queried from Drill. Each time I inserted 1000 records, only about 50 of
> > those would show up.
> >
> > Although I could repro' the problem consistently, it was resolved once i
> > updated my Hadoop setup. My guess is that it was a HBase bug which got
> > resolved. Although strange as it seems, it might not have to do with
> Drill
> > itself.
> >
> > -Abhishek
> >
> > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> altekrusejason@gmail.com
> > >
> > wrote:
> >
> > > I'm not sure why this is happening, we have tests in our automated
> suite
> > > that I believe run some pretty large queries against Hbase and verify
> the
> > > results.
> > >
> > > Aditya, do you have some time available to try to reproduce this and
> > > diagnose the problem?
> > >
> > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada <Ku...@ds-iq.com>
> > wrote:
> > >
> > > > I'm having the same issue.  Is there any workaround for this?
> > > >
> > > > Thanks
> > > > Kumiko
> > > >
> > > > -----Original Message-----
> > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > To: user@drill.apache.org
> > > > Subject: Drill query does not return all results from HBase
> > > >
> > > > We have a problem where a Drill query against HBase does not return
> all
> > > > results. The following query should return over 100,000 rows, but we
> > only
> > > > get about 1,030 back.
> > > >
> > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number =
> > 800
> > > >
> > > > If we scan directly using the hbase shell we see over 100,000 rows,
> but
> > > > the same Drill query does not return a fraction of the expected
> > results.
> > > We
> > > > have also run a count against the table and Drill returns the same
> > 1,030
> > > > number, which is far less than expect. What could be going wrong?
> > > >
> > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase
> 1.0).
> > > We
> > > > run HBase on six RegionServers, the table has about 1.3 billion rows.
> > > >
> > > > Thanks,
> > > >
> > > > Kevin
> > > >
> > > >
> > >
> >
>

Re: Drill query does not return all results from HBase

Posted by Jason Altekruse <al...@gmail.com>.
Abhishek,

What version of HBase did you have the problem with, and what version did
you upgrade to that solved the problem? I assume this would be useful
information to compare your setup with Kevin's and Kumiko's.

- Jason

On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <abhishek.girish@gmail.com
> wrote:

> I hit a very similar issue recently. Via HBase shell, i was able to fetch
> all records, whereas I was only able to see a small subset of records when
> queried from Drill. Each time I inserted 1000 records, only about 50 of
> those would show up.
>
> Although I could repro' the problem consistently, it was resolved once i
> updated my Hadoop setup. My guess is that it was a HBase bug which got
> resolved. Although strange as it seems, it might not have to do with Drill
> itself.
>
> -Abhishek
>
> On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <altekrusejason@gmail.com
> >
> wrote:
>
> > I'm not sure why this is happening, we have tests in our automated suite
> > that I believe run some pretty large queries against Hbase and verify the
> > results.
> >
> > Aditya, do you have some time available to try to reproduce this and
> > diagnose the problem?
> >
> > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada <Ku...@ds-iq.com>
> wrote:
> >
> > > I'm having the same issue.  Is there any workaround for this?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > Sent: Monday, December 21, 2015 10:37 AM
> > > To: user@drill.apache.org
> > > Subject: Drill query does not return all results from HBase
> > >
> > > We have a problem where a Drill query against HBase does not return all
> > > results. The following query should return over 100,000 rows, but we
> only
> > > get about 1,030 back.
> > >
> > > SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number =
> 800
> > >
> > > If we scan directly using the hbase shell we see over 100,000 rows, but
> > > the same Drill query does not return a fraction of the expected
> results.
> > We
> > > have also run a count against the table and Drill returns the same
> 1,030
> > > number, which is far less than expect. What could be going wrong?
> > >
> > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase 1.0).
> > We
> > > run HBase on six RegionServers, the table has about 1.3 billion rows.
> > >
> > > Thanks,
> > >
> > > Kevin
> > >
> > >
> >
>

Re: Drill query does not return all results from HBase

Posted by Abhishek Girish <ab...@gmail.com>.
I hit a very similar issue recently. Via the HBase shell, I was able to fetch
all records, whereas I was only able to see a small subset of records when
querying from Drill. Each time I inserted 1000 records, only about 50 of
those would show up.

Although I could repro the problem consistently, it was resolved once I
updated my Hadoop setup. My guess is that it was an HBase bug which got
resolved. Strange as it seems, it might not have to do with Drill
itself.

-Abhishek

On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <al...@gmail.com>
wrote:

> I'm not sure why this is happening, we have tests in our automated suite
> that I believe run some pretty large queries against Hbase and verify the
> results.
>
> Aditya, do you have some time available to try to reproduce this and
> diagnose the problem?
>
> On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:
>
> > I'm having the same issue.  Is there any workaround for this?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > Sent: Monday, December 21, 2015 10:37 AM
> > To: user@drill.apache.org
> > Subject: Drill query does not return all results from HBase
> >
> > We have a problem where a Drill query against HBase does not return all
> > results. The following query should return over 100,000 rows, but we only
> > get about 1,030 back.
> >
> > SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800
> >
> > If we scan directly using the hbase shell we see over 100,000 rows, but
> > the same Drill query does not return a fraction of the expected results.
> We
> > have also run a count against the table and Drill returns the same 1,030
> > number, which is far less than expect. What could be going wrong?
> >
> > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase 1.0).
> We
> > run HBase on six RegionServers, the table has about 1.3 billion rows.
> >
> > Thanks,
> >
> > Kevin
> >
> >
>

Re: Drill query does not return all results from HBase

Posted by Jason Altekruse <al...@gmail.com>.
I'm not sure why this is happening; we have tests in our automated suite
that I believe run some pretty large queries against HBase and verify the
results.

Aditya, do you have some time available to try to reproduce this and
diagnose the problem?

On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:

> I'm having the same issue.  Is there any workaround for this?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> Sent: Monday, December 21, 2015 10:37 AM
> To: user@drill.apache.org
> Subject: Drill query does not return all results from HBase
>
> We have a problem where a Drill query against HBase does not return all
> results. The following query should return over 100,000 rows, but we only
> get about 1,030 back.
>
> SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800
>
> If we scan directly using the hbase shell we see over 100,000 rows, but
> the same Drill query does not return a fraction of the expected results. We
> have also run a count against the table and Drill returns the same 1,030
> number, which is far less than expect. What could be going wrong?
>
> We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase 1.0). We
> run HBase on six RegionServers, the table has about 1.3 billion rows.
>
> Thanks,
>
> Kevin
>
>

RE: Drill query does not return all results from HBase

Posted by Kumiko Yada <Ku...@ds-iq.com>.
I'm having the same issue.  Is there any workaround for this?

Thanks
Kumiko

-----Original Message-----
From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com] 
Sent: Monday, December 21, 2015 10:37 AM
To: user@drill.apache.org
Subject: Drill query does not return all results from HBase

We have a problem where a Drill query against HBase does not return all results. The following query should return over 100,000 rows, but we only get about 1,030 back.

SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800

If we scan directly using the hbase shell we see over 100,000 rows, but the same Drill query does not return a fraction of the expected results. We have also run a count against the table and Drill returns the same 1,030 number, which is far less than expect. What could be going wrong?

We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase 1.0). We run HBase on six RegionServers, the table has about 1.3 billion rows.

Thanks,

Kevin