Posted to user@drill.apache.org by Anant Akinchan <An...@acupera.com> on 2015/12/17 18:56:45 UTC

Problems with queries when using Drill with an HBase table with thousands of columns in a column family.

Hi,
I have set up a small HBase cluster of six nodes on Azure, and installed and configured a Drillbit on each of these nodes. The HBase table contains a column family with around 1000 columns. We want to expose these columns to Tableau, so I have created a view on this HBase table. I am able to list the columns in Tableau and also run a few simple reports using them, though they seem to be very slow. My bigger problem is that when I apply filters, the reports become very slow and sometimes come back with inconsistent results. For example, if I set a filter on a particular value of the row_key, it does not return any result, even though the table contains a row for that key.
The other thing I noticed using explain plan is that even if the filter is set on the row_key column and a single value is provided, as in select * from table where row_key = '1', the query still uses HBase's scan method instead of the get method. Can this be fixed?
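To make the check concrete, here is a minimal sketch of the kind of statement I mean. The storage plugin name (hbase) and the table name (my_table) are placeholders for illustration, not the real names in my setup.

```sql
-- Placeholder names: `hbase` storage plugin and `my_table` table are assumptions.
-- Ask Drill for the plan it chooses for a single-key lookup on row_key.
EXPLAIN PLAN FOR
SELECT *
FROM hbase.`my_table`
WHERE row_key = '1';
```

The plan output shows how the HBase table is accessed for this filter.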
Any suggestions for improving the overall performance of queries, and for fixing the problem with filters, would be very helpful.
Thanks,
Anant.