Posted to user@drill.apache.org by qiang li <ti...@gmail.com> on 2016/05/17 11:50:54 UTC

query from hbase issue

Hi ,

I recently hit an issue where I cannot query the correct data from HBase
with SQL through Drill. Can anybody help me?

I am testing with Drill 1.6.
My HBase schema:
rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
cf : v
qualifier: v, e0, e1
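For context, here is a minimal sketch of how such a salted rowkey might be assembled. The method name and the salt-by-hash-mod-10 scheme are illustrative assumptions, not the poster's actual code:

```java
// Sketch: assemble salt + day + event + uid + ts into one rowkey string,
// matching the layout "120160411visituidts" described above.
public class RowKeySketch {
    public static String buildRowKey(String day, String event, long uid, long ts) {
        // Assumed salt scheme: spread rows over 10 buckets by hashing the key parts.
        int salt = Math.abs((day + event + uid).hashCode()) % 10;
        return salt + day + event + uid + ts;
    }
}
```

A single-digit salt prefix like this is what makes the table's split keys '0'..'9' (shown later in the thread) line up with region boundaries.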

The wrong result only happens when I use a GROUP BY clause.

This SQL does not return the correct result:
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
Part of explain of this sql is:

0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
a.row_key > '0'  group by a.`v`.`e0`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(k=[$0], p=[$1])
00-02        UnionExchange
01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
01-03              Project($f0=[$0], p=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
03-03                        Project($f0=[ITEM($1, 'e0')])
03-04                          Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
stopRow=, filter=null], columns=[`*`]]])

The data returns very quickly; the result of this SQL is:
+------+--------+
|  k   |   p    |
+------+--------+
| pay  | 12180  |
+------+--------+

But I have millions of rows in the table.

I tried changing the physical plan. If I change the JSON explain "columns"
: [ "`*`" ] to "columns" : [ "`v`.`e0`" ], it returns the correct
result.

It seems the physical plan is not correct.
I also tried to debug the SQL parser to find the reason, but it is too
complicated. Can anyone help me?

This SQL has the same issue:
select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
count(a.row_key) p from hbase.browser_action2 a group by
BYTE_SUBSTR(a.row_key, 1 , 9);
If I change the JSON explain "columns" : [ "`*`" ] to "columns" : [
"`row_key`" ], it returns the correct result.
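As a side note on what this second query groups by: BYTE_SUBSTR(a.row_key, 1, 9) takes the first 9 bytes of the rowkey, which for the layout above is the 1-byte salt plus the 8-byte day. A small sketch of that slicing (the class and method are illustrative, not Drill internals):

```java
import java.nio.charset.StandardCharsets;

// Sketch: mirror BYTE_SUBSTR(row_key, 1, 9) in plain Java. Drill's
// BYTE_SUBSTR is 1-based, so bytes 1..9 map to Java offsets 0..8,
// i.e. the 1-byte salt plus the 8-byte day portion of the rowkey.
public class PrefixSketch {
    public static String saltDayPrefix(byte[] rowKey) {
        return new String(rowKey, 0, 9, StandardCharsets.UTF_8);
    }
}
```

So the GROUP BY buckets rows per salt+day prefix, which is why count(a.row_key) should cover every row in the table.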

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
Hi ,

I am using Drill to query HBase, but it is very slow. I think one of the
reasons is that when there are multiple qualifiers, the query plan scans
all the data instead of only the columns specified in the SQL.
I debugged the process and found that the main logic is in Calcite and is
hard to change.

Can I get the physical plan and modify it on the client side? I did some
research: the class DrillWorker can get the plan, but I would have to
initialize the context, etc. Is there an easier way to get the physical
plan, so I can modify it based on my query conditions?
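One client-side route that avoids DrillWorker internals is to prepend EXPLAIN PLAN FOR to the query and read the plan back over JDBC; the explain output shown earlier in this thread has `text` and `json` columns. The JDBC snippet in the comment is a hedged sketch (connection details assumed), while the helper itself is runnable:

```java
// Sketch: fetch the physical plan client-side by wrapping the query in
// EXPLAIN PLAN FOR (syntax as used elsewhere in this thread) and reading
// the `json` column of the result over JDBC.
public class PlanFetchSketch {
    public static String explainSql(String query) {
        return "explain plan for " + query;
    }
    // With a live Drillbit one would then do something like (illustrative):
    //   ResultSet rs = conn.createStatement().executeQuery(explainSql(sql));
    //   rs.next();
    //   String jsonPlan = rs.getString("json");  // editable physical plan
}
```

The edited JSON plan could then be submitted through the web UI, as the poster describes doing later in the thread.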


2016-05-24 13:30 GMT+08:00 qiang li <ti...@gmail.com>:

> I upgraded to the latest CDH version, which is HBase 1.2.0-cdh5.7.0, and
> tested again. The results are correct now. Thanks for the help.
>
> Even so, I think the query plan can still be optimized.
>
> Here is what I think could be improved:
> a. push down only the columns that the query needs
> b. remove the rowFilter when the scan's startRow and stopRow already cover it.
>     For example: select row_key from hbase.browser_action a where
> a.row_key > '0' and a.row_key < '1'. I think SQL like this would run
> faster without the rowFilter.
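The reasoning behind suggestion (b) can be sketched in a few lines: HBase orders rowkeys lexicographically by unsigned bytes, so every row a scan bounded by startRow and stopRow returns already satisfies the predicate, and a RowFilter only re-checks what the range bounds guarantee. The class and method here are illustrative, not Drill or HBase code:

```java
import java.util.Arrays;

// Sketch: a scan bounded by startRow (exclusive '0') and stopRow ('1')
// can only return keys for which this predicate is already true, so an
// extra RowFilter on row_key > '0' AND row_key < '1' is redundant work.
public class RangeSketch {
    // Lexicographic unsigned-byte comparison, matching HBase rowkey order.
    public static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) > 0
            && Arrays.compareUnsigned(key, stop) < 0;
    }
}
```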
>
>
>
> 2016-05-24 9:29 GMT+08:00 qiang li <ti...@gmail.com>:
>
>> Yes, it seems like the same issue. I will upgrade and test again. But
>> do you think we can update the physical plan too? If we only want to query
>> one qualifier, then the columns of the plan should contain only that
>> qualifier instead of "*". Maybe that plan would run faster. Am I right?
>>
>> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
>>
>>> Hi Qiang,
>>>
>>> Looks like you might be encountering this issue:
>>> https://issues.apache.org/jira/browse/DRILL-4271
>>>
>>> Thanks
>>>
>>> On Sun, May 22, 2016 at 8:38 PM, qiang li <ti...@gmail.com> wrote:
>>>
>>> > I tested it step by step again, and finally found that the issue
>>> > happens only if the number of qualifiers is more than 3.
>>> >
>>> > It's weird, but this is the result I got.
>>> >
>>> > I tested about 10 thousand rows of data. The length of the event is 6;
>>> > the code I used to test is like below:
>>> >
>>> > String[] earr = action.getEvent().geteArr();
>>> > for (int i = 0; i < 6; i++) {
>>> >     put.addColumn(family, Bytes.toBytes("e" + i), Bytes.toBytes(earr[i]));
>>> > }
>>> >
>>> > Then I test step by step like below :
>>> >
>>> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
>>> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
>>> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
>>> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>>> >  group by a.`v`.`e0`;
>>> > +-----+-------+
>>> > |  k  |   p   |
>>> > +-----+-------+
>>> > | e0  | 3856  |
>>> > +-----+-------+
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>>> > +-----+-------+
>>> > |  k  |   p   |
>>> > +-----+-------+
>>> > | e0  | 9094  |
>>> > +-----+-------+
>>> >
>>> >
>>> > If I put the qualifier "e3" into the table (that is, more than 3
>>> > qualifiers), then the issue is reproduced.
>>> >
>>> >
>>> > 2016-05-23 9:23 GMT+08:00 qiang li <ti...@gmail.com>:
>>> >
>>> > > Sorry for the late reply.
>>> > >
>>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>>> > >
>>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
>>> > >
>>> > >> Qiang, Can you please let us know the hbase version and hadoop
>>> > >> distribution
>>> > >> version that you are using.
>>> > >>
>>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <
>>> knguyen@maprtech.com>
>>> > >> wrote:
>>> > >>
>>> > >> > Can you please let us know the hbase version and hadoop
>>> distribution
>>> > >> > version that you are using.
>>> > >> >
>>> > >> >
>>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com>
>>> > wrote:
>>> > >> >
>>> > >> >> Khurram , I send the mail again, the last mail forget to cc to
>>> > >> >> user@drill.apache.org
>>> > >> >>
>>> > >> >> The main process is the same, but my rowkey is more complicated.
>>> > >> >> Here is the detail I tested.
>>> > >> >> The rowkey is like this: [salt 1-byte string] + [day 8-byte string] +
>>> > >> >> [event] + [uid long] + [ts long]
>>> > >> >> I also have other qualifiers; only qualifier v:v is an integer, the
>>> > >> >> others are strings.
>>> > >> >>
>>> > >> >> example:
>>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>>> > >> >> ROW
>>> > >> COLUMN+CELL
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e0,
>>> > >> >> timestamp=1461839343076, value=pay
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e1,
>>> > >> >> timestamp=1461839343076, value=bijia
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e2,
>>> > >> >> timestamp=1461839343076, value=browser
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e3,
>>> > >> >> timestamp=1461839343076, value=*
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e4,
>>> > >> >> timestamp=1461839343076, value=*
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e5,
>>> > >> >> timestamp=1461839343076, value=*
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:h,
>>> > >> >> timestamp=1459771200000, value=20
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:m,
>>> > >> >> timestamp=1459771200000, value=0
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:v,
>>> > >> >> timestamp=1459771200000, value=\x00\x00\x00\x17
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> 1 row(s) in 0.0410 seconds
>>> > >> >>
>>> > >> >>
>>> > >> >> Here is an example of how the issue looks:
>>> > >> >>
>>> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS =>
>>> ['v:e0'],
>>> > >> >> STARTROW=> '0'}
>>> > >> >> ........
>>> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\
>>> > >> column=v:e0,
>>> > >> >> timestamp=1463723029448, value=visit
>>> > >> >>
>>> > >> >>
>>> > >> >>  x01T\x00\x0A\xFA\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\
>>> > >> column=v:e0,
>>> > >> >> timestamp=1463723029217, value=visit
>>> > >> >>
>>> > >> >>
>>> > >> >>  x01T\x00\x0A\xFA\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\
>>> > >> column=v:e0,
>>> > >> >> timestamp=1463723029295, value=visit
>>> > >> >>
>>> > >> >>
>>> > >> >>  x01T\x00\x0A\xFA\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> 9994 row(s) in 123.8650 seconds
>>> > >> >>
>>> > >> >> the drill result:
>>> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8')
>>> as k,
>>> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key
>>> >
>>> > '0'
>>> > >> >>  group by a.`v`.`e0`;
>>> > >> >> +--------+-------+
>>> > >> >> |   k    |   p   |
>>> > >> >> +--------+-------+
>>> > >> >> | visit  | 1216  |
>>> > >> >> +--------+-------+
>>> > >> >>
>>> > >> >>
>>> > >> >> I found that if the row count is larger than 10000, the issue
>>> > >> >> appears. The result is right with fewer than 1000 rows, but not
>>> > >> >> always. What I can confirm is that if I update the columns in the
>>> > >> >> physical plan and query via the web UI, the result is correct.
>>> > >> >>
>>> > >> >>
>>> > >> >> Thanks
>>> > >> >>
>>> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com
>>> >:
>>> > >> >>
>>> > >> >> > Qiang, can you please take a look at DRILL-4686 and confirm if
>>> the
>>> > >> data
>>> > >> >> > set used in my repro is the same as the one you have used. If
>>> the
>>> > >> data
>>> > >> >> set
>>> > >> >> > is different please let us know the type of data that you have
>>> used
>>> > >> in
>>> > >> >> your
>>> > >> >> > table.
>>> > >> >> >
>>> > >> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
>>> > >> results.
>>> > >> >> >
>>> > >> >> > Thanks,
>>> > >> >> > Khurram
>>> > >> >> >
>>> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <
>>> amansinha@apache.org
>>> > >
>>> > >> >> wrote:
>>> > >> >> >
>>> > >> >> >> Khurram,  DRILL-4686 seems like a different issue...it is
>>> > reporting
>>> > >> an
>>> > >> >> >> error whereas the original problem from qiang was an incorrect
>>> > >> result.
>>> > >> >> >> Can
>>> > >> >> >> you use the same version (1.6) that he was using.  Also, is
>>> the
>>> > data
>>> > >> >> set
>>> > >> >> >> similar ? If you are unable to repro the exact same issue,
>>> > perhaps
>>> > >> >> qiang
>>> > >> >> >> should file a JIRA with a smaller repro if possible.
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
>>> > >> kfaraaz@maprtech.com>
>>> > >> >> >> wrote:
>>> > >> >> >>
>>> > >> >> >> > Hello Qiang,
>>> > >> >> >> >
>>> > >> >> >> > DRILL-4686 is reported to track this problem.
>>> > >> >> >> >
>>> > >> >> >> > Thanks,
>>> > >> >> >> > Khurram
>>> > >> >> >> >
>>> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <
>>> tiredqiang@gmail.com
>>> > >
>>> > >> >> wrote:
>>> > >> >> >> >
>>> > >> >> >> >> Ok, Thanks very much.
>>> > >> >> >> >>
>>> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <
>>> > kfaraaz@maprtech.com
>>> > >> >:
>>> > >> >> >> >>
>>> > >> >> >> >>> Hello Qiang,
>>> > >> >> >> >>>
>>> > >> >> >> >>> Someone from our Drill team (in San Jose) will get back
>>> to you
>>> > >> >> soon. I
>>> > >> >> >> >>> work from the India lab and I am in a different time zone
>>> as
>>> > >> >> compared
>>> > >> >> >> to
>>> > >> >> >> >>> San Jose office, some one from MapR San Jose will get
>>> back to
>>> > >> you
>>> > >> >> as
>>> > >> >> >> soon
>>> > >> >> >> >>> as possible.
>>> > >> >> >> >>>
>>> > >> >> >> >>> Thanks,
>>> > >> >> >> >>> Khurram
>>> > >> >> >> >>>
>>> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <
>>> > tiredqiang@gmail.com
>>> > >> >
>>> > >> >> >> wrote:
>>> > >> >> >> >>>
>>> > >> >> >> >>>> Hi Khurram, thanks very much for reproducing it. What is
>>> > >> >> >> >>>> the conclusion?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Any idea how to solve it?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
>>> > >> kfaraaz@maprtech.com>:
>>> > >> >> >> >>>>
>>> > >> >> >> >>>>> So I tried to create the table using HBase API (with no
>>> data
>>> > >> >> >> inserted
>>> > >> >> >> >>>>> into table) and I got the query plan for drill 1.7.0
>>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> | row_key      | ANY        | NO           |
>>> > >> >> >> >>>>> | v            | MAP        | NO           |
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Table creation Java program
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> public class PutIntDataToHBase {
>>> > >> >> >> >>>>>     public static void main(String args[]) throws
>>> > IOException
>>> > >> {
>>> > >> >> >> >>>>>         Configuration conf =
>>> HBaseConfiguration.create();
>>> > >> >> >> >>>>>
>>> > >>  conf.set("hbase.zookeeper.property.clientPort","5181");
>>> > >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>> > >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
>>> > >> >> >> >>>>>             admin.disableTable("browser_action2");
>>> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
>>> > >> >> >> >>>>>         }
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>         byte[][] SPLIT_KEYS =
>>> > >> >> >> >>>>>
>>> > {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>> > >> >> >> >>>>>         HTableDescriptor tableDesc = new
>>> > >> >> >> >>>>>
>>> > >> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>> > >> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>     }
>>> > >> >> >> >>>>> }
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Query plan for the query that was reported as returning
>>> > wrong
>>> > >> >> >> results.
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> count(a.`v`.`e0`) p
>>> > >> from
>>> > >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
>>> > >> >> a.`v`.`e0`;
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | text | json |
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | 00-00    Screen
>>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>>> > >> >> >> >>>>> 00-02        UnionExchange
>>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
>>> > p=[COUNT($0)])
>>> > >> >> >> >>>>> 03-03                        Project($f0=[ITEM($1,
>>> 'e0')])
>>> > >> >> >> >>>>> 03-04
>>> > Scan(groupscan=[HBaseGroupScan
>>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>> > >> >> >> startRow=0\x00,
>>> > >> >> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> and the query plan for the other problem query
>>> mentioned in
>>> > >> the
>>> > >> >> >> first
>>> > >> >> >> >>>>> email.
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as
>>> k,
>>> > >> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>>> > >> >> >> >>>>> hbase.browser_action2 a group by
>>> > >> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1
>>> , 9);
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | text | json |
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | 00-00    Screen
>>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>>> > >> >> >> >>>>> 00-02        UnionExchange
>>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
>>> > p=[COUNT($1)])
>>> > >> >> >> >>>>> 03-03
>>> Project($f0=[BYTE_SUBSTR($0, 1,
>>> > >> 9)],
>>> > >> >> >> >>>>> row_key=[$0])
>>> > >> >> >> >>>>> 03-04
>>> > Scan(groupscan=[HBaseGroupScan
>>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>> > >> >> >> startRow=null,
>>> > >> >> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Thanks,
>>> > >> >> >> >>>>> Khurram
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
>>> > >> tiredqiang@gmail.com>
>>> > >> >> >> >>>>> wrote:
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>> Yes,
>>> > >> >> >> >>>>>> I used the HBase API to create it.
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> The main code is:
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'},
>>> {'4'},
>>> > >> >> {'5'},
>>> > >> >> >> {'6'}, {'7'},{'8'}, {'9'} };
>>> > >> >> >> >>>>>> TableName tableName =
>>> TableName.valueOf("browser_action2");
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> HTableDescriptor tableDesc = new
>>> > HTableDescriptor(tableName);
>>> > >> >> >> >>>>>> HColumnDescriptor columnDesc = new
>>> HColumnDescriptor("v");
>>> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>> > >> >> >> >>>>>>
>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <
>>> zfong@maprtech.com
>>> > >:
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>>> > >> >> reproduce
>>> > >> >> >> this
>>> > >> >> >> >>>>>>> problem so we can try to reproduce it on our end.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> Thanks.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> -- Zelaine
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>>> > >> >> tiredqiang@gmail.com>
>>> > >> >> >> >>>>>>> wrote:
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> > [quoted text of the original May 17 message trimmed;
>>> > >> >> >> >>>>>>> > it appears in full at the top of this thread]
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>
>>> > >> >> >> >>>
>>> > >> >> >> >>
>>> > >> >> >> >
>>> > >> >> >>
>>> > >> >> >
>>> > >> >> >
>>> > >> >>
>>> > >> >
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
I upgrade to latest cdh version which is HBase 1.2.0-cdh5.7.0 and test it
again, The result are correct now. Thanks for the help.

Even though, I think the query plan still can be optimized.

Here is what I think can improve:
a. specify the columns when need
b. remove the rowFilter when scan startRow and stopRow can meet it.
    for example : select row_key from hbase.browser_action a where
a.row_key >'0' and  a.row_key < '1' , I think sql like this will query
faster without rowfilter.



2016-05-24 9:29 GMT+08:00 qiang li <ti...@gmail.com>:

> Yes, Its seems like the same issue. I will upgrade it and test again. But
> do you think we can update the physical plan too.  If we only want to query
> one qualifier,  then the columns of the plan should only contains the
> qualifier instead of "*". Maybe this plan will be query fast. Am I right?
>
> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
>
>> Hi Qiang,
>>
>> Looks like you might be encountering this issue:
>> https://issues.apache.org/jira/browse/DRILL-4271
>>
>> Thanks
>>
>> On Sun, May 22, 2016 at 8:38 PM, qiang li <ti...@gmail.com> wrote:
>>
>> > I test it step by step again. And finally I find out that the issue
>> > happened only if the qualifier number is more than 3.
>> >
>> > It's werid, but this is the result I test.
>> >
>> > I tested about 10 thousands row of data. The length of the event is
>> 6,the
>> > code I used to test is like below:
>> >
>> > String[] earr = action.getEvent().geteArr();
>> > for(int i=0;i<6;i++){
>> >     put.addColumn(family, Bytes.toBytes("e"+i),
>> Bytes.toBytes(earr[i]));
>> > }
>> >
>> > Then I test step by step like below :
>> >
>> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
>> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
>> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
>> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>> >  group by a.`v`.`e0`;
>> > +-----+-------+
>> > |  k  |   p   |
>> > +-----+-------+
>> > | e0  | 3856  |
>> > +-----+-------+
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>> > +-----+-------+
>> > |  k  |   p   |
>> > +-----+-------+
>> > | e0  | 9094  |
>> > +-----+-------+
>> >
>> >
>> > if I put the qualifier "e3" into the table(that is more than 3
>> qulifier),
>> > then the issue reproduced.
>> >
>> >
>> > 2016-05-23 9:23 GMT+08:00 qiang li <ti...@gmail.com>:
>> >
>> > > Sorry late.
>> > >
>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>> > >
>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
>> > >
>> > >> Qiang, Can you please let us know the hbase version and hadoop
>> > >> distribution
>> > >> version that you are using.
>> > >>
>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <
>> knguyen@maprtech.com>
>> > >> wrote:
>> > >>
>> > >> > Can you please let us know the hbase version and hadoop
>> distribution
>> > >> > version that you are using.
>> > >> >
>> > >> >
>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com>
>> > wrote:
>> > >> >
>> > >> >> Khurram , I send the mail again, the last mail forget to cc to
>> > >> >> user@drill.apache.org
>> > >> >>
>> > >> >> The main process is the same, but my rowkey is more complicate,
>> > >> >> Here is the detail I tested.
>> > >> >> rowkey is like this : [salt 1byte string] + [day 8byte string] +
>> > >> [event] +
>> > >> >> [uid long] + [ts long]
>> > >> >> also I have other qualifiers, only qualifier v:v is integer, the
>> > others
>> > >> >> are
>> > >> >> string.
>> > >> >>
>> > >> >> example:
>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> > >> >> ROW                                                           COLUMN+CELL
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
>> > >> >> 1 row(s) in 0.0410 seconds
>> > >> >>
>> > >> >>
>> > >> >> Here is an example of what the issue looks like:
>> > >> >>
>> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
>> > >> >> STARTROW=> '0'}
>> > >> >> ........
>> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029448, value=visit
>> > >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029217, value=visit
>> > >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029295, value=visit
>> > >> >> 9994 row(s) in 123.8650 seconds
>> > >> >>
>> > >> >> the drill result:
>> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8')
>> as k,
>> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key >
>> > '0'
>> > >> >>  group by a.`v`.`e0`;
>> > >> >> +--------+-------+
>> > >> >> |   k    |   p   |
>> > >> >> +--------+-------+
>> > >> >> | visit  | 1216  |
>> > >> >> +--------+-------+
>> > >> >>
>> > >> >>
>> > >> >> I found out that the issue appears when the row count is larger than
>> > >> >> 10000. The result is right with fewer than 1000 rows, but not always.
>> > >> >> What I can confirm is that if I update the columns in the physical plan
>> > >> >> and query via the web UI, the result is correct.
>> > >> >>
>> > >> >>
>> > >> >> Thanks
>> > >> >>
>> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>> > >> >>
>> > >> >> > Qiang, can you please take a look at DRILL-4686 and confirm if
>> the
>> > >> data
>> > >> >> > set used in my repro is the same as the one you have used. If
>> the
>> > >> data
>> > >> >> set
>> > >> >> > is different please let us know the type of data that you have
>> used
>> > >> in
>> > >> >> your
>> > >> >> > table.
>> > >> >> >
>> > >> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
>> > >> results.
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Khurram
>> > >> >> >
>> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <
>> amansinha@apache.org
>> > >
>> > >> >> wrote:
>> > >> >> >
>> > >> >> >> Khurram,  DRILL-4686 seems like a different issue...it is
>> > reporting
>> > >> an
>> > >> >> >> error whereas the original problem from qiang was an incorrect
>> > >> result.
>> > >> >> >> Can
>> > >> >> >> you use the same version (1.6) that he was using.  Also, is the
>> > data
>> > >> >> set
>> > >> >> >> similar ? If you are unable to repro the exact same issue,
>> > perhaps
>> > >> >> qiang
>> > >> >> >> should file a JIRA with a smaller repro if possible.
>> > >> >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
>> > >> kfaraaz@maprtech.com>
>> > >> >> >> wrote:
>> > >> >> >>
>> > >> >> >> > Hello Qiang,
>> > >> >> >> >
>> > >> >> >> > DRILL-4686 is reported to track this problem.
>> > >> >> >> >
>> > >> >> >> > Thanks,
>> > >> >> >> > Khurram
>> > >> >> >> >
>> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <
>> tiredqiang@gmail.com
>> > >
>> > >> >> wrote:
>> > >> >> >> >
>> > >> >> >> >> Ok, Thanks very much.
>> > >> >> >> >>
>> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <
>> > kfaraaz@maprtech.com
>> > >> >:
>> > >> >> >> >>
>> > >> >> >> >>> Hello Qiang,
>> > >> >> >> >>>
>> > >> >> >> >>> Someone from our Drill team (in San Jose) will get back to
>> you
>> > >> >> soon. I
>> > >> >> >> >>> work from the India lab and I am in a different time zone
>> as
>> > >> >> compared
>> > >> >> >> to
>> > >> >> >> >>> San Jose office, someone from MapR San Jose will get back
>> to
>> > >> you
>> > >> >> as
>> > >> >> >> soon
>> > >> >> >> >>> as possible.
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Khurram
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <
>> > tiredqiang@gmail.com
>> > >> >
>> > >> >> >> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
>> > >> >> >> >>>> conclusion?
>> > >> >> >> >>>>
>> > >> >> >> >>>> Any idea how to solve it?
>> > >> >> >> >>>>
>> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
>> > >> kfaraaz@maprtech.com>:
>> > >> >> >> >>>>
>> > >> >> >> >>>>> So I tried to create the table using HBase API (with no
>> data
>> > >> >> >> inserted
>> > >> >> >> >>>>> into table) and I got the query plan for drill 1.7.0
>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> | row_key      | ANY        | NO           |
>> > >> >> >> >>>>> | v            | MAP        | NO           |
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Table creation Java program
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> public class PutIntDataToHBase {
>> > >> >> >> >>>>>     public static void main(String args[]) throws
>> > IOException
>> > >> {
>> > >> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>> > >> >> >> >>>>>
>> > >>  conf.set("hbase.zookeeper.property.clientPort","5181");
>> > >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> > >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
>> > >> >> >> >>>>>             admin.disableTable("browser_action2");
>> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
>> > >> >> >> >>>>>         }
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>         byte[][] SPLIT_KEYS =
>> > >> >> >> >>>>>
>> > {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> > >> >> >> >>>>>         HTableDescriptor tableDesc = new
>> > >> >> >> >>>>>
>> > >> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> > >> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>     }
>> > >> >> >> >>>>> }
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Query plan for the query that was reported as returning
>> > wrong
>> > >> >> >> results.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`)
>> p
>> > >> from
>> > >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
>> > >> >> a.`v`.`e0`;
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | text | json |
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | 00-00    Screen
>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> > >> >> >> >>>>> 00-02        UnionExchange
>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
>> > p=[COUNT($0)])
>> > >> >> >> >>>>> 03-03                        Project($f0=[ITEM($1,
>> 'e0')])
>> > >> >> >> >>>>> 03-04
>> > Scan(groupscan=[HBaseGroupScan
>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> > >> >> >> startRow=0\x00,
>> > >> >> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> and the query plan for the other problem query mentioned
>> in
>> > >> the
>> > >> >> >> first
>> > >> >> >> >>>>> email.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as
>> k,
>> > >> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> > >> >> >> >>>>> hbase.browser_action2 a group by
>> > >> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 ,
>> 9);
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | text | json |
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | 00-00    Screen
>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> > >> >> >> >>>>> 00-02        UnionExchange
>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
>> > p=[COUNT($1)])
>> > >> >> >> >>>>> 03-03
>> Project($f0=[BYTE_SUBSTR($0, 1,
>> > >> 9)],
>> > >> >> >> >>>>> row_key=[$0])
>> > >> >> >> >>>>> 03-04
>> > Scan(groupscan=[HBaseGroupScan
>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> > >> >> >> startRow=null,
>> > >> >> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Thanks,
>> > >> >> >> >>>>> Khurram
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
>> > >> tiredqiang@gmail.com>
>> > >> >> >> >>>>> wrote:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>> Yes.
>> > >> >> >> >>>>>> I use hbase API to create it.
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> The main code is:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'},
>> {'4'},
>> > >> >> {'5'},
>> > >> >> >> {'6'}, {'7'},{'8'}, {'9'} };
>> > >> >> >> >>>>>> TableName tableName =
>> TableName.valueOf("browser_action2");
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> HTableDescriptor tableDesc = new
>> > HTableDescriptor(tableName);
>> > >> >> >> >>>>>> HColumnDescriptor columnDesc = new
>> HColumnDescriptor("v");
>> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> > >> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <
>> zfong@maprtech.com
>> > >:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>> > >> >> reproduce
>> > >> >> >> this
>> > >> >> >> >>>>>>> problem so we can try to reproduce it on our end.
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> Thanks.
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> -- Zelaine
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>> > >> >> tiredqiang@gmail.com>
>> > >> >> >> >>>>>>> wrote:
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> > Hi ,
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I recently met an issue where I cannot query the correct
>> > >> >> >> >>>>>>> > data from hbase with sql by drill; can anybody help me?
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I test with the drill 1.6.
>> > >> >> >> >>>>>>> > My hbase scheme:
>> > >> >> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg:
>> > 120160411visituidts
>> > >> >> >> >>>>>>> > cf : v
>> > >> >> >> >>>>>>> > qualifier: v, e0, e1
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > The wrong result only happened when I use group by
>> > clause.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > This sql will not return correct result:
>> > >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > >> >> count(a.`v`.`e0`)
>> > >> >> >> p
>> > >> >> >> >>>>>>> from
>> > >> >> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group
>> by
>> > >> >> >> a.`v`.`e0`;
>> > >> >> >> >>>>>>> > Part of explain of this sql is:
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> > >> >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>> > >> >> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from
>> > >> hbase.browser_action2
>> > >> >> a
>> > >> >> >> >>>>>>> where
>> > >> >> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>> > >> >> >> >>>>>>> > +------+------+
>> > >> >> >> >>>>>>> > | text | json |
>> > >> >> >> >>>>>>> > +------+------+
>> > >> >> >> >>>>>>> > | 00-00    Screen
>> > >> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> > >> >> >> >>>>>>> > 00-02        UnionExchange
>> > >> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)],
>> p=[$1])
>> > >> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>>>> > 01-04
>> HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
>> > >> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> > >> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
>> > >> >> p=[COUNT($0)])
>> > >> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1,
>> > 'e0')])
>> > >> >> >> >>>>>>> > 03-04
>> > >> Scan(groupscan=[HBaseGroupScan
>> > >> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec
>> [tableName=browser_action2,
>> > >> >> >> >>>>>>> startRow=0\x00,
>> > >> >> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > The data return very quickly , the result of this sql
>> > is :
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> > |  k   |   p    |
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> > | pay  | 12180  |
>> > >> >> >> >>>>>>> > +------+--------
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > But I have millions of rows in the table.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I tried to change the physical plan. If I change the json
>> > >> >> >> >>>>>>> > explain *"columns" : [ "`*`" ]*  to *"columns" : [
>> > >> >> >> >>>>>>> > "`v`.`e0`" ] *, it will return the correct result.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > It seems the physical plan is not correct.
>> > >> >> >> >>>>>>> > I also tried to debug the sql parser to find out the
>> > >> >> >> >>>>>>> > reason, but it's too complicated. Can anyone help me?
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > Also this sql has the same issue:
>> > >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k,
>> > >> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> > >> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> > >> >> >> >>>>>>> > If I change the json explain *"columns" : [ "`*`" ]*  to *"columns" :
>> > >> >> >> >>>>>>> > [ "`row_key`" ] *, it will return the correct result.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
Yes, it seems like the same issue. I will upgrade and test again. But do you
think we can update the physical plan too? If we only want to query one
qualifier, then the columns of the plan should contain only that qualifier
instead of "*". That plan would probably also run faster. Am I right?
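For anyone trying the same workaround: the manual edit amounts to narrowing the columns array of the HBase scan inside the JSON tab of explain plan before resubmitting it through the web UI. A rough sketch of the relevant fragment (the surrounding field names here are illustrative; copy the actual JSON from your own explain output and change only the columns array):

```json
{
  "hbaseScanSpec" : {
    "tableName" : "browser_action2",
    "startRow" : "0\u0000"
  },
  "columns" : [ "`v`.`e0`" ]
}
```

Besides producing the correct counts in my tests, projecting only `v`.`e0` also avoids fetching every cell of the column family.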

2016-05-23 23:38 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:

> Hi Qiang,
>
> Looks like you might be encountering this issue:
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
>
> On Sun, May 22, 2016 at 8:38 PM, qiang li <ti...@gmail.com> wrote:
>
> > I tested it step by step again, and finally found that the issue
> > happens only if the number of qualifiers is more than 3.
> >
> > It's weird, but this is the result I got.
> >
> > I tested about 10 thousand rows of data. The length of the event is 6; the
> > code I used to test is like below:
> >
> > String[] earr = action.getEvent().geteArr();
> > for(int i=0;i<6;i++){
> >     put.addColumn(family, Bytes.toBytes("e"+i),  Bytes.toBytes(earr[i]));
> > }
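To see the fan-out above without needing an HBase cluster, here is a minimal stand-in that splits a dotted event string into its parts and pairs each part with its qualifier name (e0, e1, ...), the way the quoted loop writes one cell per part. The class and method names are illustrative, not the actual code from the thread:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QualifierSketch {
    // Split an event like "pay.bijia.browser" into its dot-separated parts
    // and map each part to its qualifier name: e0 -> pay, e1 -> bijia, ...
    static Map<String, String> toQualifiers(String event) {
        Map<String, String> cells = new LinkedHashMap<>();
        String[] parts = event.split("\\.");
        for (int i = 0; i < parts.length; i++) {
            cells.put("e" + i, parts[i]);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(toQualifiers("pay.bijia.browser"));
    }
}
```

This matches the scan output earlier in the thread, where the row for event pay.bijia.browser has v:e0=pay, v:e1=bijia and v:e2=browser.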
> >
> > Then I test step by step like below :
> >
> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
> >
> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
> >  group by a.`v`.`e0`;
> > +-----+-------+
> > |  k  |   p   |
> > +-----+-------+
> > | e0  | 3856  |
> > +-----+-------+
> >
> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
> > +-----+-------+
> > |  k  |   p   |
> > +-----+-------+
> > | e0  | 9094  |
> > +-----+-------+
> >
> >
> > If I put the qualifier "e3" into the table (that is, more than 3
> > qualifiers), then the issue is reproduced.
> >
> >
> > 2016-05-23 9:23 GMT+08:00 qiang li <ti...@gmail.com>:
> >
> > > Sorry late.
> > >
> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
> > >
> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
> > >
> > >> Qiang, Can you please let us know the hbase version and hadoop
> > >> distribution
> > >> version that you are using.
> > >>
> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <knguyen@maprtech.com
> >
> > >> wrote:
> > >>
> > >> > Can you please let us know the hbase version and hadoop distribution
> > >> > version that you are using.
> > >> >
> > >> >
> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com>
> > wrote:
> > >> >
> > >> >> Khurram , I send the mail again, the last mail forget to cc to
> > >> >> user@drill.apache.org
> > >> >>
> > >> >> The main process is the same, but my rowkey is more complicated.
> > >> >> Here is the detail I tested.
> > >> >> rowkey is like this : [salt 1byte string] + [day 8byte string] + [event] + [uid long] + [ts long]
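That key layout can be sketched as a small pure-Java helper. Everything here (class name, UTF-8 for the string parts, big-endian longs like HBase's Bytes.toBytes(long)) is an assumption for illustration, not the poster's actual code:

```java
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Big-endian encoding of a long, matching HBase's Bytes.toBytes(long).
    static byte[] longBytes(long v) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) {
            b[i] = (byte) v;
            v >>>= 8;
        }
        return b;
    }

    // [salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long]
    static byte[] buildKey(char salt, String day, String event, long uid, long ts) {
        byte[] prefix = (salt + day + event).getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[prefix.length + 16];
        System.arraycopy(prefix, 0, key, 0, prefix.length);
        System.arraycopy(longBytes(uid), 0, key, prefix.length, 8);
        System.arraycopy(longBytes(ts), 0, key, prefix.length + 8, 8);
        return key;
    }
}
```

With salt '0', day "20160404" and event "pay.bijia.browser", this yields a key whose printable prefix matches the rows shown in the scan output.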
> > >> >> also I have other qualifiers; only qualifier v:v is an integer, the
> > >> >> others are strings.
> > >> >>
> > >> >> example:
> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> > >> >> ROW                                                           COLUMN+CELL
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
> > >> >> 1 row(s) in 0.0410 seconds
> > >> >>
> > >> >>
> > >> >> Here is an example of what the issue looks like:
> > >> >>
> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
> > >> >> STARTROW=> '0'}
> > >> >> ........
> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029448, value=visit
> > >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029217, value=visit
> > >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029295, value=visit
> > >> >> 9994 row(s) in 123.8650 seconds
> > >> >>
> > >> >> the drill result:
> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as
> k,
> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key >
> > '0'
> > >> >>  group by a.`v`.`e0`;
> > >> >> +--------+-------+
> > >> >> |   k    |   p   |
> > >> >> +--------+-------+
> > >> >> | visit  | 1216  |
> > >> >> +--------+-------+
> > >> >>
> > >> >>
> > >> >> I found out that the issue appears when the row count is larger than
> > >> >> 10000. The result is right with fewer than 1000 rows, but not always.
> > >> >> What I can confirm is that if I update the columns in the physical plan
> > >> >> and query via the web UI, the result is correct.
> > >> >>
> > >> >>
> > >> >> Thanks
> > >> >>
> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> > >> >>
> > >> >> > Qiang, can you please take a look at DRILL-4686 and confirm if
> the
> > >> data
> > >> >> > set used in my repro is the same as the one you have used. If the
> > >> data
> > >> >> set
> > >> >> > is different please let us know the type of data that you have
> used
> > >> in
> > >> >> your
> > >> >> > table.
> > >> >> >
> > >> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
> > >> results.
> > >> >> >
> > >> >> > Thanks,
> > >> >> > Khurram
> > >> >> >
> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <
> amansinha@apache.org
> > >
> > >> >> wrote:
> > >> >> >
> > >> >> >> Khurram,  DRILL-4686 seems like a different issue...it is
> > reporting
> > >> an
> > >> >> >> error whereas the original problem from qiang was an incorrect
> > >> result.
> > >> >> >> Can
> > >> >> >> you use the same version (1.6) that he was using.  Also, is the
> > data
> > >> >> set
> > >> >> >> similar ? If you are unable to repro the exact same issue,
> > perhaps
> > >> >> qiang
> > >> >> >> should file a JIRA with a smaller repro if possible.
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
> > >> kfaraaz@maprtech.com>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >> > Hello Qiang,
> > >> >> >> >
> > >> >> >> > DRILL-4686 is reported to track this problem.
> > >> >> >> >
> > >> >> >> > Thanks,
> > >> >> >> > Khurram
> > >> >> >> >
> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <
> tiredqiang@gmail.com
> > >
> > >> >> wrote:
> > >> >> >> >
> > >> >> >> >> Ok, Thanks very much.
> > >> >> >> >>
> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <
> > kfaraaz@maprtech.com
> > >> >:
> > >> >> >> >>
> > >> >> >> >>> Hello Qiang,
> > >> >> >> >>>
> > >> >> >> >>> Someone from our Drill team (in San Jose) will get back to
> you
> > >> >> soon. I
> > >> >> >> >>> work from the India lab and I am in a different time zone as
> > >> >> compared
> > >> >> >> to
> > >> >> >> >>> San Jose office, someone from MapR San Jose will get back
> to
> > >> you
> > >> >> as
> > >> >> >> soon
> > >> >> >> >>> as possible.
> > >> >> >> >>>
> > >> >> >> >>> Thanks,
> > >> >> >> >>> Khurram
> > >> >> >> >>>
> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <
> > tiredqiang@gmail.com
> > >> >
> > >> >> >> wrote:
> > >> >> >> >>>
> > >> >> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
> > >> >> >> >>>> conclusion?
> > >> >> >> >>>>
> > >> >> >> >>>> Any idea how to solve it?
> > >> >> >> >>>>
> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
> > >> kfaraaz@maprtech.com>:
> > >> >> >> >>>>
> > >> >> >> >>>>> So I tried to create the table using HBase API (with no
> data
> > >> >> >> inserted
> > >> >> >> >>>>> into table) and I got the query plan for drill 1.7.0
> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> > >> >> >> >>>>>
> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> > >> >> >> >>>>> +--------------+------------+--------------+
> > >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> > >> >> >> >>>>> +--------------+------------+--------------+
> > >> >> >> >>>>> | row_key      | ANY        | NO           |
> > >> >> >> >>>>> | v            | MAP        | NO           |
> > >> >> >> >>>>> +--------------+------------+--------------+
> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
> > >> >> >> >>>>>
> > >> >> >> >>>>> Table creation Java program
> > >> >> >> >>>>>
> > >> >> >> >>>>> {noformat}
> > >> >> >> >>>>> public class PutIntDataToHBase {
> > >> >> >> >>>>>     public static void main(String args[]) throws
> > IOException
> > >> {
> > >> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
> > >> >> >> >>>>>
> > >>  conf.set("hbase.zookeeper.property.clientPort","5181");
> > >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> > >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
> > >> >> >> >>>>>             admin.disableTable("browser_action2");
> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
> > >> >> >> >>>>>         }
> > >> >> >> >>>>>
> > >> >> >> >>>>>         byte[][] SPLIT_KEYS =
> > >> >> >> >>>>>
> > {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> > >> >> >> >>>>>         HTableDescriptor tableDesc = new
> > >> >> >> >>>>>
> > >> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
> > >> >> >> >>>>>
> > >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> > >> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> > >> >> >> >>>>>
> > >> >> >> >>>>>     }
> > >> >> >> >>>>> }
> > >> >> >> >>>>> {noformat}
> > >> >> >> >>>>>
> > >> >> >> >>>>> Query plan for the query that was reported as returning
> > wrong
> > >> >> >> results.
> > >> >> >> >>>>>
> > >> >> >> >>>>> {noformat}
> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
> > >> from
> > >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
> > >> >> a.`v`.`e0`;
> > >> >> >> >>>>> +------+------+
> > >> >> >> >>>>> | text | json |
> > >> >> >> >>>>> +------+------+
> > >> >> >> >>>>> | 00-00    Screen
> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> > >> >> >> >>>>> 00-02        UnionExchange
> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
> > p=[COUNT($0)])
> > >> >> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> > >> >> >> >>>>> 03-04
> > Scan(groupscan=[HBaseGroupScan
> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> > >> >> >> startRow=0\x00,
> > >> >> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
> > >> >> >> >>>>> {noformat}
> > >> >> >> >>>>>
> > >> >> >> >>>>> and the query plan for the other problem query mentioned
> in
> > >> the
> > >> >> >> first
> > >> >> >> >>>>> email.
> > >> >> >> >>>>>
> > >> >> >> >>>>> {noformat}
> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> > >> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> > >> >> >> >>>>> hbase.browser_action2 a group by
> > >> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 ,
> 9);
> > >> >> >> >>>>> +------+------+
> > >> >> >> >>>>> | text | json |
> > >> >> >> >>>>> +------+------+
> > >> >> >> >>>>> | 00-00    Screen
> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> > >> >> >> >>>>> 00-02        UnionExchange
> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
> > p=[COUNT($1)])
> > >> >> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0,
> 1,
> > >> 9)],
> > >> >> >> >>>>> row_key=[$0])
> > >> >> >> >>>>> 03-04
> > Scan(groupscan=[HBaseGroupScan
> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> > >> >> >> startRow=null,
> > >> >> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> > >> >> >> >>>>> {noformat}
> > >> >> >> >>>>>
> > >> >> >> >>>>> Thanks,
> > >> >> >> >>>>> Khurram
> > >> >> >> >>>>>
> > >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
> > >> tiredqiang@gmail.com>
> > >> >> >> >>>>> wrote:
> > >> >> >> >>>>>
> > >> >> >> >>>>>> Yes.
> > >> >> >> >>>>>> I use hbase API to create it.
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> The main code is:
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'},
> {'4'},
> > >> >> {'5'},
> > >> >> >> {'6'}, {'7'},{'8'}, {'9'} };
> > >> >> >> >>>>>> TableName tableName =
> TableName.valueOf("browser_action2");
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> HTableDescriptor tableDesc = new
> > HTableDescriptor(tableName);
> > >> >> >> >>>>>> HColumnDescriptor columnDesc = new
> HColumnDescriptor("v");
> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
> > >> >> >> >>>>>>
> > >> >> >> >>>>>>
> > columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> > >> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> > >> >> >> >>>>>>
> > >> >> >> >>>>>>
> > >> >> >> >>>>>>
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <
> zfong@maprtech.com
> > >:
> > >> >> >> >>>>>>
> > >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
> > >> >> reproduce
> > >> >> >> this
> > >> >> >> >>>>>>> problem so we can try to reproduce it on our end.
> > >> >> >> >>>>>>>
> > >> >> >> >>>>>>> Thanks.
> > >> >> >> >>>>>>>
> > >> >> >> >>>>>>> -- Zelaine
> > >> >> >> >>>>>>>
> > >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
> > >> >> tiredqiang@gmail.com>
> > >> >> >> >>>>>>> wrote:
> > >> >> >> >>>>>>>
> > >> >> >> >>>>>>> > Hi ,
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > I recently met an issue where I cannot query the correct
> > >> >> >> >>>>>>> > data from HBase with SQL via Drill; can anybody help me?
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > I tested with Drill 1.6.
> > >> >> >> >>>>>>> > My HBase schema:
> > >> >> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg:
> > 120160411visituidts
> > >> >> >> >>>>>>> > cf : v
> > >> >> >> >>>>>>> > qualifier: v, e0, e1
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > The wrong result only happens when I use a GROUP BY clause.
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > This SQL will not return the correct result:
> > >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> > >> >> count(a.`v`.`e0`)
> > >> >> >> p
> > >> >> >> >>>>>>> from
> > >> >> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group
> by
> > >> >> >> a.`v`.`e0`;
> > >> >> >> >>>>>>> > Part of explain of this sql is:
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> > >> >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> > >> >> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from
> > >> hbase.browser_action2
> > >> >> a
> > >> >> >> >>>>>>> where
> > >> >> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> > >> >> >> >>>>>>> > +------+------+
> > >> >> >> >>>>>>> > | text | json |
> > >> >> >> >>>>>>> > +------+------+
> > >> >> >> >>>>>>> > | 00-00    Screen
> > >> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> > >> >> >> >>>>>>> > 00-02        UnionExchange
> > >> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)],
> p=[$1])
> > >> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> > >> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> > >> >> >> >>>>>>> > 01-04
> HashToRandomExchange(dist0=[[$0]])
> > >> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
> > >> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> > >> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> > >> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
> > >> >> p=[COUNT($0)])
> > >> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1,
> > 'e0')])
> > >> >> >> >>>>>>> > 03-04
> > >> Scan(groupscan=[HBaseGroupScan
> > >> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec
> [tableName=browser_action2,
> > >> >> >> >>>>>>> startRow=0\x00,
> > >> >> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
> > >> >> >> >>>>>>> > +------+--------+
> > >> >> >> >>>>>>> > |  k   |   p    |
> > >> >> >> >>>>>>> > +------+--------+
> > >> >> >> >>>>>>> > | pay  | 12180  |
> > >> >> >> >>>>>>> > +------+--------+
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > But I have millions of rows of data in the table.
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > I tried to change the physical plan: if I change the JSON
> > >> >> >> >>>>>>> > plan's *"columns" : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *,
> > >> >> >> >>>>>>> > it returns the correct result.
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > It seems the physical plan is not correct.
> > >> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the reason,
> > >> >> >> >>>>>>> > but it's too complicated. Can anyone help me?
> > >> >> >> >>>>>>> >
> > >> >> >> >>>>>>> > Also, this SQL has the same issue.
> > >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9),
> > 'UTF8')
> > >> >> as
> > >> >> >> k,
> > >> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group
> by
> > >> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> > >> >> >> >>>>>>> > I change the JSON plan's *"columns" : [ "`*`" ]* to
> > >> >> >> >>>>>>> > *"columns" : [ "`row_key`" ] *, and it returns the correct result.
> > >> >> >> >>>>>>> >

Re: query from hbase issue

Posted by Krystal Nguyen <kn...@maprtech.com>.
Hi Qiang,

Looks like you might be encountering this issue:
https://issues.apache.org/jira/browse/DRILL-4271
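
If it is that bug, the workaround qiang reported in the first email still applies: keep the HBase scan from projecting the wildcard. As a sketch only (the exact JSON depends on your plan; this is just the one-line change described in this thread), edit the scan's column list in the JSON physical plan and re-submit the edited plan through the web UI:

```json
"columns" : [ "`v`.`e0`" ]
```

in place of the original "columns" : [ "`*`" ] entry on the HBaseGroupScan.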

Thanks

On Sun, May 22, 2016 at 8:38 PM, qiang li <ti...@gmail.com> wrote:

> I tested it step by step again, and finally found that the issue
> happens only if the number of qualifiers is more than 3.
>
> It's weird, but this is the result I got.
>
> I tested with about 10 thousand rows of data. The length of the event
> array is 6; the code I used to test is like below:
>
> String[] earr = action.getEvent().geteArr();
> for(int i=0;i<6;i++){
>     put.addColumn(family, Bytes.toBytes("e"+i),  Bytes.toBytes(earr[i]));
> }
>
> Then I test step by step like below :
>
> put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
> put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
> put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
> put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>  group by a.`v`.`e0`;
> +-----+-------+
> |  k  |   p   |
> +-----+-------+
> | e0  | 3856  |
> +-----+-------+
>
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
> +-----+-------+
> |  k  |   p   |
> +-----+-------+
> | e0  | 9094  |
> +-----+-------+
>
>
> If I put the qualifier "e3" into the table (that is, more than 3
> qualifiers), then the issue is reproduced.
>
>
> 2016-05-23 9:23 GMT+08:00 qiang li <ti...@gmail.com>:
>
> > Sorry for the late reply.
> >
> > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
> >
> > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
> >
> >> Qiang, Can you please let us know the hbase version and hadoop
> >> distribution
> >> version that you are using.
> >>
> >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kn...@maprtech.com>
> >> wrote:
> >>
> >> > Can you please let us know the hbase version and hadoop distribution
> >> > version that you are using.
> >> >
> >> >
> >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com>
> wrote:
> >> >
> >> >> Khurram, I am sending the mail again; the last mail forgot to cc
> >> >> user@drill.apache.org
> >> >>
> >> >> The main process is the same, but my rowkey is more complicated.
> >> >> Here are the details of what I tested.
> >> >> The rowkey is like this: [salt 1-byte string] + [day 8-byte string] +
> >> >> [event] + [uid long] + [ts long]
> >> >> also I have other qualifiers; only qualifier v:v is an integer, the
> >> >> others are strings.
> >> >>
> >> >> example:
> >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> >> >> ROW
> >> COLUMN+CELL
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:e0,
> >> >> timestamp=1461839343076, value=pay
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:e1,
> >> >> timestamp=1461839343076, value=bijia
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:e2,
> >> >> timestamp=1461839343076, value=browser
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:e3,
> >> >> timestamp=1461839343076, value=*
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:e4,
> >> >> timestamp=1461839343076, value=*
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:e5,
> >> >> timestamp=1461839343076, value=*
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:h,
> >> >> timestamp=1459771200000, value=20
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:m,
> >> >> timestamp=1459771200000, value=0
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> >> column=v:v,
> >> >> timestamp=1459771200000, value=\x00\x00\x00\x17
> >> >>
> >> >>
> >> >>  1$\xD2\x00
> >> >>
> >> >>
> >> >>
> >> >> 1 row(s) in 0.0410 seconds
> >> >>
> >> >>
> >> >> Here is an example of how the issue looks:
> >> >>
> >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
> >> >> STARTROW=> '0'}
> >> >> ........
> >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\
> >> column=v:e0,
> >> >> timestamp=1463723029448, value=visit
> >> >>
> >> >>
> >> >>  x01T\x00\x0A\xFA\x00
> >> >>
> >> >>
> >> >>
> >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\
> >> column=v:e0,
> >> >> timestamp=1463723029217, value=visit
> >> >>
> >> >>
> >> >>  x01T\x00\x0A\xFA\x00
> >> >>
> >> >>
> >> >>
> >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\
> >> column=v:e0,
> >> >> timestamp=1463723029295, value=visit
> >> >>
> >> >>
> >> >>  x01T\x00\x0A\xFA\x00
> >> >>
> >> >>
> >> >>
> >> >> 9994 row(s) in 123.8650 seconds
> >> >>
> >> >> The Drill result:
> >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key >
> '0'
> >> >>  group by a.`v`.`e0`;
> >> >> +--------+-------+
> >> >> |   k    |   p   |
> >> >> +--------+-------+
> >> >> | visit  | 1216  |
> >> >> +--------+-------+
> >> >>
> >> >>
> >> >> I found that the issue appears when the row count is larger than 10000.
> >> >> The result is right with fewer than 1000 rows, but not always.
> >> >> What I can confirm is that if I update the columns in the physical plan
> >> >> and query via the web UI, the result is correct.
> >> >>
> >> >>
> >> >> Thanks
> >> >>
> >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >> >>
> >> >> > Qiang, can you please take a look at DRILL-4686 and confirm if the
> >> data
> >> >> > set used in my repro is the same as the one you have used. If the
> >> data
> >> >> set
> >> >> > is different please let us know the type of data that you have used
> >> in
> >> >> your
> >> >> > table.
> >> >> >
> >> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
> >> results.
> >> >> >
> >> >> > Thanks,
> >> >> > Khurram
> >> >> >
> >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansinha@apache.org
> >
> >> >> wrote:
> >> >> >
> >> >> >> Khurram,  DRILL-4686 seems like a different issue...it is
> reporting
> >> an
> >> >> >> error whereas the original problem from qiang was an incorrect
> >> result.
> >> >> >> Can
> >> >> >> you use the same version (1.6) that he was using.  Also, is the
> data
> >> >> set
> >> >> >> similar ? If you are unable to repro the exact same issue,
> perhaps
> >> >> qiang
> >> >> >> should file a JIRA with a smaller repro if possible.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
> >> kfaraaz@maprtech.com>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Hello Qiang,
> >> >> >> >
> >> >> >> > DRILL-4686 is reported to track this problem.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Khurram
> >> >> >> >
> >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqiang@gmail.com
> >
> >> >> wrote:
> >> >> >> >
> >> >> >> >> Ok, Thanks very much.
> >> >> >> >>
> >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <
> kfaraaz@maprtech.com
> >> >:
> >> >> >> >>
> >> >> >> >>> Hello Qiang,
> >> >> >> >>>
> >> >> >> >>> Someone from our Drill team (in San Jose) will get back to you
> >> >> soon. I
> >> >> >> >>> work from the India lab and I am in a different time zone as
> >> >> compared
> >> >> >> to
> >> >> >> >>> San Jose office; someone from MapR San Jose will get back to
> >> you
> >> >> as
> >> >> >> soon
> >> >> >> >>> as possible.
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>> Khurram
> >> >> >> >>>
> >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <
> tiredqiang@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >> >>>
> >> >> >> >>>> Hi Khurram, thanks very much for reproducing it; so what's the
> >> >> >> >>>> conclusion?
> >> >> >> >>>>
> >> >> >> >>>> Any idea how to solve it?
> >> >> >> >>>>
> >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
> >> kfaraaz@maprtech.com>:
> >> >> >> >>>>
> >> >> >> >>>>> So I tried to create the table using HBase API (with no data
> >> >> >> inserted
> >> >> >> >>>>> into table) and I got the query plan for drill 1.7.0
> >> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >> >> >> >>>>>
> >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >> >> >> >>>>> +--------------+------------+--------------+
> >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >> >> >> >>>>> +--------------+------------+--------------+
> >> >> >> >>>>> | row_key      | ANY        | NO           |
> >> >> >> >>>>> | v            | MAP        | NO           |
> >> >> >> >>>>> +--------------+------------+--------------+
> >> >> >> >>>>> 2 rows selected (1.665 seconds)
> >> >> >> >>>>>
> >> >> >> >>>>> Table creation Java program
> >> >> >> >>>>>
> >> >> >> >>>>> {noformat}
> >> >> >> >>>>> public class PutIntDataToHBase {
> >> >> >> >>>>>     public static void main(String args[]) throws
> IOException
> >> {
> >> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
> >> >> >> >>>>>
> >>  conf.set("hbase.zookeeper.property.clientPort","5181");
> >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
> >> >> >> >>>>>             admin.disableTable("browser_action2");
> >> >> >> >>>>>             admin.deleteTable("browser_action2");
> >> >> >> >>>>>         }
> >> >> >> >>>>>
> >> >> >> >>>>>         byte[][] SPLIT_KEYS =
> >> >> >> >>>>>
> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >> >> >> >>>>>         HTableDescriptor tableDesc = new
> >> >> >> >>>>>
> >> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
> >> >> >> >>>>>
> >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> >> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> >> >> >> >>>>>
> >> >> >> >>>>>     }
> >> >> >> >>>>> }
> >> >> >> >>>>> {noformat}
> >> >> >> >>>>>
> >> >> >> >>>>> Query plan for the query that was reported as returning
> wrong
> >> >> >> results.
> >> >> >> >>>>>
> >> >> >> >>>>> {noformat}
> >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
> >> from
> >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
> >> >> a.`v`.`e0`;
> >> >> >> >>>>> +------+------+
> >> >> >> >>>>> | text | json |
> >> >> >> >>>>> +------+------+
> >> >> >> >>>>> | 00-00    Screen
> >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >> >> >>>>> 00-02        UnionExchange
> >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
> p=[COUNT($0)])
> >> >> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >> >> >>>>> 03-04
> Scan(groupscan=[HBaseGroupScan
> >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >> >> startRow=0\x00,
> >> >> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >> >> >> >>>>> {noformat}
> >> >> >> >>>>>
> >> >> >> >>>>> and the query plan for the other problem query mentioned in
> >> the
> >> >> >> first
> >> >> >> >>>>> email.
> >> >> >> >>>>>
> >> >> >> >>>>> {noformat}
> >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> >> >> >> >>>>> hbase.browser_action2 a group by
> >> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >> >> >>>>> +------+------+
> >> >> >> >>>>> | text | json |
> >> >> >> >>>>> +------+------+
> >> >> >> >>>>> | 00-00    Screen
> >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >> >> >>>>> 00-02        UnionExchange
> >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >> >> >>>>> 03-02                      HashAgg(group=[{0}],
> p=[COUNT($1)])
> >> >> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1,
> >> 9)],
> >> >> >> >>>>> row_key=[$0])
> >> >> >> >>>>> 03-04
> Scan(groupscan=[HBaseGroupScan
> >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >> >> startRow=null,
> >> >> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >> >> >> >>>>> {noformat}
> >> >> >> >>>>>
> >> >> >> >>>>> Thanks,
> >> >> >> >>>>> Khurram
> >> >> >> >>>>>
> >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
> >> tiredqiang@gmail.com>
> >> >> >> >>>>> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>>> Yes.
> >> >> >> >>>>>> I use hbase API to create it.
> >> >> >> >>>>>>
> >> >> >> >>>>>> The main code is:
> >> >> >> >>>>>>
> >> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
> >> >> {'5'},
> >> >> >> {'6'}, {'7'},{'8'}, {'9'} };
> >> >> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >> >> >> >>>>>>
> >> >> >> >>>>>> HTableDescriptor tableDesc = new
> HTableDescriptor(tableName);
> >> >> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
> >> >> >> >>>>>>
> >> >> >> >>>>>>
> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >> >> >> >>>>>>
> >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >> >> >> >>>>>>
> >> >> >> >>>>>>
> >> >> >> >>>>>>
> >> >> >> >>>>>>
> >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zfong@maprtech.com
> >:
> >> >> >> >>>>>>
> >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
> >> >> reproduce
> >> >> >> this
> >> >> >> >>>>>>> problem so we can try to reproduce it on our end.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> Thanks.
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> -- Zelaine
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
> >> >> tiredqiang@gmail.com>
> >> >> >> >>>>>>> wrote:
> >> >> >> >>>>>>>
> >> >> >> >>>>>>> > Hi ,
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > I recently met an issue where I cannot query the correct
> >> >> >> >>>>>>> > data from HBase with SQL via Drill; can anybody help me?
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > I tested with Drill 1.6.
> >> >> >> >>>>>>> > My HBase schema:
> >> >> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg:
> 120160411visituidts
> >> >> >> >>>>>>> > cf : v
> >> >> >> >>>>>>> > qualifier: v, e0, e1
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > The wrong result only happens when I use a GROUP BY clause.
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > This SQL will not return the correct result:
> >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> >> >> count(a.`v`.`e0`)
> >> >> >> p
> >> >> >> >>>>>>> from
> >> >> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
> >> >> >> a.`v`.`e0`;
> >> >> >> >>>>>>> > Part of explain of this sql is:
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> >> >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> >> >> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from
> >> hbase.browser_action2
> >> >> a
> >> >> >> >>>>>>> where
> >> >> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> >> >> >> >>>>>>> > +------+------+
> >> >> >> >>>>>>> > | text | json |
> >> >> >> >>>>>>> > +------+------+
> >> >> >> >>>>>>> > | 00-00    Screen
> >> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> >> >> >> >>>>>>> > 00-02        UnionExchange
> >> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> >> >> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
> >> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> >> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
> >> >> p=[COUNT($0)])
> >> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1,
> 'e0')])
> >> >> >> >>>>>>> > 03-04
> >> Scan(groupscan=[HBaseGroupScan
> >> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >> >> >>>>>>> startRow=0\x00,
> >> >> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
> >> >> >> >>>>>>> > +------+--------+
> >> >> >> >>>>>>> > |  k   |   p    |
> >> >> >> >>>>>>> > +------+--------+
> >> >> >> >>>>>>> > | pay  | 12180  |
> >> >> >> >>>>>>> > +------+--------+
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > But I have millions of rows of data in the table.
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > I tried to change the physical plan: if I change the JSON
> >> >> >> >>>>>>> > plan's *"columns" : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *,
> >> >> >> >>>>>>> > it returns the correct result.
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > It seems the physical plan is not correct.
> >> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the reason,
> >> >> >> >>>>>>> > but it's too complicated. Can anyone help me?
> >> >> >> >>>>>>> >
> >> >> >> >>>>>>> > Also, this SQL has the same issue.
> >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9),
> 'UTF8')
> >> >> as
> >> >> >> k,
> >> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >> >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
> >> >> >> *"columns" :
> >> >> >> >>>>>>> [
> >> >> >> >>>>>>> > "`row_key`" ] *, it will return the correct result.
> >> >> >> >>>>>>> >
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
I tested it step by step again, and finally found that the issue
happens only if the number of qualifiers is more than 3.

It's weird, but this is the result I got.

I tested with about 10 thousand rows of data. The length of the event
array is 6; the code I used to test is like below:

String[] earr = action.getEvent().geteArr();
for(int i=0;i<6;i++){
    put.addColumn(family, Bytes.toBytes("e"+i),  Bytes.toBytes(earr[i]));
}

Then I test step by step like below :

put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));

0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
 group by a.`v`.`e0`;
+-----+-------+
|  k  |   p   |
+-----+-------+
| e0  | 3856  |
+-----+-------+

0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
+-----+-------+
|  k  |   p   |
+-----+-------+
| e0  | 9094  |
+-----+-------+


If I put the qualifier "e3" into the table (that is, more than 3
qualifiers), then the issue is reproduced.
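
For reference, the composite rowkey used throughout this thread ([salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long]) can be sketched in plain Java. This is only an illustration of the layout; the class and method names below are my own, and the thread does not show the actual key-building code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {

    // Assembles a composite rowkey with the layout described in this thread:
    // [salt 1 byte] + [day 8-byte string] + [event string] + [uid long] + [ts long].
    // Field order and widths are taken from the description above; this is an
    // illustrative sketch, not the code actually used by the poster.
    public static byte[] buildRowKey(char salt, String day, String event, long uid, long ts) {
        byte[] dayBytes = day.getBytes(StandardCharsets.UTF_8);     // expected length 8, e.g. "20160411"
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8); // variable length, e.g. "visit"
        ByteBuffer buf = ByteBuffer.allocate(1 + dayBytes.length + eventBytes.length + 8 + 8);
        buf.put((byte) salt);   // 1-byte salt, '0'..'9', matches the pre-split keys
        buf.put(dayBytes);
        buf.put(eventBytes);
        buf.putLong(uid);       // 8 bytes, big-endian
        buf.putLong(ts);        // 8 bytes, big-endian
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey('1', "20160411", "visit", 42L, 1463723029448L);
        // 1 (salt) + 8 (day) + 5 ("visit") + 8 (uid) + 8 (ts) = 30 bytes
        System.out.println(key.length);
    }
}
```

Because the 1-byte salt is the leading byte, the pre-split keys '0'..'9' and scan conditions such as a.row_key > '0' operate on it directly.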


2016-05-23 9:23 GMT+08:00 qiang li <ti...@gmail.com>:

> Sorry for the late reply.
>
> Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>
> 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:
>
>> Qiang, Can you please let us know the hbase version and hadoop
>> distribution
>> version that you are using.
>>
>> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kn...@maprtech.com>
>> wrote:
>>
>> > Can you please let us know the hbase version and hadoop distribution
>> > version that you are using.
>> >
>> >
>> > On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com> wrote:
>> >
>> >> Khurram, I am sending the mail again; the last mail forgot to cc
>> >> user@drill.apache.org
>> >>
>> >> The main process is the same, but my rowkey is more complicated.
>> >> Here are the details of what I tested.
>> >> The rowkey is like this: [salt 1-byte string] + [day 8-byte string] +
>> >> [event] + [uid long] + [ts long]
>> >> also I have other qualifiers; only qualifier v:v is an integer, the
>> >> others are strings.
>> >>
>> >> example:
>> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> >> ROW
>> COLUMN+CELL
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:e0,
>> >> timestamp=1461839343076, value=pay
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:e1,
>> >> timestamp=1461839343076, value=bijia
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:e2,
>> >> timestamp=1461839343076, value=browser
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:e3,
>> >> timestamp=1461839343076, value=*
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:e4,
>> >> timestamp=1461839343076, value=*
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:e5,
>> >> timestamp=1461839343076, value=*
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:h,
>> >> timestamp=1459771200000, value=20
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:m,
>> >> timestamp=1459771200000, value=0
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>> column=v:v,
>> >> timestamp=1459771200000, value=\x00\x00\x00\x17
>> >>
>> >>
>> >>  1$\xD2\x00
>> >>
>> >>
>> >>
>> >> 1 row(s) in 0.0410 seconds
>> >>
>> >>
>> >> Here is an example of how the issue looks:
>> >>
>> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
>> >> STARTROW=> '0'}
>> >> ........
>> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\
>> column=v:e0,
>> >> timestamp=1463723029448, value=visit
>> >>
>> >>
>> >>  x01T\x00\x0A\xFA\x00
>> >>
>> >>
>> >>
>> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\
>> column=v:e0,
>> >> timestamp=1463723029217, value=visit
>> >>
>> >>
>> >>  x01T\x00\x0A\xFA\x00
>> >>
>> >>
>> >>
>> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\
>> column=v:e0,
>> >> timestamp=1463723029295, value=visit
>> >>
>> >>
>> >>  x01T\x00\x0A\xFA\x00
>> >>
>> >>
>> >>
>> >> 9994 row(s) in 123.8650 seconds
>> >>
>> >> The Drill result:
>> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>> >>  group by a.`v`.`e0`;
>> >> +--------+-------+
>> >> |   k    |   p   |
>> >> +--------+-------+
>> >> | visit  | 1216  |
>> >> +--------+-------+
>> >>
>> >>
>> >> I found that the issue appears when the row count is larger than 10000.
>> >> The result is right with fewer than 1000 rows, but not always.
>> >> What I can confirm is that if I update the columns in the physical plan
>> >> and query via the web UI, the result is correct.
>> >>
>> >>
>> >> Thanks
>> >>
>> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>> >>
>> >> > Qiang, can you please take a look at DRILL-4686 and confirm if the
>> data
>> >> > set used in my repro is the same as the one you have used. If the
>> data
>> >> set
>> >> > is different please let us know the type of data that you have used
>> in
>> >> your
>> >> > table.
>> >> >
>> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
>> results.
>> >> >
>> >> > Thanks,
>> >> > Khurram
>> >> >
>> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <am...@apache.org>
>> >> wrote:
>> >> >
>> >> >> Khurram,  DRILL-4686 seems like a different issue...it is reporting
>> an
>> >> >> error whereas the original problem from qiang was an incorrect
>> result.
>> >> >> Can
>> >> >> you use the same version (1.6) that he was using.  Also, is the data
>> >> set
>> >> >> similar ? If you are unable to repro the exact same issue,  perhaps
>> >> qiang
>> >> >> should file a JIRA with a smaller repro if possible.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
>> kfaraaz@maprtech.com>
>> >> >> wrote:
>> >> >>
>> >> >> > Hello Qiang,
>> >> >> >
>> >> >> > DRILL-4686 is reported to track this problem.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Khurram
>> >> >> >
>> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com>
>> >> wrote:
>> >> >> >
>> >> >> >> Ok, Thanks very much.
>> >> >> >>
>> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com
>> >:
>> >> >> >>
>> >> >> >>> Hello Qiang,
>> >> >> >>>
>> >> >> >>> Someone from our Drill team (in San Jose) will get back to you
>> >> soon. I
>> >> >> >>> work from the India lab and I am in a different time zone as
>> >> compared
>> >> >> to
>> >> >> >>> San Jose office, someone from MapR San Jose will get back to
>> you
>> >> as
>> >> >> soon
>> >> >> >>> as possible.
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Khurram
>> >> >> >>>
>> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqiang@gmail.com
>> >
>> >> >> wrote:
>> >> >> >>>
>> >> >> >>>> Hi Khurram, thanks very much for reproducing it. What's the
>> >> >> >>>> conclusion?
>> >> >> >>>>
>> >> >> >>>> Any idea how to solve it?
>> >> >> >>>>
>> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
>> kfaraaz@maprtech.com>:
>> >> >> >>>>
>> >> >> >>>>> So I tried to create the table using HBase API (with no data
>> >> >> inserted
>> >> >> >>>>> into table) and I got the query plan for drill 1.7.0
>> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>> >> >> >>>>>
>> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> >> >> >>>>> +--------------+------------+--------------+
>> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> >> >> >>>>> +--------------+------------+--------------+
>> >> >> >>>>> | row_key      | ANY        | NO           |
>> >> >> >>>>> | v            | MAP        | NO           |
>> >> >> >>>>> +--------------+------------+--------------+
>> >> >> >>>>> 2 rows selected (1.665 seconds)
>> >> >> >>>>>
>> >> >> >>>>> Table creation Java program
>> >> >> >>>>>
>> >> >> >>>>> {noformat}
>> >> >> >>>>> public class PutIntDataToHBase {
>> >> >> >>>>>     public static void main(String args[]) throws IOException
>> {
>> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>> >> >> >>>>>
>>  conf.set("hbase.zookeeper.property.clientPort","5181");
>> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
>> >> >> >>>>>             admin.disableTable("browser_action2");
>> >> >> >>>>>             admin.deleteTable("browser_action2");
>> >> >> >>>>>         }
>> >> >> >>>>>
>> >> >> >>>>>         byte[][] SPLIT_KEYS =
>> >> >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> >> >> >>>>>         HTableDescriptor tableDesc = new
>> >> >> >>>>>
>> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
>> >> >> >>>>>
>> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>> >> >> >>>>>
>> >> >> >>>>>     }
>> >> >> >>>>> }
>> >> >> >>>>> {noformat}
>> >> >> >>>>>
>> >> >> >>>>> Query plan for the query that was reported as returning wrong
>> >> >> results.
>> >> >> >>>>>
>> >> >> >>>>> {noformat}
>> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>> from
>> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
>> >> a.`v`.`e0`;
>> >> >> >>>>> +------+------+
>> >> >> >>>>> | text | json |
>> >> >> >>>>> +------+------+
>> >> >> >>>>> | 00-00    Screen
>> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >> >> >>>>> 00-02        UnionExchange
>> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >> >>>>> 02-01                  UnorderedMuxExchange
>> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> >> startRow=0\x00,
>> >> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
>> >> >> >>>>> {noformat}
>> >> >> >>>>>
>> >> >> >>>>> and the query plan for the other problem query mentioned in
>> the
>> >> >> first
>> >> >> >>>>> email.
>> >> >> >>>>>
>> >> >> >>>>> {noformat}
>> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> >> >> >>>>> hbase.browser_action2 a group by
>> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >> >> >>>>> +------+------+
>> >> >> >>>>> | text | json |
>> >> >> >>>>> +------+------+
>> >> >> >>>>> | 00-00    Screen
>> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >> >> >>>>> 00-02        UnionExchange
>> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >> >>>>> 02-01                  UnorderedMuxExchange
>> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> >> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1,
>> 9)],
>> >> >> >>>>> row_key=[$0])
>> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> >> startRow=null,
>> >> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>> >> >> >>>>> {noformat}
>> >> >> >>>>>
>> >> >> >>>>> Thanks,
>> >> >> >>>>> Khurram
>> >> >> >>>>>
>> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
>> tiredqiang@gmail.com>
>> >> >> >>>>> wrote:
>> >> >> >>>>>
>> >> >> >>>>>> Yes.
>> >> >> >>>>>> I use hbase API to create it.
>> >> >> >>>>>>
>> >> >> >>>>>> The main code is:
>> >> >> >>>>>>
>> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
>> >> {'5'},
>> >> >> {'6'}, {'7'},{'8'}, {'9'} };
>> >> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> >> >> >>>>>>
>> >> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> >> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>> >> >> >>>>>>
>> >> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> >> >> >>>>>>
>> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>> >> >> >>>>>>
>> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>> >> reproduce
>> >> >> this
>> >> >> >>>>>>> problem so we can try to reproduce it on our end.
>> >> >> >>>>>>>
>> >> >> >>>>>>> Thanks.
>> >> >> >>>>>>>
>> >> >> >>>>>>> -- Zelaine
>> >> >> >>>>>>>
>> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>> >> tiredqiang@gmail.com>
>> >> >> >>>>>>> wrote:
>> >> >> >>>>>>>
>> >> >> >>>>>>> > Hi ,
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > I recently met an issue where I cannot query the correct data
>> >> >> >>>>>>> > from HBase with SQL via Drill; can anybody help me?
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > I tested with Drill 1.6.
>> >> >> >>>>>>> > My HBase schema:
>> >> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> >> >> >>>>>>> > cf : v
>> >> >> >>>>>>> > qualifier: v, e0, e1
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > The wrong result only happens when I use a GROUP BY clause.
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > This SQL does not return the correct result:
>> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> >> count(a.`v`.`e0`)
>> >> >> p
>> >> >> >>>>>>> from
>> >> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
>> >> >> a.`v`.`e0`;
>> >> >> >>>>>>> > Part of explain of this sql is:
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>> >> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from
>> hbase.browser_action2
>> >> a
>> >> >> >>>>>>> where
>> >> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>> >> >> >>>>>>> > +------+------+
>> >> >> >>>>>>> > | text | json |
>> >> >> >>>>>>> > +------+------+
>> >> >> >>>>>>> > | 00-00    Screen
>> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> >> >> >>>>>>> > 00-02        UnionExchange
>> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> >> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
>> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
>> >> p=[COUNT($0)])
>> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >> >> >>>>>>> > 03-04
>> Scan(groupscan=[HBaseGroupScan
>> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> >> >>>>>>> startRow=0\x00,
>> >> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
>> >> >> >>>>>>> > +------+--------+
>> >> >> >>>>>>> > |  k   |   p    |
>> >> >> >>>>>>> > +------+--------+
>> >> >> >>>>>>> > | pay  | 12180  |
>> >> >> >>>>>>> > +------+--------+
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > But I have millions of rows in the table.
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > I tried to change the physical plan. If I change the JSON plan's
>> >> >> >>>>>>> > *"columns" : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ]*, it
>> >> >> >>>>>>> > returns the correct result.
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > It seems the physical plan is not correct.
>> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the reason, but
>> >> >> >>>>>>> > it is too complicated. Can anyone help me?
>> >> >> >>>>>>> >
>> >> >> >>>>>>> > Also, this SQL has the same issue.
>> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')
>> >> as
>> >> >> k,
>> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >> >> >>>>>>> > If I change the JSON plan's *"columns" : [ "`*`" ]* to
>> >> >> >>>>>>> > *"columns" : [ "`row_key`" ]*, it returns the correct result.
>> >> >> >>>>>>> >
>> >> >> >>>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
Sorry for the late reply.

Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.

2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kn...@maprtech.com>:

> Qiang, Can you please let us know the hbase version and hadoop distribution
> version that you are using.
>
> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kn...@maprtech.com>
> wrote:
>
> > Can you please let us know the hbase version and hadoop distribution
> > version that you are using.
> >
> >
> > On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com> wrote:
> >
> >> Khurram, I am sending the mail again; the last mail forgot to cc
> >> user@drill.apache.org
> >>
> >> The main process is the same, but my rowkey is more complicated.
> >> Here are the details of what I tested.
> >> rowkey is like this : [salt 1byte string] + [day 8byte string] +
> [event] +
> >> [uid long] + [ts long]
> >> also I have other qualifiers, only qualifier v:v is integer, the others
> >> are
> >> string.
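As a reference for reproducing this rowkey layout, it can be sketched in plain Java with no HBase dependency. The field widths follow the description above, and the class and method names are illustrative, not from the original code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Layout: [salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long]
    public static byte[] makeRowKey(char salt, String day, String event, long uid, long ts) {
        if (day.length() != 8) {
            throw new IllegalArgumentException("day must be an 8-character string, e.g. 20160411");
        }
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + eventBytes.length + 8 + 8);
        buf.put((byte) salt);                            // 1-byte salt, '0'..'9'
        buf.put(day.getBytes(StandardCharsets.UTF_8));   // 8-byte day, yyyyMMdd
        buf.put(eventBytes);                             // variable-length event name
        buf.putLong(uid);                                // 8-byte big-endian uid
        buf.putLong(ts);                                 // 8-byte big-endian timestamp
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = makeRowKey('1', "20160411", "visit", 42L, 1463723029217L);
        System.out.println(key.length); // 1 + 8 + 5 + 8 + 8 = 30
    }
}
```

With this layout the leading salt character lines up with the single-character SPLIT_KEYS used to pre-split the table, so each region owns one salt value.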
> >>
> >> example:
> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> >> ROW                                                          COLUMN+CELL
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> column=v:e0,
> >> timestamp=1461839343076, value=pay
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> column=v:e1,
> >> timestamp=1461839343076, value=bijia
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> column=v:e2,
> >> timestamp=1461839343076, value=browser
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> column=v:e3,
> >> timestamp=1461839343076, value=*
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> column=v:e4,
> >> timestamp=1461839343076, value=*
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
> column=v:e5,
> >> timestamp=1461839343076, value=*
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h,
> >> timestamp=1459771200000, value=20
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m,
> >> timestamp=1459771200000, value=0
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v,
> >> timestamp=1459771200000, value=\x00\x00\x00\x17
> >>
> >>
> >>  1$\xD2\x00
> >>
> >>
> >>
> >> 1 row(s) in 0.0410 seconds
> >>
> >>
> >> Here is an example of what the issue looks like:
> >>
> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
> >> STARTROW=> '0'}
> >> ........
> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\
> column=v:e0,
> >> timestamp=1463723029448, value=visit
> >>
> >>
> >>  x01T\x00\x0A\xFA\x00
> >>
> >>
> >>
> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\
> column=v:e0,
> >> timestamp=1463723029217, value=visit
> >>
> >>
> >>  x01T\x00\x0A\xFA\x00
> >>
> >>
> >>
> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\
> column=v:e0,
> >> timestamp=1463723029295, value=visit
> >>
> >>
> >>  x01T\x00\x0A\xFA\x00
> >>
> >>
> >>
> >> 9994 row(s) in 123.8650 seconds
> >>
> >> the drill result:
> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
> >>  group by a.`v`.`e0`;
> >> +--------+-------+
> >> |   k    |   p   |
> >> +--------+-------+
> >> | visit  | 1216  |
> >> +--------+-------+
> >>
> >>
> >> I found that the issue appears when the row count is larger than 10000.
> >> The result is correct with fewer than 1000 rows, but not always.
> >> What I can confirm is that if I update the columns in the physical plan
> >> and query via the web UI, the result is correct.
> >>
> >>
> >> Thanks
> >>
> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >>
> >> > Qiang, can you please take a look at DRILL-4686 and confirm if the
> data
> >> > set used in my repro is the same as the one you have used. If the data
> >> set
> >> > is different please let us know the type of data that you have used in
> >> your
> >> > table.
> >> >
> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
> results.
> >> >
> >> > Thanks,
> >> > Khurram
> >> >
> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <am...@apache.org>
> >> wrote:
> >> >
> >> >> Khurram,  DRILL-4686 seems like a different issue...it is reporting
> an
> >> >> error whereas the original problem from qiang was an incorrect
> result.
> >> >> Can
> >> >> you use the same version (1.6) that he was using.  Also, is the data
> >> set
> >> >> similar ? If you are unable to repro the exact same issue,  perhaps
> >> qiang
> >> >> should file a JIRA with a smaller repro if possible.
> >> >>
> >> >>
> >> >>
> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
> kfaraaz@maprtech.com>
> >> >> wrote:
> >> >>
> >> >> > Hello Qiang,
> >> >> >
> >> >> > DRILL-4686 is reported to track this problem.
> >> >> >
> >> >> > Thanks,
> >> >> > Khurram
> >> >> >
> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com>
> >> wrote:
> >> >> >
> >> >> >> Ok, Thanks very much.
> >> >> >>
> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >> >> >>
> >> >> >>> Hello Qiang,
> >> >> >>>
> >> >> >>> Someone from our Drill team (in San Jose) will get back to you
> >> soon. I
> >> >> >>> work from the India lab and I am in a different time zone as
> >> compared
> >> >> to
> >> >> >>> San Jose office, someone from MapR San Jose will get back to you
> >> as
> >> >> soon
> >> >> >>> as possible.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Khurram
> >> >> >>>
> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com>
> >> >> wrote:
> >> >> >>>
> >> >> >>>> Hi Khurram, thanks very much for reproducing it. What's the
> >> >> >>>> conclusion?
> >> >> >>>>
> >> >> >>>> Any idea how to solve it?
> >> >> >>>>
> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com
> >:
> >> >> >>>>
> >> >> >>>>> So I tried to create the table using HBase API (with no data
> >> >> inserted
> >> >> >>>>> into table) and I got the query plan for drill 1.7.0
> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >> >> >>>>>
> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >> >> >>>>> +--------------+------------+--------------+
> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >> >> >>>>> +--------------+------------+--------------+
> >> >> >>>>> | row_key      | ANY        | NO           |
> >> >> >>>>> | v            | MAP        | NO           |
> >> >> >>>>> +--------------+------------+--------------+
> >> >> >>>>> 2 rows selected (1.665 seconds)
> >> >> >>>>>
> >> >> >>>>> Table creation Java program
> >> >> >>>>>
> >> >> >>>>> {noformat}
> >> >> >>>>> public class PutIntDataToHBase {
> >> >> >>>>>     public static void main(String args[]) throws IOException {
> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
> >> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
> >> >> >>>>>             admin.disableTable("browser_action2");
> >> >> >>>>>             admin.deleteTable("browser_action2");
> >> >> >>>>>         }
> >> >> >>>>>
> >> >> >>>>>         byte[][] SPLIT_KEYS =
> >> >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >> >> >>>>>         HTableDescriptor tableDesc = new
> >> >> >>>>>
> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
> >> >> >>>>>
> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> >> >> >>>>>
> >> >> >>>>>     }
> >> >> >>>>> }
> >> >> >>>>> {noformat}
> >> >> >>>>>
> >> >> >>>>> Query plan for the query that was reported as returning wrong
> >> >> results.
> >> >> >>>>>
> >> >> >>>>> {noformat}
> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
> >> a.`v`.`e0`;
> >> >> >>>>> +------+------+
> >> >> >>>>> | text | json |
> >> >> >>>>> +------+------+
> >> >> >>>>> | 00-00    Screen
> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >> >>>>> 00-02        UnionExchange
> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >> startRow=0\x00,
> >> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >> >> >>>>> {noformat}
> >> >> >>>>>
> >> >> >>>>> and the query plan for the other problem query mentioned in the
> >> >> first
> >> >> >>>>> email.
> >> >> >>>>>
> >> >> >>>>> {noformat}
> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> >> >> >>>>> hbase.browser_action2 a group by
> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >> >>>>> +------+------+
> >> >> >>>>> | text | json |
> >> >> >>>>> +------+------+
> >> >> >>>>> | 00-00    Screen
> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >> >>>>> 00-02        UnionExchange
> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> >> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1,
> 9)],
> >> >> >>>>> row_key=[$0])
> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >> startRow=null,
> >> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >> >> >>>>> {noformat}
> >> >> >>>>>
> >> >> >>>>> Thanks,
> >> >> >>>>> Khurram
> >> >> >>>>>
> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
> tiredqiang@gmail.com>
> >> >> >>>>> wrote:
> >> >> >>>>>
> >> >> >>>>>> Yes.
> >> >> >>>>>> I use hbase API to create it.
> >> >> >>>>>>
> >> >> >>>>>> The main code is:
> >> >> >>>>>>
> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
> >> {'5'},
> >> >> {'6'}, {'7'},{'8'}, {'9'} };
> >> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >> >> >>>>>>
> >> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >> >> >>>>>> tableDesc.addFamily(columnDesc);
> >> >> >>>>>>
> >> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >> >> >>>>>>
> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
> >> >> >>>>>>
> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
> >> reproduce
> >> >> this
> >> >> >>>>>>> problem so we can try to reproduce it on our end.
> >> >> >>>>>>>
> >> >> >>>>>>> Thanks.
> >> >> >>>>>>>
> >> >> >>>>>>> -- Zelaine
> >> >> >>>>>>>
> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
> >> tiredqiang@gmail.com>
> >> >> >>>>>>> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>> > Hi ,
> >> >> >>>>>>> >
> >> >> >>>>>>> > I recently met an issue where I cannot query the correct data
> >> >> >>>>>>> > from HBase with SQL via Drill; can anybody help me?
> >> >> >>>>>>> >
> >> >> >>>>>>> > I tested with Drill 1.6.
> >> >> >>>>>>> > My HBase schema:
> >> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
> >> >> >>>>>>> > cf : v
> >> >> >>>>>>> > qualifier: v, e0, e1
> >> >> >>>>>>> >
> >> >> >>>>>>> > The wrong result only happens when I use a GROUP BY clause.
> >> >> >>>>>>> >
> >> >> >>>>>>> > This SQL does not return the correct result:
> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> >> count(a.`v`.`e0`)
> >> >> p
> >> >> >>>>>>> from
> >> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
> >> >> a.`v`.`e0`;
> >> >> >>>>>>> > Part of explain of this sql is:
> >> >> >>>>>>> >
> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> >> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2
> >> a
> >> >> >>>>>>> where
> >> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> >> >> >>>>>>> > +------+------+
> >> >> >>>>>>> > | text | json |
> >> >> >>>>>>> > +------+------+
> >> >> >>>>>>> > | 00-00    Screen
> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> >> >> >>>>>>> > 00-02        UnionExchange
> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> >> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
> >> p=[COUNT($0)])
> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >> >>>>>>> > 03-04
> Scan(groupscan=[HBaseGroupScan
> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >> >>>>>>> startRow=0\x00,
> >> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >> >> >>>>>>> >
> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
> >> >> >>>>>>> > +------+--------+
> >> >> >>>>>>> > |  k   |   p    |
> >> >> >>>>>>> > +------+--------+
> >> >> >>>>>>> > | pay  | 12180  |
> >> >> >>>>>>> > +------+--------+
> >> >> >>>>>>> >
> >> >> >>>>>>> > But I have millions of rows in the table.
> >> >> >>>>>>> >
> >> >> >>>>>>> > I tried to change the physical plan. If I change the JSON plan's
> >> >> >>>>>>> > *"columns" : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ]*, it
> >> >> >>>>>>> > returns the correct result.
> >> >> >>>>>>> >
> >> >> >>>>>>> > It seems the physical plan is not correct.
> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the reason, but
> >> >> >>>>>>> > it is too complicated. Can anyone help me?
> >> >> >>>>>>> >
> >> >> >>>>>>> > Also, this SQL has the same issue.
> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')
> >> as
> >> >> k,
> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >> >>>>>>> > If I change the JSON plan's *"columns" : [ "`*`" ]* to
> >> >> >>>>>>> > *"columns" : [ "`row_key`" ]*, it returns the correct result.
> >> >> >>>>>>> >
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: query from hbase issue

Posted by Krystal Nguyen <kn...@maprtech.com>.
Qiang, Can you please let us know the hbase version and hadoop distribution
version that you are using.

On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kn...@maprtech.com>
wrote:

> Can you please let us know the hbase version and hadoop distribution
> version that you are using.
>
>
> On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com> wrote:
>
>> Khurram, I am sending the mail again; the last mail forgot to cc
>> user@drill.apache.org
>>
>> The main process is the same, but my rowkey is more complicated.
>> Here are the details of what I tested.
>> rowkey is like this : [salt 1byte string] + [day 8byte string] + [event] +
>> [uid long] + [ts long]
>> also I have other qualifiers, only qualifier v:v is integer, the others
>> are
>> string.
>>
>> example:
>> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> ROW                                                          COLUMN+CELL
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e0,
>> timestamp=1461839343076, value=pay
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e1,
>> timestamp=1461839343076, value=bijia
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e2,
>> timestamp=1461839343076, value=browser
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e3,
>> timestamp=1461839343076, value=*
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e4,
>> timestamp=1461839343076, value=*
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e5,
>> timestamp=1461839343076, value=*
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h,
>> timestamp=1459771200000, value=20
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m,
>> timestamp=1459771200000, value=0
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v,
>> timestamp=1459771200000, value=\x00\x00\x00\x17
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>> 1 row(s) in 0.0410 seconds
>>
>>
>> Here is an example of what the issue looks like:
>>
>> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
>> STARTROW=> '0'}
>> ........
>>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\ column=v:e0,
>> timestamp=1463723029448, value=visit
>>
>>
>>  x01T\x00\x0A\xFA\x00
>>
>>
>>
>>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\ column=v:e0,
>> timestamp=1463723029217, value=visit
>>
>>
>>  x01T\x00\x0A\xFA\x00
>>
>>
>>
>>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\ column=v:e0,
>> timestamp=1463723029295, value=visit
>>
>>
>>  x01T\x00\x0A\xFA\x00
>>
>>
>>
>> 9994 row(s) in 123.8650 seconds
>>
>> the drill result:
>> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>>  group by a.`v`.`e0`;
>> +--------+-------+
>> |   k    |   p   |
>> +--------+-------+
>> | visit  | 1216  |
>> +--------+-------+
>>
>>
>> I found that the issue appears when the row count is larger than 10000.
>> The result is correct with fewer than 1000 rows, but not always.
>> What I can confirm is that if I update the columns in the physical plan
>> and query via the web UI, the result is correct.
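For clarity, the workaround described here amounts to editing the scan's column projection in the submitted JSON plan before running it from the web UI. A sketch of the relevant fragment follows; only the `columns` value is confirmed by this thread, while the surrounding keys are abbreviated and may differ from the actual generated plan:

```json
{
  "pop" : "hbase-scan",
  "hbaseScanSpec" : {
    "tableName" : "browser_action2",
    "startRow" : "0\u0000"
  },
  "columns" : [ "`v`.`e0`" ]
}
```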
>>
>>
>> Thanks
>>
>> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>>
>> > Qiang, can you please take a look at DRILL-4686 and confirm if the data
>> > set used in my repro is the same as the one you have used. If the data
>> set
>> > is different please let us know the type of data that you have used in
>> your
>> > table.
>> >
>> > Aman - I will try to repro the problem on Drill 1.6.0 and share results.
>> >
>> > Thanks,
>> > Khurram
>> >
>> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <am...@apache.org>
>> wrote:
>> >
>> >> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
>> >> error whereas the original problem from qiang was an incorrect result.
>> >> Can
>> >> you use the same version (1.6) that he was using.  Also, is the data
>> set
>> >> similar ? If you are unable to repro the exact same issue,  perhaps
>> qiang
>> >> should file a JIRA with a smaller repro if possible.
>> >>
>> >>
>> >>
>> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kf...@maprtech.com>
>> >> wrote:
>> >>
>> >> > Hello Qiang,
>> >> >
>> >> > DRILL-4686 is reported to track this problem.
>> >> >
>> >> > Thanks,
>> >> > Khurram
>> >> >
>> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com>
>> wrote:
>> >> >
>> >> >> Ok, Thanks very much.
>> >> >>
>> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>> >> >>
>> >> >>> Hello Qiang,
>> >> >>>
>> >> >>> Someone from our Drill team (in San Jose) will get back to you
>> soon. I
>> >> >>> work from the India lab and I am in a different time zone as
>> compared
>> >> to
>> >> >>> San Jose office, some one from MapR San Jose will get back to you
>> as
>> >> soon
>> >> >>> as possible.
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Khurram
>> >> >>>
>> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com>
>> >> wrote:
>> >> >>>
>> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
>> >> >>>> conclusion?
>> >> >>>>
>> >> >>>> Any idea how to solve it?
>> >> >>>>
>> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>> >> >>>>
>> >> >>>>> So I tried to create the table using HBase API (with no data
>> >> inserted
>> >> >>>>> into table) and I got the query plan for drill 1.7.0
>> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>> >> >>>>>
>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> >> >>>>> +--------------+------------+--------------+
>> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> >> >>>>> +--------------+------------+--------------+
>> >> >>>>> | row_key      | ANY        | NO           |
>> >> >>>>> | v            | MAP        | NO           |
>> >> >>>>> +--------------+------------+--------------+
>> >> >>>>> 2 rows selected (1.665 seconds)
>> >> >>>>>
>> >> >>>>> Table creation Java program
>> >> >>>>>
>> >> >>>>> {noformat}
>> >> >>>>> public class PutIntDataToHBase {
>> >> >>>>>     public static void main(String args[]) throws IOException {
>> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> >> >>>>>         if (admin.tableExists("browser_action2")) {
>> >> >>>>>             admin.disableTable("browser_action2");
>> >> >>>>>             admin.deleteTable("browser_action2");
>> >> >>>>>         }
>> >> >>>>>
>> >> >>>>>         byte[][] SPLIT_KEYS =
>> >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> >> >>>>>         HTableDescriptor tableDesc = new
>> >> >>>>>
>>  HTableDescriptor(TableName.valueOf("browser_action2"));
>> >> >>>>>
>> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>> >> >>>>>
>> >> >>>>>     }
>> >> >>>>> }
>> >> >>>>> {noformat}
>> >> >>>>>
>> >> >>>>> Query plan for the query that was reported as returning wrong
>> >> results.
>> >> >>>>>
>> >> >>>>> {noformat}
>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
>> a.`v`.`e0`;
>> >> >>>>> +------+------+
>> >> >>>>> | text | json |
>> >> >>>>> +------+------+
>> >> >>>>> | 00-00    Screen
>> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >> >>>>> 00-02        UnionExchange
>> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >>>>> 02-01                  UnorderedMuxExchange
>> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> startRow=0\x00,
>> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
>> >> >>>>> {noformat}
>> >> >>>>>
>> >> >>>>> and the query plan for the other problem query mentioned in the
>> >> first
>> >> >>>>> email.
>> >> >>>>>
>> >> >>>>> {noformat}
>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> >> >>>>> hbase.browser_action2 a group by
>> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >> >>>>> +------+------+
>> >> >>>>> | text | json |
>> >> >>>>> +------+------+
>> >> >>>>> | 00-00    Screen
>> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >> >>>>> 00-02        UnionExchange
>> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >>>>> 02-01                  UnorderedMuxExchange
>> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> >> >>>>> row_key=[$0])
>> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> startRow=null,
>> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>> >> >>>>> {noformat}
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Khurram
>> >> >>>>>
>> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> Yes.
>> >> >>>>>> I use hbase API to create it.
>> >> >>>>>>
>> >> >>>>>> The main code is:
>> >> >>>>>>
>> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
>> {'5'},
>> >> {'6'}, {'7'},{'8'}, {'9'} };
>> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> >> >>>>>>
>> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> >> >>>>>> tableDesc.addFamily(columnDesc);
>> >> >>>>>>
>> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> >> >>>>>>
>> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>> >> >>>>>>
>> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>> reproduce
>> >> this
>> >> >>>>>>> problem so we can try to reproduce it on our end.
>> >> >>>>>>>
>> >> >>>>>>> Thanks.
>> >> >>>>>>>
>> >> >>>>>>> -- Zelaine
>> >> >>>>>>>
>> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>> tiredqiang@gmail.com>
>> >> >>>>>>> wrote:
>> >> >>>>>>>
>> >> >>>>>>> > Hi ,
>> >> >>>>>>> >
>> >> >>>>>>> > I recently meet a issue that can not query the correct data
>> from
>> >> >>>>>>> hbase with
>> >> >>>>>>> > sql by drill, can anybody help me.
>> >> >>>>>>> >
>> >> >>>>>>> > I test with the drill 1.6.
>> >> >>>>>>> > My hbase scheme:
>> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> >> >>>>>>> > cf : v
>> >> >>>>>>> > qualifier: v, e0, e1
>> >> >>>>>>> >
>> >> >>>>>>> > The wrong result only happened when I use group by clause.
>> >> >>>>>>> >
>> >> >>>>>>> > This sql will not return correct result:
>> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> count(a.`v`.`e0`)
>> >> p
>> >> >>>>>>> from
>> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
>> >> a.`v`.`e0`;
>> >> >>>>>>> > Part of explain of this sql is:
>> >> >>>>>>> >
>> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2
>> a
>> >> >>>>>>> where
>> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>> >> >>>>>>> > +------+------+
>> >> >>>>>>> > | text | json |
>> >> >>>>>>> > +------+------+
>> >> >>>>>>> > | 00-00    Screen
>> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> >> >>>>>>> > 00-02        UnionExchange
>> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >>>>>>> > 02-01                  UnorderedMuxExchange
>> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
>> p=[COUNT($0)])
>> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> >>>>>>> startRow=0\x00,
>> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>> >> >>>>>>> >
>> >> >>>>>>> > The data return very quickly , the result of this sql is :
>> >> >>>>>>> > +------+--------+
>> >> >>>>>>> > |  k   |   p    |
>> >> >>>>>>> > +------+--------+
>> >> >>>>>>> > | pay  | 12180  |
>> >> >>>>>>> > +------+--------
>> >> >>>>>>> >
>> >> >>>>>>> > But I have millions of rows in the table.
>> >> >>>>>>> >
>> >> >>>>>>> > I tried to change the physical plan.  if I change the json
>> >> explain
>> >> >>>>>>> > *"columns"
>> >> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will
>> return
>> >> the
>> >> >>>>>>> correct
>> >> >>>>>>> > result.
>> >> >>>>>>> >
>> >> >>>>>>> > It seems the physical plan is not correct.
>> >> >>>>>>> > I also try to debug the sql parser to find out the reason,
>> but
>> >> its
>> >> >>>>>>> too
>> >> >>>>>>> > complicate. Can anyone help me.
>> >> >>>>>>> >
>> >> >>>>>>> > Also this sql have the same issue.
>> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')
>> as
>> >> k,
>> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
>> >> *"columns" :
>> >> >>>>>>> [
>> >> >>>>>>> > "`row_key`" ] *, it will return the correct result.
>> >> >>>>>>> >
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: query from hbase issue

Posted by Krystal Nguyen <kn...@maprtech.com>.
Can you please let us know the HBase version and Hadoop distribution
version that you are using?


On Fri, May 20, 2016 at 1:35 AM, qiang li <ti...@gmail.com> wrote:

> Khurram, I am sending the mail again; the last mail forgot to cc
> user@drill.apache.org.
>
> The main process is the same, but my rowkey is more complicated.
> Here are the details of what I tested.
> The rowkey looks like this: [salt 1-byte string] + [day 8-byte string] +
> [event] + [uid long] + [ts long]
> I also have other qualifiers; only qualifier v:v is an integer, the others
> are strings.
>
> example:
> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> ROW                                                          COLUMN+CELL
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e0,
> timestamp=1461839343076, value=pay
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e1,
> timestamp=1461839343076, value=bijia
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e2,
> timestamp=1461839343076, value=browser
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e3,
> timestamp=1461839343076, value=*
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e4,
> timestamp=1461839343076, value=*
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e5,
> timestamp=1461839343076, value=*
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h,
> timestamp=1459771200000, value=20
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m,
> timestamp=1459771200000, value=0
>
>
>  1$\xD2\x00
>
>
>
>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v,
> timestamp=1459771200000, value=\x00\x00\x00\x17
>
>
>  1$\xD2\x00
>
>
>
> 1 row(s) in 0.0410 seconds
>
>
> Here is an example of what the issue looks like:
>
> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
> STARTROW=> '0'}
> ........
>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\ column=v:e0,
> timestamp=1463723029448, value=visit
>
>
>  x01T\x00\x0A\xFA\x00
>
>
>
>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\ column=v:e0,
> timestamp=1463723029217, value=visit
>
>
>  x01T\x00\x0A\xFA\x00
>
>
>
>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\ column=v:e0,
> timestamp=1463723029295, value=visit
>
>
>  x01T\x00\x0A\xFA\x00
>
>
>
> 9994 row(s) in 123.8650 seconds
>
> the drill result:
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>  group by a.`v`.`e0`;
> +--------+-------+
> |   k    |   p   |
> +--------+-------+
> | visit  | 1216  |
> +--------+-------+
>
>
> I found that the issue appears when the row count is larger than 10000. The
> result is correct with fewer than 1000 rows, but not always.
> What I can make sure is if I updated the columns in the physical plan and
> query by web UI , the result will be correct.
>
>
> Thanks
>
> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>
> > Qiang, can you please take a look at DRILL-4686 and confirm if the data
> > set used in my repro is the same as the one you have used. If the data
> set
> > is different please let us know the type of data that you have used in
> your
> > table.
> >
> > Aman - I will try to repro the problem on Drill 1.6.0 and share results.
> >
> > Thanks,
> > Khurram
> >
> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <am...@apache.org>
> wrote:
> >
> >> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
> >> error whereas the original problem from qiang was an incorrect result.
> >> Can
> >> you use the same version (1.6) that he was using.  Also, is the data set
> >> similar ? If you are unable to repro the exact same issue,  perhaps
> qiang
> >> should file a JIRA with a smaller repro if possible.
> >>
> >>
> >>
> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kf...@maprtech.com>
> >> wrote:
> >>
> >> > Hello Qiang,
> >> >
> >> > DRILL-4686 is reported to track this problem.
> >> >
> >> > Thanks,
> >> > Khurram
> >> >
> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com>
> wrote:
> >> >
> >> >> Ok, Thanks very much.
> >> >>
> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >> >>
> >> >>> Hello Qiang,
> >> >>>
> >> >>> Someone from our Drill team (in San Jose) will get back to you
> soon. I
> >> >>> work from the India lab and I am in a different time zone as
> compared
> >> to
> >> >>> San Jose office, some one from MapR San Jose will get back to you as
> >> soon
> >> >>> as possible.
> >> >>>
> >> >>> Thanks,
> >> >>> Khurram
> >> >>>
> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com>
> >> wrote:
> >> >>>
> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
> >> >>>> conclusion?
> >> >>>>
> >> >>>> Any idea how to solve it?
> >> >>>>
> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >> >>>>
> >> >>>>> So I tried to create the table using HBase API (with no data
> >> inserted
> >> >>>>> into table) and I got the query plan for drill 1.7.0
> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >> >>>>>
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> | row_key      | ANY        | NO           |
> >> >>>>> | v            | MAP        | NO           |
> >> >>>>> +--------------+------------+--------------+
> >> >>>>> 2 rows selected (1.665 seconds)
> >> >>>>>
> >> >>>>> Table creation Java program
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> public class PutIntDataToHBase {
> >> >>>>>     public static void main(String args[]) throws IOException {
> >> >>>>>         Configuration conf = HBaseConfiguration.create();
> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> >> >>>>>         if (admin.tableExists("browser_action2")) {
> >> >>>>>             admin.disableTable("browser_action2");
> >> >>>>>             admin.deleteTable("browser_action2");
> >> >>>>>         }
> >> >>>>>
> >> >>>>>         byte[][] SPLIT_KEYS =
> >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >> >>>>>         HTableDescriptor tableDesc = new
> >> >>>>>
>  HTableDescriptor(TableName.valueOf("browser_action2"));
> >> >>>>>
> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> >> >>>>>
> >> >>>>>     }
> >> >>>>> }
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> Query plan for the query that was reported as returning wrong
> >> results.
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
> a.`v`.`e0`;
> >> >>>>> +------+------+
> >> >>>>> | text | json |
> >> >>>>> +------+------+
> >> >>>>> | 00-00    Screen
> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >>>>> 00-02        UnionExchange
> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> startRow=0\x00,
> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> and the query plan for the other problem query mentioned in the
> >> first
> >> >>>>> email.
> >> >>>>>
> >> >>>>> {noformat}
> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> >> >>>>> hbase.browser_action2 a group by
> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >>>>> +------+------+
> >> >>>>> | text | json |
> >> >>>>> +------+------+
> >> >>>>> | 00-00    Screen
> >> >>>>> 00-01      Project(k=[$0], p=[$1])
> >> >>>>> 00-02        UnionExchange
> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >>>>> 02-01                  UnorderedMuxExchange
> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> >> >>>>> row_key=[$0])
> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> startRow=null,
> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >> >>>>> {noformat}
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Khurram
> >> >>>>>
> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Yes.
> >> >>>>>> I use hbase API to create it.
> >> >>>>>>
> >> >>>>>> The main code is:
> >> >>>>>>
> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
> >> {'6'}, {'7'},{'8'}, {'9'} };
> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >> >>>>>>
> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >> >>>>>> tableDesc.addFamily(columnDesc);
> >> >>>>>>
> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >> >>>>>>
> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
> >> >>>>>>
> >> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
> >> this
> >> >>>>>>> problem so we can try to reproduce it on our end.
> >> >>>>>>>
> >> >>>>>>> Thanks.
> >> >>>>>>>
> >> >>>>>>> -- Zelaine
> >> >>>>>>>
> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqiang@gmail.com
> >
> >> >>>>>>> wrote:
> >> >>>>>>>
> >> >>>>>>> > Hi ,
> >> >>>>>>> >
> >> >>>>>>> > I recently meet a issue that can not query the correct data
> from
> >> >>>>>>> hbase with
> >> >>>>>>> > sql by drill, can anybody help me.
> >> >>>>>>> >
> >> >>>>>>> > I test with the drill 1.6.
> >> >>>>>>> > My hbase scheme:
> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
> >> >>>>>>> > cf : v
> >> >>>>>>> > qualifier: v, e0, e1
> >> >>>>>>> >
> >> >>>>>>> > The wrong result only happened when I use group by clause.
> >> >>>>>>> >
> >> >>>>>>> > This sql will not return correct result:
> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`)
> >> p
> >> >>>>>>> from
> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
> >> a.`v`.`e0`;
> >> >>>>>>> > Part of explain of this sql is:
> >> >>>>>>> >
> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
> >> >>>>>>> where
> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> >> >>>>>>> > +------+------+
> >> >>>>>>> > | text | json |
> >> >>>>>>> > +------+------+
> >> >>>>>>> > | 00-00    Screen
> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> >> >>>>>>> > 00-02        UnionExchange
> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
> >> >>>>>>> > 02-01                  UnorderedMuxExchange
> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> >> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >> >>>>>>> startRow=0\x00,
> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >> >>>>>>> >
> >> >>>>>>> > The data return very quickly , the result of this sql is :
> >> >>>>>>> > +------+--------+
> >> >>>>>>> > |  k   |   p    |
> >> >>>>>>> > +------+--------+
> >> >>>>>>> > | pay  | 12180  |
> >> >>>>>>> > +------+--------
> >> >>>>>>> >
> >> >>>>>>> > But I have millions of rows in the table.
> >> >>>>>>> >
> >> >>>>>>> > I tried to change the physical plan.  if I change the json
> >> explain
> >> >>>>>>> > *"columns"
> >> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return
> >> the
> >> >>>>>>> correct
> >> >>>>>>> > result.
> >> >>>>>>> >
> >> >>>>>>> > It seems the physical plan is not correct.
> >> >>>>>>> > I also try to debug the sql parser to find out the reason, but
> >> its
> >> >>>>>>> too
> >> >>>>>>> > complicate. Can anyone help me.
> >> >>>>>>> >
> >> >>>>>>> > Also this sql have the same issue.
> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as
> >> k,
> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
> >> *"columns" :
> >> >>>>>>> [
> >> >>>>>>> > "`row_key`" ] *, it will return the correct result.
> >> >>>>>>> >
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >
> >
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
Khurram, I am sending the mail again; the last mail forgot to cc
user@drill.apache.org.

The main process is the same, but my rowkey is more complicated.
Here are the details of what I tested.
The rowkey looks like this: [salt 1-byte string] + [day 8-byte string] +
[event] + [uid long] + [ts long]
I also have other qualifiers; only qualifier v:v is an integer, the others
are strings.
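As a sketch of how such a rowkey is composed (the 8-byte big-endian packing of uid and ts is my assumption, inferred from the hex bytes in the scan output below; the helper name is hypothetical):

```python
import struct

def make_rowkey(salt, day, event, uid, ts):
    # Rowkey layout: [salt 1-byte string] + [day 8-byte string] + [event]
    #                + [uid 8-byte long] + [ts 8-byte long].
    # Big-endian long encoding for uid/ts is an assumption based on the
    # \x00\x00... byte runs visible in the hbase shell scan output.
    assert len(salt) == 1 and len(day) == 8
    return (salt.encode('utf-8') + day.encode('utf-8') + event.encode('utf-8')
            + struct.pack('>q', uid) + struct.pack('>q', ts))

# A key in the '1' salt bucket, matching the 120160411visit... shape above.
key = make_rowkey('1', '20160411', 'visit', 29123, 1460354400000)
```

Because the salt is the first byte, keys for the same day and event are spread across the ten '0'..'9' buckets, which is why the query filters with row_key > '0'.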

example:
hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
ROW                                                          COLUMN+CELL
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:e0, timestamp=1461839343076, value=pay
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:e1, timestamp=1461839343076, value=bijia
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:e2, timestamp=1461839343076, value=browser
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:e3, timestamp=1461839343076, value=*
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:e4, timestamp=1461839343076, value=*
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:e5, timestamp=1461839343076, value=*
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:h, timestamp=1459771200000, value=20
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:m, timestamp=1459771200000, value=0
 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00 column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
1 row(s) in 0.0410 seconds


Here is an example of what the issue looks like:

hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
STARTROW=> '0'}
........
 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00 column=v:e0, timestamp=1463723029448, value=visit
 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00 column=v:e0, timestamp=1463723029217, value=visit
 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00 column=v:e0, timestamp=1463723029295, value=visit
9994 row(s) in 123.8650 seconds

the drill result:
0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
 group by a.`v`.`e0`;
+--------+-------+
|   k    |   p   |
+--------+-------+
| visit  | 1216  |
+--------+-------+


I found that the issue appears when the row count is larger than 10000. The
result is correct with fewer than 1000 rows, but not always.
What I can confirm is that if I update the columns in the physical plan and
run the query through the web UI, the result is correct.
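For reference, the only edit I make before resubmitting the JSON plan through the web UI is the column list of the HBaseGroupScan; the rest of the plan stays unchanged. This fragment is a sketch of the after state:

```json
"columns" : [ "`v`.`e0`" ]
```

This replaces the original wildcard entry, "columns" : [ "`*`" ], which is what produces the truncated count.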


Thanks

2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:

> Qiang, can you please take a look at DRILL-4686 and confirm if the data
> set used in my repro is the same as the one you have used. If the data set
> is different please let us know the type of data that you have used in your
> table.
>
> Aman - I will try to repro the problem on Drill 1.6.0 and share results.
>
> Thanks,
> Khurram
>
> On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <am...@apache.org> wrote:
>
>> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
>> error whereas the original problem from qiang was an incorrect result.
>> Can
>> you use the same version (1.6) that he was using.  Also, is the data set
>> similar ? If you are unable to repro the exact same issue,  perhaps qiang
>> should file a JIRA with a smaller repro if possible.
>>
>>
>>
>> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kf...@maprtech.com>
>> wrote:
>>
>> > Hello Qiang,
>> >
>> > DRILL-4686 is reported to track this problem.
>> >
>> > Thanks,
>> > Khurram
>> >
>> > On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com> wrote:
>> >
>> >> Ok, Thanks very much.
>> >>
>> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>> >>
>> >>> Hello Qiang,
>> >>>
>> >>> Someone from our Drill team (in San Jose) will get back to you soon. I
>> >>> work from the India lab and I am in a different time zone as compared
>> to
>> >>> San Jose office, some one from MapR San Jose will get back to you as
>> soon
>> >>> as possible.
>> >>>
>> >>> Thanks,
>> >>> Khurram
>> >>>
>> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hi Khurram, thanks very much for reproducing it. So what's the
>> >>>> conclusion?
>> >>>>
>> >>>> Any idea how to solve it?
>> >>>>
>> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>> >>>>
>> >>>>> So I tried to create the table using HBase API (with no data
>> inserted
>> >>>>> into table) and I got the query plan for drill 1.7.0
>> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>> >>>>>
>> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> >>>>> +--------------+------------+--------------+
>> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> >>>>> +--------------+------------+--------------+
>> >>>>> | row_key      | ANY        | NO           |
>> >>>>> | v            | MAP        | NO           |
>> >>>>> +--------------+------------+--------------+
>> >>>>> 2 rows selected (1.665 seconds)
>> >>>>>
>> >>>>> Table creation Java program
>> >>>>>
>> >>>>> {noformat}
>> >>>>> public class PutIntDataToHBase {
>> >>>>>     public static void main(String args[]) throws IOException {
>> >>>>>         Configuration conf = HBaseConfiguration.create();
>> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> >>>>>         if (admin.tableExists("browser_action2")) {
>> >>>>>             admin.disableTable("browser_action2");
>> >>>>>             admin.deleteTable("browser_action2");
>> >>>>>         }
>> >>>>>
>> >>>>>         byte[][] SPLIT_KEYS =
>> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> >>>>>         HTableDescriptor tableDesc = new
>> >>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>> >>>>>
>> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>> >>>>>
>> >>>>>     }
>> >>>>> }
>> >>>>> {noformat}
>> >>>>>
>> >>>>> Query plan for the query that was reported as returning wrong
>> results.
>> >>>>>
>> >>>>> {noformat}
>> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>> >>>>> +------+------+
>> >>>>> | text | json |
>> >>>>> +------+------+
>> >>>>> | 00-00    Screen
>> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >>>>> 00-02        UnionExchange
>> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >>>>> 02-01                  UnorderedMuxExchange
>> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> startRow=0\x00,
>> >>>>> stopRow=, filter=null], columns=[`*`]]])
>> >>>>> {noformat}
>> >>>>>
>> >>>>> and the query plan for the other problem query mentioned in the
>> first
>> >>>>> email.
>> >>>>>
>> >>>>> {noformat}
>> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> >>>>> hbase.browser_action2 a group by
>> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >>>>> +------+------+
>> >>>>> | text | json |
>> >>>>> +------+------+
>> >>>>> | 00-00    Screen
>> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >>>>> 00-02        UnionExchange
>> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >>>>> 02-01                  UnorderedMuxExchange
>> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> >>>>> row_key=[$0])
>> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> startRow=null,
>> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>> >>>>> {noformat}
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Khurram
>> >>>>>
>> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Yes.
>> >>>>>> I used the HBase API to create it.
>> >>>>>>
>> >>>>>> The main code is:
>> >>>>>>
>> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
>> {'6'}, {'7'},{'8'}, {'9'} };
>> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> >>>>>>
>> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> >>>>>> tableDesc.addFamily(columnDesc);
>> >>>>>>
>> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> >>>>>>
>> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>> >>>>>>
>> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
>> this
>> >>>>>>> problem so we can try to reproduce it on our end.
>> >>>>>>>
>> >>>>>>> Thanks.
>> >>>>>>>
>> >>>>>>> -- Zelaine
>> >>>>>>>
>> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>> > Hi ,
>> >>>>>>> >
>> >>>>>>> > I recently hit an issue where I cannot query the correct data
>> >>>>>>> > from HBase with SQL through Drill. Can anybody help me?
>> >>>>>>> >
>> >>>>>>> > I tested with Drill 1.6.
>> >>>>>>> > My hbase scheme:
>> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> >>>>>>> > cf : v
>> >>>>>>> > qualifier: v, e0, e1
>> >>>>>>> >
>> >>>>>>> > The wrong result only happens when I use a GROUP BY clause.
>> >>>>>>> >
>> >>>>>>> > This SQL does not return the correct result:
>> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`)
>> p
>> >>>>>>> from
>> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
>> a.`v`.`e0`;
>> >>>>>>> > Part of explain of this sql is:
>> >>>>>>> >
>> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
>> >>>>>>> where
>> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>> >>>>>>> > +------+------+
>> >>>>>>> > | text | json |
>> >>>>>>> > +------+------+
>> >>>>>>> > | 00-00    Screen
>> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> >>>>>>> > 00-02        UnionExchange
>> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>> >>>>>>> > 02-01                  UnorderedMuxExchange
>> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >>>>>>> startRow=0\x00,
>> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>> >>>>>>> >
>> >>>>>>> > The data returns very quickly; the result of this SQL is:
>> >>>>>>> > +------+--------+
>> >>>>>>> > |  k   |   p    |
>> >>>>>>> > +------+--------+
>> >>>>>>> > | pay  | 12180  |
>> >>>>>>> > +------+--------+
>> >>>>>>> >
>> >>>>>>> > But I have millions of rows in the table.
>> >>>>>>> >
>> >>>>>>> > I tried to change the physical plan.  if I change the json
>> explain
>> >>>>>>> > *"columns"
>> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return
>> the
>> >>>>>>> correct
>> >>>>>>> > result.
>> >>>>>>> >
>> >>>>>>> > It seems the physical plan is not correct.
>> >>>>>>> > I also tried to debug the SQL parser to find out the reason,
>> >>>>>>> > but it is too complicated. Can anyone help me?
>> >>>>>>> >
>> >>>>>>> > Also, this SQL has the same issue.
>> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as
>> k,
>> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
>> *"columns" :
>> >>>>>>> [
>> >>>>>>> > "`row_key`" ] *, it will return the correct result.
>> >>>>>>> >
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

Re: query from hbase issue

Posted by Khurram Faraaz <kf...@maprtech.com>.
Qiang, can you please take a look at DRILL-4686 and confirm whether the data
set used in my repro is the same as the one you used? If the data set is
different, please let us know the type of data you have in your table.

Aman - I will try to repro the problem on Drill 1.6.0 and share results.

Thanks,
Khurram
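
For anyone reproducing this from the archive: Khurram's program above creates
the table empty, while Qiang's report says the rowkeys follow the
salt+day+event+uid+ts layout (e.g. 120160411visituidts). A minimal sketch of
building such keys in plain Java — the class and method names here are
illustrative, not from the thread:

```java
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Concatenates the five parts of the rowkey layout described in the thread:
    // salt + day + event + uid + ts, e.g. "1" + "20160411" + "visit" + "uid" + "ts".
    static byte[] buildRowKey(String salt, String day, String event, String uid, String ts) {
        return (salt + day + event + uid + ts).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey("1", "20160411", "visit", "uid", "ts");
        System.out.println(new String(key, StandardCharsets.UTF_8)); // prints 120160411visituidts
    }
}
```

Rows written under column family v with qualifiers v, e0 and e1 using keys
like these would match the schema described in the first mail.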

On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <am...@apache.org> wrote:

> Khurram, DRILL-4686 seems like a different issue: it reports an error,
> whereas the original problem from qiang was an incorrect result. Can
> you use the same version (1.6) that he was using? Also, is the data set
> similar? If you are unable to repro the exact same issue, perhaps qiang
> should file a JIRA with a smaller repro if possible.
>
>
>
> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kf...@maprtech.com>
> wrote:
>
> > Hello Qiang,
> >
> > DRILL-4686 is reported to track this problem.
> >
> > Thanks,
> > Khurram
> >
> > On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com> wrote:
> >
> >> Ok, Thanks very much.
> >>
> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >>
> >>> Hello Qiang,
> >>>
> >>> Someone from our Drill team (in San Jose) will get back to you soon.
> >>> I work from the India lab, in a different time zone from the San Jose
> >>> office, so someone from MapR San Jose will get back to you as soon
> >>> as possible.
> >>>
> >>> Thanks,
> >>> Khurram
> >>>
> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com>
> wrote:
> >>>
> >>>> Hi Khurram, thanks very much for reproducing it. What's the
> >>>> conclusion?
> >>>>
> >>>> Any idea how to solve it?
> >>>>
> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
> >>>>
> >>>>> So I tried to create the table using the HBase API (with no data
> >>>>> inserted into the table) and got the query plan on Drill 1.7.0.
> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
> >>>>>
> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> >>>>> +--------------+------------+--------------+
> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> >>>>> +--------------+------------+--------------+
> >>>>> | row_key      | ANY        | NO           |
> >>>>> | v            | MAP        | NO           |
> >>>>> +--------------+------------+--------------+
> >>>>> 2 rows selected (1.665 seconds)
> >>>>>
> >>>>> Table creation Java program
> >>>>>
> >>>>> {noformat}
> >>>>> public class PutIntDataToHBase {
> >>>>>     public static void main(String args[]) throws IOException {
> >>>>>         Configuration conf = HBaseConfiguration.create();
> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
> >>>>>         if (admin.tableExists("browser_action2")) {
> >>>>>             admin.disableTable("browser_action2");
> >>>>>             admin.deleteTable("browser_action2");
> >>>>>         }
> >>>>>
> >>>>>         byte[][] SPLIT_KEYS =
> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
> >>>>>         HTableDescriptor tableDesc = new
> >>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
> >>>>>
> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
> >>>>>
> >>>>>     }
> >>>>> }
> >>>>> {noformat}
> >>>>>
> >>>>> Query plan for the query that was reported as returning wrong
> results.
> >>>>>
> >>>>> {noformat}
> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
> >>>>> +------+------+
> >>>>> | text | json |
> >>>>> +------+------+
> >>>>> | 00-00    Screen
> >>>>> 00-01      Project(k=[$0], p=[$1])
> >>>>> 00-02        UnionExchange
> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >>>>> 02-01                  UnorderedMuxExchange
> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> startRow=0\x00,
> >>>>> stopRow=, filter=null], columns=[`*`]]])
> >>>>> {noformat}
> >>>>>
> >>>>> and the query plan for the other problem query mentioned in the first
> >>>>> email.
> >>>>>
> >>>>> {noformat}
> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
> >>>>> hbase.browser_action2 a group by
> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> >>>>> +------+------+
> >>>>> | text | json |
> >>>>> +------+------+
> >>>>> | 00-00    Screen
> >>>>> 00-01      Project(k=[$0], p=[$1])
> >>>>> 00-02        UnionExchange
> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >>>>> 01-03              Project($f0=[$0], p=[$1])
> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
> >>>>> 02-01                  UnorderedMuxExchange
> >>>>> 03-01                    Project($f0=[$0], p=[$1],
> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> >>>>> row_key=[$0])
> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> startRow=null,
> >>>>> stopRow=null, filter=null], columns=[`*`]]])
> >>>>> {noformat}
> >>>>>
> >>>>> Thanks,
> >>>>> Khurram
> >>>>>
> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Yes.
> >>>>>> I used the HBase API to create it.
> >>>>>>
> >>>>>> The main code is:
> >>>>>>
> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
> {'6'}, {'7'},{'8'}, {'9'} };
> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
> >>>>>>
> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> >>>>>> tableDesc.addFamily(columnDesc);
> >>>>>>
> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
> >>>>>>
> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
> >>>>>>
> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
> this
> >>>>>>> problem so we can try to reproduce it on our end.
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> -- Zelaine
> >>>>>>>
> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> > Hi ,
> >>>>>>> >
> >>>>>>> > I recently hit an issue where I cannot query the correct data
> >>>>>>> > from HBase with SQL through Drill. Can anybody help me?
> >>>>>>> >
> >>>>>>> > I tested with Drill 1.6.
> >>>>>>> > My hbase scheme:
> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
> >>>>>>> > cf : v
> >>>>>>> > qualifier: v, e0, e1
> >>>>>>> >
> >>>>>>> > The wrong result only happens when I use a GROUP BY clause.
> >>>>>>> >
> >>>>>>> > This SQL does not return the correct result:
> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
> >>>>>>> from
> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
> a.`v`.`e0`;
> >>>>>>> > Part of explain of this sql is:
> >>>>>>> >
> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
> >>>>>>> where
> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
> >>>>>>> > +------+------+
> >>>>>>> > | text | json |
> >>>>>>> > +------+------+
> >>>>>>> > | 00-00    Screen
> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
> >>>>>>> > 00-02        UnionExchange
> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
> >>>>>>> > 02-01                  UnorderedMuxExchange
> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
> >>>>>>> startRow=0\x00,
> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
> >>>>>>> >
> >>>>>>> > The data returns very quickly; the result of this SQL is:
> >>>>>>> > +------+--------+
> >>>>>>> > |  k   |   p    |
> >>>>>>> > +------+--------+
> >>>>>>> > | pay  | 12180  |
> >>>>>>> > +------+--------+
> >>>>>>> >
> >>>>>>> > But I have millions of rows in the table.
> >>>>>>> >
> >>>>>>> > I tried to change the physical plan.  if I change the json
> explain
> >>>>>>> > *"columns"
> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return
> the
> >>>>>>> correct
> >>>>>>> > result.
> >>>>>>> >
> >>>>>>> > It seems the physical plan is not correct.
> >>>>>>> > I also tried to debug the SQL parser to find out the reason,
> >>>>>>> > but it is too complicated. Can anyone help me?
> >>>>>>> >
> >>>>>>> > Also, this SQL has the same issue.
> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns"
> :
> >>>>>>> [
> >>>>>>> > "`row_key`" ] *, it will return the correct result.
> >>>>>>> >
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: query from hbase issue

Posted by Aman Sinha <am...@apache.org>.
Khurram, DRILL-4686 seems like a different issue: it reports an error,
whereas the original problem from qiang was an incorrect result. Can
you use the same version (1.6) that he was using? Also, is the data set
similar? If you are unable to repro the exact same issue, perhaps qiang
should file a JIRA with a smaller repro if possible.



On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kf...@maprtech.com>
wrote:

> Hello Qiang,
>
> DRILL-4686 is reported to track this problem.
>
> Thanks,
> Khurram
>
> On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com> wrote:
>
>> Ok, Thanks very much.
>>
>> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>>
>>> Hello Qiang,
>>>
>>> Someone from our Drill team (in San Jose) will get back to you soon. I
>>> work from the India lab, in a different time zone from the San Jose
>>> office, so someone from MapR San Jose will get back to you as soon
>>> as possible.
>>>
>>> Thanks,
>>> Khurram
>>>
>>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com> wrote:
>>>
>>>> Hi Khurram, thanks very much for reproducing it. What's the
>>>> conclusion?
>>>>
>>>> Any idea how to solve it?
>>>>
>>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>>>>
>>>>> So I tried to create the table using the HBase API (with no data
>>>>> inserted into the table) and got the query plan on Drill 1.7.0.
>>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>>>>>
>>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>>>> +--------------+------------+--------------+
>>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>>>> +--------------+------------+--------------+
>>>>> | row_key      | ANY        | NO           |
>>>>> | v            | MAP        | NO           |
>>>>> +--------------+------------+--------------+
>>>>> 2 rows selected (1.665 seconds)
>>>>>
>>>>> Table creation Java program
>>>>>
>>>>> {noformat}
>>>>> public class PutIntDataToHBase {
>>>>>     public static void main(String args[]) throws IOException {
>>>>>         Configuration conf = HBaseConfiguration.create();
>>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>>>>         if (admin.tableExists("browser_action2")) {
>>>>>             admin.disableTable("browser_action2");
>>>>>             admin.deleteTable("browser_action2");
>>>>>         }
>>>>>
>>>>>         byte[][] SPLIT_KEYS =
>>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>>>>         HTableDescriptor tableDesc = new
>>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>>>>>
>>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>>>>
>>>>>     }
>>>>> }
>>>>> {noformat}
>>>>>
>>>>> Query plan for the query that was reported as returning wrong results.
>>>>>
>>>>> {noformat}
>>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>>> +------+------+
>>>>> | text | json |
>>>>> +------+------+
>>>>> | 00-00    Screen
>>>>> 00-01      Project(k=[$0], p=[$1])
>>>>> 00-02        UnionExchange
>>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>> 01-03              Project($f0=[$0], p=[$1])
>>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>> 02-01                  UnorderedMuxExchange
>>>>> 03-01                    Project($f0=[$0], p=[$1],
>>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>>>>> stopRow=, filter=null], columns=[`*`]]])
>>>>> {noformat}
>>>>>
>>>>> and the query plan for the other problem query mentioned in the first
>>>>> email.
>>>>>
>>>>> {noformat}
>>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>>>>> hbase.browser_action2 a group by
>>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>>> +------+------+
>>>>> | text | json |
>>>>> +------+------+
>>>>> | 00-00    Screen
>>>>> 00-01      Project(k=[$0], p=[$1])
>>>>> 00-02        UnionExchange
>>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>> 01-03              Project($f0=[$0], p=[$1])
>>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>> 02-01                  UnorderedMuxExchange
>>>>> 03-01                    Project($f0=[$0], p=[$1],
>>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>>>>> row_key=[$0])
>>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>>>>> stopRow=null, filter=null], columns=[`*`]]])
>>>>> {noformat}
>>>>>
>>>>> Thanks,
>>>>> Khurram
>>>>>
>>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes.
>>>>>> I used the HBase API to create it.
>>>>>>
>>>>>> The main code is:
>>>>>>
>>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'},{'8'}, {'9'} };
>>>>>> TableName tableName = TableName.valueOf("browser_action2");
>>>>>>
>>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>>>>> tableDesc.addFamily(columnDesc);
>>>>>>
>>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>>>>
>>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>>>>
>>>>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>>>>> problem so we can try to reproduce it on our end.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> -- Zelaine
>>>>>>>
>>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi ,
>>>>>>> >
>>>>>>> > I recently hit an issue where I cannot query the correct data
>>>>>>> > from HBase with SQL through Drill. Can anybody help me?
>>>>>>> >
>>>>>>> > I tested with Drill 1.6.
>>>>>>> > My hbase scheme:
>>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>>>>>> > cf : v
>>>>>>> > qualifier: v, e0, e1
>>>>>>> >
>>>>>>> > The wrong result only happens when I use a GROUP BY clause.
>>>>>>> >
>>>>>>> > This SQL does not return the correct result:
>>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>>>>>>> from
>>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>>>>> > Part of explain of this sql is:
>>>>>>> >
>>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>>>>>> CONVERT_FROM(a.`v`.`e0`,
>>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
>>>>>>> where
>>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>>>>>> > +------+------+
>>>>>>> > | text | json |
>>>>>>> > +------+------+
>>>>>>> > | 00-00    Screen
>>>>>>> > 00-01      Project(k=[$0], p=[$1])
>>>>>>> > 00-02        UnionExchange
>>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>>>> > 02-01                  UnorderedMuxExchange
>>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>>>> startRow=0\x00,
>>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>>>>>>> >
>>>>>>> > The data returns very quickly; the result of this SQL is:
>>>>>>> > +------+--------+
>>>>>>> > |  k   |   p    |
>>>>>>> > +------+--------+
>>>>>>> > | pay  | 12180  |
>>>>>>> > +------+--------+
>>>>>>> >
>>>>>>> > But I have millions of rows in the table.
>>>>>>> >
>>>>>>> > I tried to change the physical plan.  if I change the json explain
>>>>>>> > *"columns"
>>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>>>>>> correct
>>>>>>> > result.
>>>>>>> >
>>>>>>> > It seems the physical plan is not correct.
>>>>>>> > I also tried to debug the SQL parser to find out the reason, but
>>>>>>> > it is too complicated. Can anyone help me?
>>>>>>> >
>>>>>>> > Also, this SQL has the same issue.
>>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" :
>>>>>>> [
>>>>>>> > "`row_key`" ] *, it will return the correct result.
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: query from hbase issue

Posted by Khurram Faraaz <kf...@maprtech.com>.
Hello Qiang,

DRILL-4686 is reported to track this problem.

Thanks,
Khurram
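
Archive note: the workaround described in the quoted report below is to
capture the plan with EXPLAIN PLAN FOR <query>, edit the scan's column list
in the JSON output, and submit the edited plan. Only the columns entry of
the HBase scan changes, shown here diff-style; the rest of the plan JSON is
left exactly as Drill produced it:

```
-  "columns" : [ "`*`" ]
+  "columns" : [ "`v`.`e0`" ]
```

For the BYTE_SUBSTR query, the corresponding fix reported in the thread is
"columns" : [ "`row_key`" ].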

On Wed, May 18, 2016 at 3:16 PM, qiang li <ti...@gmail.com> wrote:

> Ok, Thanks very much.
>
> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>
>> Hello Qiang,
>>
>> Someone from our Drill team (in San Jose) will get back to you soon. I
>> work from the India lab, in a different time zone from the San Jose
>> office, so someone from MapR San Jose will get back to you as soon
>> as possible.
>>
>> Thanks,
>> Khurram
>>
>> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com> wrote:
>>
>>> Hi Khurram, thanks very much for reproducing it. What's the conclusion?
>>>
>>> Any idea how to solve it?
>>>
>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>>>
>>>> So I tried to create the table using the HBase API (with no data
>>>> inserted into the table) and got the query plan on Drill 1.7.0.
>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>>>>
>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>>> +--------------+------------+--------------+
>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>>> +--------------+------------+--------------+
>>>> | row_key      | ANY        | NO           |
>>>> | v            | MAP        | NO           |
>>>> +--------------+------------+--------------+
>>>> 2 rows selected (1.665 seconds)
>>>>
>>>> Table creation Java program
>>>>
>>>> {noformat}
>>>> public class PutIntDataToHBase {
>>>>     public static void main(String args[]) throws IOException {
>>>>         Configuration conf = HBaseConfiguration.create();
>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>>>         if (admin.tableExists("browser_action2")) {
>>>>             admin.disableTable("browser_action2");
>>>>             admin.deleteTable("browser_action2");
>>>>         }
>>>>
>>>>         byte[][] SPLIT_KEYS =
>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>>>         HTableDescriptor tableDesc = new
>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>>>>
>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>>>
>>>>     }
>>>> }
>>>> {noformat}
>>>>
>>>> Query plan for the query that was reported as returning wrong results.
>>>>
>>>> {noformat}
>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>> +------+------+
>>>> | text | json |
>>>> +------+------+
>>>> | 00-00    Screen
>>>> 00-01      Project(k=[$0], p=[$1])
>>>> 00-02        UnionExchange
>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>> 01-03              Project($f0=[$0], p=[$1])
>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>>> 02-01                  UnorderedMuxExchange
>>>> 03-01                    Project($f0=[$0], p=[$1],
>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>>>> stopRow=, filter=null], columns=[`*`]]])
>>>> {noformat}
>>>>
>>>> and the query plan for the other problem query mentioned in the first
>>>> email.
>>>>
>>>> {noformat}
>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>>>> hbase.browser_action2 a group by
>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>> +------+------+
>>>> | text | json |
>>>> +------+------+
>>>> | 00-00    Screen
>>>> 00-01      Project(k=[$0], p=[$1])
>>>> 00-02        UnionExchange
>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>> 01-03              Project($f0=[$0], p=[$1])
>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>>> 02-01                  UnorderedMuxExchange
>>>> 03-01                    Project($f0=[$0], p=[$1],
>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>>>> row_key=[$0])
>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>>>> stopRow=null, filter=null], columns=[`*`]]])
>>>> {noformat}
>>>>
>>>> Thanks,
>>>> Khurram
>>>>
>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com> wrote:
>>>>
>>>>> Yes.
>>>>> I use hbase API to create it.
>>>>>
>>>>> The main code is:
>>>>>
>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'},{'8'}, {'9'} };
>>>>> TableName tableName = TableName.valueOf("browser_action2");
>>>>>
>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>>>> tableDesc.addFamily(columnDesc);
>>>>>
>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>>>
>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>>>
>>>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>>>> problem so we can try to reproduce it on our end.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> -- Zelaine
>>>>>>
>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi ,
>>>>>> >
>>>>>> > I recently meet a issue that can not query the correct data from
>>>>>> hbase with
>>>>>> > sql by drill, can anybody help me.
>>>>>> >
>>>>>> > I test with the drill 1.6.
>>>>>> > My hbase scheme:
>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>>>>> > cf : v
>>>>>> > qualifier: v, e0, e1
>>>>>> >
>>>>>> > The wrong result only happened when I use group by clause.
>>>>>> >
>>>>>> > This sql will not return correct result:
>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>>>>>> from
>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>>>> > Part of explain of this sql is:
>>>>>> >
>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>>>>> CONVERT_FROM(a.`v`.`e0`,
>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>>>>> > +------+------+
>>>>>> > | text | json |
>>>>>> > +------+------+
>>>>>> > | 00-00    Screen
>>>>>> > 00-01      Project(k=[$0], p=[$1])
>>>>>> > 00-02        UnionExchange
>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>>> > 02-01                  UnorderedMuxExchange
>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>>> startRow=0\x00,
>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>>>>>> >
>>>>>> > The data return very quickly , the result of this sql is :
>>>>>> > +------+--------+
>>>>>> > |  k   |   p    |
>>>>>> > +------+--------+
>>>>>> > | pay  | 12180  |
>>>>>> > +------+--------
>>>>>> >
>>>>>> > But I have millons of data in the table.
>>>>>> >
>>>>>> > I tried to change the physical plan.  if I change the json explain
>>>>>> > *"columns"
>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>>>>> correct
>>>>>> > result.
>>>>>> >
>>>>>> > It seems the physical plan is not correct.
>>>>>> > I also try to debug the sql parser to find out the reason, but its
>>>>>> too
>>>>>> > complicate. Can anyone help me.
>>>>>> >
>>>>>> > Also this sql have the same issue.
>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" : [
>>>>>> > "`row_key`" ] *, it will return the correct result.
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
OK, thanks very much.

2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:

> Hello Qiang,
>
> Someone from our Drill team (in San Jose) will get back to you soon. I
> work from the India lab and I am in a different time zone as compared to
> San Jose office, some one from MapR San Jose will get back to you as soon
> as possible.
>
> Thanks,
> Khurram
>
> On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com> wrote:
>
>> Hi Khurram, Thanks very much to reproduce it, so what's the conclusion?
>>
>> Any idea how to sovle it?
>>
>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>>
>>> So I tried to create the table using HBase API (with no data inserted
>>> into table) and I got the query plan for drill 1.7.0
>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>>>
>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>> +--------------+------------+--------------+
>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>> +--------------+------------+--------------+
>>> | row_key      | ANY        | NO           |
>>> | v            | MAP        | NO           |
>>> +--------------+------------+--------------+
>>> 2 rows selected (1.665 seconds)
>>>
>>> Table creation Java program
>>>
>>> {noformat}
>>> public class PutIntDataToHBase {
>>>     public static void main(String args[]) throws IOException {
>>>         Configuration conf = HBaseConfiguration.create();
>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>>         if (admin.tableExists("browser_action2")) {
>>>             admin.disableTable("browser_action2");
>>>             admin.deleteTable("browser_action2");
>>>         }
>>>
>>>         byte[][] SPLIT_KEYS =
>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>>         HTableDescriptor tableDesc = new
>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>>>
>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>>
>>>     }
>>> }
>>> {noformat}
>>>
>>> Query plan for the query that was reported as returning wrong results.
>>>
>>> {noformat}
>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>> +------+------+
>>> | text | json |
>>> +------+------+
>>> | 00-00    Screen
>>> 00-01      Project(k=[$0], p=[$1])
>>> 00-02        UnionExchange
>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> 01-03              Project($f0=[$0], p=[$1])
>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> 02-01                  UnorderedMuxExchange
>>> 03-01                    Project($f0=[$0], p=[$1],
>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>>> stopRow=, filter=null], columns=[`*`]]])
>>> {noformat}
>>>
>>> and the query plan for the other problem query mentioned in the first
>>> email.
>>>
>>> {noformat}
>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>>> hbase.browser_action2 a group by
>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>>> +------+------+
>>> | text | json |
>>> +------+------+
>>> | 00-00    Screen
>>> 00-01      Project(k=[$0], p=[$1])
>>> 00-02        UnionExchange
>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> 01-03              Project($f0=[$0], p=[$1])
>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> 02-01                  UnorderedMuxExchange
>>> 03-01                    Project($f0=[$0], p=[$1],
>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>>> row_key=[$0])
>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>>> stopRow=null, filter=null], columns=[`*`]]])
>>> {noformat}
>>>
>>> Thanks,
>>> Khurram
>>>
>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com> wrote:
>>>
>>>> Yes.
>>>> I use hbase API to create it.
>>>>
>>>> The main code is:
>>>>
>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'},{'8'}, {'9'} };
>>>> TableName tableName = TableName.valueOf("browser_action2");
>>>>
>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>>> tableDesc.addFamily(columnDesc);
>>>>
>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>>
>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>>
>>>>
>>>>
>>>>
>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>>
>>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>>> problem so we can try to reproduce it on our end.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -- Zelaine
>>>>>
>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi ,
>>>>> >
>>>>> > I recently meet a issue that can not query the correct data from
>>>>> hbase with
>>>>> > sql by drill, can anybody help me.
>>>>> >
>>>>> > I test with the drill 1.6.
>>>>> > My hbase scheme:
>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>>>> > cf : v
>>>>> > qualifier: v, e0, e1
>>>>> >
>>>>> > The wrong result only happened when I use group by clause.
>>>>> >
>>>>> > This sql will not return correct result:
>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>>>>> from
>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>>> > Part of explain of this sql is:
>>>>> >
>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>>>> CONVERT_FROM(a.`v`.`e0`,
>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>>>> > +------+------+
>>>>> > | text | json |
>>>>> > +------+------+
>>>>> > | 00-00    Screen
>>>>> > 00-01      Project(k=[$0], p=[$1])
>>>>> > 00-02        UnionExchange
>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>> > 02-01                  UnorderedMuxExchange
>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>> startRow=0\x00,
>>>>> > stopRow=, filter=null], columns=[`*`]]])
>>>>> >
>>>>> > The data return very quickly , the result of this sql is :
>>>>> > +------+--------+
>>>>> > |  k   |   p    |
>>>>> > +------+--------+
>>>>> > | pay  | 12180  |
>>>>> > +------+--------
>>>>> >
>>>>> > But I have millons of data in the table.
>>>>> >
>>>>> > I tried to change the physical plan.  if I change the json explain
>>>>> > *"columns"
>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>>>> correct
>>>>> > result.
>>>>> >
>>>>> > It seems the physical plan is not correct.
>>>>> > I also try to debug the sql parser to find out the reason, but its
>>>>> too
>>>>> > complicate. Can anyone help me.
>>>>> >
>>>>> > Also this sql have the same issue.
>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" : [
>>>>> > "`row_key`" ] *, it will return the correct result.
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

Re: query from hbase issue

Posted by Khurram Faraaz <kf...@maprtech.com>.
Hello Qiang,

Someone from our Drill team (in San Jose) will get back to you soon. I work
from the India lab and am in a different time zone than the San Jose
office, so someone from MapR San Jose will respond as soon as possible.

Thanks,
Khurram

On Wed, May 18, 2016 at 3:09 PM, qiang li <ti...@gmail.com> wrote:

> Hi Khurram, Thanks very much to reproduce it, so what's the conclusion?
>
> Any idea how to sovle it?
>
> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:
>
>> So I tried to create the table using HBase API (with no data inserted
>> into table) and I got the query plan for drill 1.7.0
>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>>
>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> +--------------+------------+--------------+
>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> +--------------+------------+--------------+
>> | row_key      | ANY        | NO           |
>> | v            | MAP        | NO           |
>> +--------------+------------+--------------+
>> 2 rows selected (1.665 seconds)
>>
>> Table creation Java program
>>
>> {noformat}
>> public class PutIntDataToHBase {
>>     public static void main(String args[]) throws IOException {
>>         Configuration conf = HBaseConfiguration.create();
>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>         if (admin.tableExists("browser_action2")) {
>>             admin.disableTable("browser_action2");
>>             admin.deleteTable("browser_action2");
>>         }
>>
>>         byte[][] SPLIT_KEYS =
>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>         HTableDescriptor tableDesc = new
>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>>
>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>
>>     }
>> }
>> {noformat}
>>
>> Query plan for the query that was reported as returning wrong results.
>>
>> {noformat}
>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      Project(k=[$0], p=[$1])
>> 00-02        UnionExchange
>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> 01-03              Project($f0=[$0], p=[$1])
>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> 02-01                  UnorderedMuxExchange
>> 03-01                    Project($f0=[$0], p=[$1],
>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>> stopRow=, filter=null], columns=[`*`]]])
>> {noformat}
>>
>> and the query plan for the other problem query mentioned in the first
>> email.
>>
>> {noformat}
>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> hbase.browser_action2 a group by
>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      Project(k=[$0], p=[$1])
>> 00-02        UnionExchange
>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> 01-03              Project($f0=[$0], p=[$1])
>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> 02-01                  UnorderedMuxExchange
>> 03-01                    Project($f0=[$0], p=[$1],
>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> row_key=[$0])
>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>> stopRow=null, filter=null], columns=[`*`]]])
>> {noformat}
>>
>> Thanks,
>> Khurram
>>
>> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com> wrote:
>>
>>> Yes.
>>> I use hbase API to create it.
>>>
>>> The main code is:
>>>
>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'},{'8'}, {'9'} };
>>> TableName tableName = TableName.valueOf("browser_action2");
>>>
>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>> tableDesc.addFamily(columnDesc);
>>>
>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>
>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>
>>>
>>>
>>>
>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>
>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>> problem so we can try to reproduce it on our end.
>>>>
>>>> Thanks.
>>>>
>>>> -- Zelaine
>>>>
>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com> wrote:
>>>>
>>>> > Hi ,
>>>> >
>>>> > I recently meet a issue that can not query the correct data from
>>>> hbase with
>>>> > sql by drill, can anybody help me.
>>>> >
>>>> > I test with the drill 1.6.
>>>> > My hbase scheme:
>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>>> > cf : v
>>>> > qualifier: v, e0, e1
>>>> >
>>>> > The wrong result only happened when I use group by clause.
>>>> >
>>>> > This sql will not return correct result:
>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>> > Part of explain of this sql is:
>>>> >
>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>>> CONVERT_FROM(a.`v`.`e0`,
>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>>> > +------+------+
>>>> > | text | json |
>>>> > +------+------+
>>>> > | 00-00    Screen
>>>> > 00-01      Project(k=[$0], p=[$1])
>>>> > 00-02        UnionExchange
>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>> > 02-01                  UnorderedMuxExchange
>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>> startRow=0\x00,
>>>> > stopRow=, filter=null], columns=[`*`]]])
>>>> >
>>>> > The data return very quickly , the result of this sql is :
>>>> > +------+--------+
>>>> > |  k   |   p    |
>>>> > +------+--------+
>>>> > | pay  | 12180  |
>>>> > +------+--------
>>>> >
>>>> > But I have millons of data in the table.
>>>> >
>>>> > I tried to change the physical plan.  if I change the json explain
>>>> > *"columns"
>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>>> correct
>>>> > result.
>>>> >
>>>> > It seems the physical plan is not correct.
>>>> > I also try to debug the sql parser to find out the reason, but its too
>>>> > complicate. Can anyone help me.
>>>> >
>>>> > Also this sql have the same issue.
>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" : [
>>>> > "`row_key`" ] *, it will return the correct result.
>>>> >
>>>>
>>>
>>>
>>
>

Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
Hi Khurram, thanks very much for reproducing it. So what's the conclusion?

Any idea how to solve it?

2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kf...@maprtech.com>:

> So I tried to create the table using HBase API (with no data inserted into
> table) and I got the query plan for drill 1.7.0
> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>
> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | row_key      | ANY        | NO           |
> | v            | MAP        | NO           |
> +--------------+------------+--------------+
> 2 rows selected (1.665 seconds)
>
> Table creation Java program
>
> {noformat}
> public class PutIntDataToHBase {
>     public static void main(String args[]) throws IOException {
>         Configuration conf = HBaseConfiguration.create();
>         conf.set("hbase.zookeeper.property.clientPort","5181");
>         HBaseAdmin admin = new HBaseAdmin(conf);
>         if (admin.tableExists("browser_action2")) {
>             admin.disableTable("browser_action2");
>             admin.deleteTable("browser_action2");
>         }
>
>         byte[][] SPLIT_KEYS =
> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>         HTableDescriptor tableDesc = new
>             HTableDescriptor(TableName.valueOf("browser_action2"));
>
>         tableDesc.addFamily(new HColumnDescriptor("v"));
>         admin.createTable(tableDesc,SPLIT_KEYS);
>
>     }
> }
> {noformat}
>
> Query plan for the query that was reported as returning wrong results.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> 03-03                        Project($f0=[ITEM($1, 'e0')])
> 03-04                          Scan(groupscan=[HBaseGroupScan
> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
> stopRow=, filter=null], columns=[`*`]]])
> {noformat}
>
> and the query plan for the other problem query mentioned in the first
> email.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> . . . . . . . . . . . . . . > count(a.row_key) p from
> hbase.browser_action2 a group by
> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> row_key=[$0])
> 03-04                          Scan(groupscan=[HBaseGroupScan
> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
> stopRow=null, filter=null], columns=[`*`]]])
> {noformat}
>
> Thanks,
> Khurram
>
> On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com> wrote:
>
>> Yes.
>> I use hbase API to create it.
>>
>> The main code is:
>>
>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'},{'8'}, {'9'} };
>> TableName tableName = TableName.valueOf("browser_action2");
>>
>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> tableDesc.addFamily(columnDesc);
>>
>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>
>> admin.createTable(tableDesc, SPLIT_KEYS);
>>
>>
>>
>>
>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>
>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>> problem so we can try to reproduce it on our end.
>>>
>>> Thanks.
>>>
>>> -- Zelaine
>>>
>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com> wrote:
>>>
>>> > Hi ,
>>> >
>>> > I recently meet a issue that can not query the correct data from hbase
>>> with
>>> > sql by drill, can anybody help me.
>>> >
>>> > I test with the drill 1.6.
>>> > My hbase scheme:
>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>> > cf : v
>>> > qualifier: v, e0, e1
>>> >
>>> > The wrong result only happened when I use group by clause.
>>> >
>>> > This sql will not return correct result:
>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>> > Part of explain of this sql is:
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>> CONVERT_FROM(a.`v`.`e0`,
>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>> > +------+------+
>>> > | text | json |
>>> > +------+------+
>>> > | 00-00    Screen
>>> > 00-01      Project(k=[$0], p=[$1])
>>> > 00-02        UnionExchange
>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > 01-03              Project($f0=[$0], p=[$1])
>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > 02-01                  UnorderedMuxExchange
>>> > 03-01                    Project($f0=[$0], p=[$1],
>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>> startRow=0\x00,
>>> > stopRow=, filter=null], columns=[`*`]]])
>>> >
>>> > The data return very quickly , the result of this sql is :
>>> > +------+--------+
>>> > |  k   |   p    |
>>> > +------+--------+
>>> > | pay  | 12180  |
>>> > +------+--------
>>> >
>>> > But I have millons of data in the table.
>>> >
>>> > I tried to change the physical plan.  if I change the json explain
>>> > *"columns"
>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>> correct
>>> > result.
>>> >
>>> > It seems the physical plan is not correct.
>>> > I also try to debug the sql parser to find out the reason, but its too
>>> > complicate. Can anyone help me.
>>> >
>>> > Also this sql have the same issue.
>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" : [
>>> > "`row_key`" ] *, it will return the correct result.
>>> >
>>>
>>
>>
>

Re: query from hbase issue

Posted by Khurram Faraaz <kf...@maprtech.com>.
So I created the table using the HBase API (with no data inserted into the
table) and got the query plans below with Drill 1.7.0.
Drill 1.7.0-SNAPSHOT  commit ID :  09b26277

0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| row_key      | ANY        | NO           |
| v            | MAP        | NO           |
+--------------+------------+--------------+
2 rows selected (1.665 seconds)

Table creation Java program

{noformat}
public class PutIntDataToHBase {
    public static void main(String args[]) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.property.clientPort","5181");
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (admin.tableExists("browser_action2")) {
            admin.disableTable("browser_action2");
            admin.deleteTable("browser_action2");
        }

        byte[][] SPLIT_KEYS =
{{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
        HTableDescriptor tableDesc = new
            HTableDescriptor(TableName.valueOf("browser_action2"));

        tableDesc.addFamily(new HColumnDescriptor("v"));
        admin.createTable(tableDesc,SPLIT_KEYS);

    }
}
{noformat}
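As context for the row key layout described in the first email (salt + day + event + uid + ts, e.g. 120160411visituidts), here is a minimal Java sketch of how such keys might be composed. The helper name and the hash-based salt are assumptions for illustration, not taken from the original program:

```java
// Hypothetical helper: composes a row key in the salt+day+event+uid+ts
// layout from the first email. The single-digit salt ('0'-'9') is assumed
// to be derived from a hash of the uid, so that rows distribute across the
// ten pre-split regions created by SPLIT_KEYS above.
public class RowKeyBuilder {

    public static String buildRowKey(String day, String event, String uid, String ts) {
        int salt = Math.abs(uid.hashCode() % 10); // assumption: salt derived from uid
        return salt + day + event + uid + ts;     // e.g. "120160411visituidts"
    }
}
```

For example, buildRowKey("20160411", "visit", "uid", "ts") yields a key whose leading digit selects one of the ten split regions (the exact digit depends on the assumed hash).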

Query plan for the query that was reported as returning wrong results.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select
CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(k=[$0], p=[$1])
00-02        UnionExchange
01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
01-03              Project($f0=[$0], p=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
03-03                        Project($f0=[ITEM($1, 'e0')])
03-04                          Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
stopRow=, filter=null], columns=[`*`]]])
{noformat}

and the query plan for the other problem query mentioned in the first email.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
. . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2
a group by
. . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(k=[$0], p=[$1])
00-02        UnionExchange
01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
01-03              Project($f0=[$0], p=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project($f0=[$0], p=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
row_key=[$0])
03-04                          Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
stopRow=null, filter=null], columns=[`*`]]])
{noformat}
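For reference, BYTE_SUBSTR(a.row_key, 1, 9) in the query above takes the first nine bytes of the key, which under the layout from the first email (salt + day + event + uid + ts) is the one-character salt plus the eight-digit day. A minimal sketch of the equivalent extraction, assuming BYTE_SUBSTR's 1-based offset and length arguments:

```java
public class ByteSubstrSketch {
    // Extracts `length` characters starting at 1-based `start`, mimicking
    // the assumed BYTE_SUBSTR(string, start, length) semantics.
    static String byteSubstr(String key, int start, int length) {
        return key.substring(start - 1, start - 1 + length);
    }

    public static void main(String[] args) {
        String key = "120160411visituidts"; // example key from the thread
        // Bytes 1..9 are the salt '1' plus the day '20160411'.
        System.out.println(byteSubstr(key, 1, 9));
    }
}
```

So the second query is effectively grouping by salt-plus-day prefixes of the row key.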

Thanks,
Khurram

On Wed, May 18, 2016 at 7:01 AM, qiang li <ti...@gmail.com> wrote:


Re: query from hbase issue

Posted by qiang li <ti...@gmail.com>.
Yes.
I used the HBase API to create it.

The main code is:

byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
{'6'}, {'7'},{'8'}, {'9'} };
TableName tableName = TableName.valueOf("browser_action2");

HTableDescriptor tableDesc = new HTableDescriptor(tableName);
HColumnDescriptor columnDesc = new HColumnDescriptor("v");
tableDesc.addFamily(columnDesc);

columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);

admin.createTable(tableDesc, SPLIT_KEYS);




2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:


Re: query from hbase issue

Posted by Zelaine Fong <zf...@maprtech.com>.
Can you provide the CREATE TABLE statement you used, so we can try to
reproduce this problem on our end?

Thanks.

-- Zelaine

On Tue, May 17, 2016 at 4:50 AM, qiang li <ti...@gmail.com> wrote:
