Posted to issues@drill.apache.org by "Nikolay (JIRA)" <ji...@apache.org> on 2015/12/24 13:49:49 UTC
[jira] [Commented] (DRILL-4221) Not correct number of rows when rows are fetched from hbase storage

    [ https://issues.apache.org/jira/browse/DRILL-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070966#comment-15070966 ] 

Nikolay commented on DRILL-4221:
--------------------------------

Physical plan for the query
{quote}SELECT count(`demo_xm:events`.PAYMENT.payType) FROM hbase.`demo_xm:events`{quote}
which returns the correct result:

{color:red}00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.35544341E7 rows, 2.181038281E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2964
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554434E7 rows, 2.18103828E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2963
00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554434E7 rows, 2.18103828E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2962
00-03          UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554433E7 rows, 2.18103816E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2961
01-01            StreamAgg(group=[{}], EXPR$0=[COUNT($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554432E7 rows, 2.18103808E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2960
01-02              Project($f0=[ITEM($0, 'payType')]) : rowType = RecordType(ANY $f0): rowcount = 1.6777216E7, cumulative cost = {1.6777216E7 rows, 1.6777216E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2959
01-03                Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=demo_xm:events, startRow=null, stopRow=null, filter=null], columns=[`PAYMENT`.`payType`]]]) : rowType = RecordType((VARCHAR(1), ANY) MAP PAYMENT): rowcount = 1.6777216E7, cumulative cost = {1.6777216E7 rows, 1.6777216E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2958{color}


And the physical plan for the query
{quote}SELECT count(`demo_xm:events`.PAYMENT.payType) FROM hbase.`demo_xm:events` where row_key BETWEEN 'a' AND 'f'{quote}
which returns an incorrect result:

{color:red}00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291458.1 rows, 4.40402121E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2746
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291458.0 rows, 4.4040212E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2745
00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291458.0 rows, 4.4040212E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2744
00-03          UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291457.0 rows, 4.40402E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2743
01-01            StreamAgg(group=[{}], EXPR$0=[COUNT($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291456.0 rows, 4.4040192E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2742
01-02              Project($f0=[ITEM($1, 'payType')]) : rowType = RecordType(ANY $f0): rowcount = 3145728.0, cumulative cost = {3145728.0 rows, 6291456.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2741
01-03                Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=demo_xm:events, startRow=a, stopRow=f\x00, filter=FilterList AND (2/2): [RowFilter (GREATER_OR_EQUAL, a), RowFilter (LESS_OR_EQUAL, f)]], columns=[`*`]]]) : rowType = RecordType(ANY row_key, (VARCHAR(1), ANY) MAP PAYMENT): rowcount = 3145728.0, cumulative cost = {3145728.0 rows, 6291456.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2740{color}
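In the second plan the BETWEEN predicate is pushed into the HBase scan as startRow=a, stopRow=f\x00 plus two RowFilters. The following is a pure-JDK sketch, not Drill or HBase code, assuming HBase's usual scan semantics (start row inclusive, stop row exclusive, row keys compared as unsigned bytes); the class and method names are hypothetical. It illustrates why every key beginning with 'b' should fall inside that range, so the filtered count is expected to match the unfiltered one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of the scan range startRow=a, stopRow=f\x00.
 * Assumes HBase semantics: startRow inclusive, stopRow exclusive,
 * keys ordered lexicographically as unsigned bytes.
 */
public class RowKeyRange {

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A shorter key that is a prefix of a longer one sorts first.
        return a.length - b.length;
    }

    // True if startRow <= key < stopRow.
    static boolean inScanRange(byte[] key, byte[] startRow, byte[] stopRow) {
        return compareUnsigned(key, startRow) >= 0 && compareUnsigned(key, stopRow) < 0;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] startRow = bytes("a");
        // "f" followed by a 0x00 byte: the smallest key greater than "f",
        // so an exclusive stop row still keeps a key equal to "f".
        byte[] stopRow = Arrays.copyOf(bytes("f"), 2);

        // Hypothetical sample keys; in the report all real keys start with 'b'.
        List<String> keys = Arrays.asList("b0001", "b9999", "f", "fa", "g");
        for (String k : keys) {
            System.out.println(k + " -> " + inScanRange(bytes(k), startRow, stopRow));
        }
    }
}
```

Appending a zero byte is a common way to turn an inclusive upper bound (row_key <= 'f') into HBase's exclusive stop row; keys such as "fa" that sort after "f" are correctly left out.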

> Not correct number of rows when rows are fetched from hbase storage
> -------------------------------------------------------------------
>
>                 Key: DRILL-4221
>                 URL: https://issues.apache.org/jira/browse/DRILL-4221
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - HBase
>    Affects Versions: 1.3.0
>         Environment: Linux 14.04, JAVA 1.7.0_67
>            Reporter: Nikolay
>            Priority: Critical
>
> Drill returns an incorrect result when a table has more than *832* rows.
> When I run the query *SELECT count(row_key) FROM hbase.`ns:events`*, it returns *833* (the correct result).
> If I run the query *SELECT count(\*) FROM hbase.`ns:events`*, it returns *832* (an incorrect result).
> Other types of queries can return incorrect results too. For example, *SELECT count(`ns:events`.CF.column1) FROM hbase.`ns:events`* returns *833* (correct), but the same query with a WHERE clause, *SELECT count(`ns:events`.CF.column1) FROM hbase.`ns:events` where row_key BETWEEN 'a' AND 'f'*, returns *832* (incorrect), even though the range from '*a*' to '*f*' covers every row because all keys start with 'b'.
> The behavior gets stranger: after I added another 1000 rows, the query that returned an incorrect number of rows (*832*) began to return *831*.
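For reference, the expected SQL semantics can be shown with a minimal pure-Java sketch (hypothetical class name and data, not Drill code): COUNT(*) counts every row, while COUNT(expr) counts only rows where expr is non-null, so COUNT(*) can never be smaller than COUNT(expr) over the same rows. Since every HBase row has a row key, COUNT(row_key) and COUNT(*) should agree, so 833 versus 832 indicates a bug rather than a difference in null handling.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of SQL COUNT semantics over one column,
 * where null stands for a cell that is absent in that row.
 */
public class CountSemantics {

    // COUNT(*): every row counts, regardless of column values.
    static long countStar(List<String> column) {
        return column.size();
    }

    // COUNT(expr): only rows where the value is non-null count.
    static long countNonNull(List<String> column) {
        long n = 0;
        for (String v : column) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical values of CF.column1 across four rows; one row lacks the cell.
        List<String> column1 = Arrays.asList("x", null, "y", "z");
        System.out.println("COUNT(*)       = " + countStar(column1));    // 4
        System.out.println("COUNT(column1) = " + countNonNull(column1)); // 3
    }
}
```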



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)