Posted to issues@drill.apache.org by "Nikolay (JIRA)" <ji...@apache.org> on 2015/12/24 13:49:49 UTC
[jira] [Commented] (DRILL-4221) Not correct number of rows when rows are fetched from hbase storage
[ https://issues.apache.org/jira/browse/DRILL-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070966#comment-15070966 ]
Nikolay commented on DRILL-4221:
--------------------------------
Physical plan for the query
{quote}SELECT count(`demo_xm:events`.PAYMENT.payType) FROM hbase.`demo_xm:events`{quote}
which returns the correct result:
{color:red}00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.35544341E7 rows, 2.181038281E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2964
00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554434E7 rows, 2.18103828E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2963
00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554434E7 rows, 2.18103828E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2962
00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554433E7 rows, 2.18103816E8 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2961
01-01 StreamAgg(group=[{}], EXPR$0=[COUNT($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {3.3554432E7 rows, 2.18103808E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2960
01-02 Project($f0=[ITEM($0, 'payType')]) : rowType = RecordType(ANY $f0): rowcount = 1.6777216E7, cumulative cost = {1.6777216E7 rows, 1.6777216E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2959
01-03 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=demo_xm:events, startRow=null, stopRow=null, filter=null], columns=[`PAYMENT`.`payType`]]]) : rowType = RecordType((VARCHAR(1), ANY) MAP PAYMENT): rowcount = 1.6777216E7, cumulative cost = {1.6777216E7 rows, 1.6777216E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2958{color}
And the physical plan for the query
{quote}SELECT count(`demo_xm:events`.PAYMENT.payType) FROM hbase.`demo_xm:events` where row_key BETWEEN 'a' AND 'f'{quote}
which returns an incorrect result:
{color:red}00-00 Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291458.1 rows, 4.40402121E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2746
00-01 Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291458.0 rows, 4.4040212E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2745
00-02 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291458.0 rows, 4.4040212E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2744
00-03 UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291457.0 rows, 4.40402E7 cpu, 0.0 io, 4096.0 network, 0.0 memory}, id = 2743
01-01 StreamAgg(group=[{}], EXPR$0=[COUNT($0)]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {6291456.0 rows, 4.4040192E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2742
01-02 Project($f0=[ITEM($1, 'payType')]) : rowType = RecordType(ANY $f0): rowcount = 3145728.0, cumulative cost = {3145728.0 rows, 6291456.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2741
01-03 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=demo_xm:events, startRow=a, stopRow=f\x00, filter=FilterList AND (2/2): [RowFilter (GREATER_OR_EQUAL, a), RowFilter (LESS_OR_EQUAL, f)]], columns=[`*`]]]) : rowType = RecordType(ANY row_key, (VARCHAR(1), ANY) MAP PAYMENT): rowcount = 3145728.0, cumulative cost = {3145728.0 rows, 6291456.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2740{color}
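The second scan pushes the WHERE clause down as startRow=a, stopRow=f\x00 plus two RowFilters (GREATER_OR_EQUAL a, LESS_OR_EQUAL f). As a rough sanity check that this range by itself should not drop any 'b'-prefixed keys, here is a minimal sketch that models those bounds. It assumes HBase's usual row-key ordering (unsigned lexicographic byte comparison, start row inclusive, stop row exclusive); the class name, method names and sample keys are hypothetical and not part of Drill or HBase.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical model of the row-key bounds printed in the HBaseGroupScan for
 * "row_key BETWEEN 'a' AND 'f'": startRow=a (inclusive), stopRow=f\x00
 * (exclusive), plus RowFilter >= a and RowFilter <= f.
 */
public class RowKeyRange {
    static final byte[] START_ROW = bytes("a");       // inclusive
    static final byte[] STOP_ROW = bytes("f\u0000");  // exclusive

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic byte comparison (assumed HBase row-key ordering).
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A shorter key that is a prefix of a longer one sorts first.
        return a.length - b.length;
    }

    // True if the key lies in [startRow, stopRow) and passes both RowFilters.
    static boolean inScan(String key) {
        byte[] k = bytes(key);
        boolean inRange = compare(k, START_ROW) >= 0 && compare(k, STOP_ROW) < 0;
        boolean passesFilters = compare(k, bytes("a")) >= 0 && compare(k, bytes("f")) <= 0;
        return inRange && passesFilters;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys; the real table's keys all start with 'b'.
        String[] keys = {"b", "b0001", "bzzz", "f", "f1", "a"};
        for (String key : keys) {
            System.out.println(key + " -> " + inScan(key));
        }
    }
}
```

Under this model every sample key starting with 'b' is accepted, and only keys that sort after "f" (such as "f1") are rejected, so the range predicate alone would not explain a missing row. The count mismatch more likely stems from something else in this plan, for example the `*` column projection, though that is a guess.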
> Not correct number of rows when rows are fetched from hbase storage
> -------------------------------------------------------------------
>
> Key: DRILL-4221
> URL: https://issues.apache.org/jira/browse/DRILL-4221
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - HBase
> Affects Versions: 1.3.0
> Environment: Linux 14.04, JAVA 1.7.0_67
> Reporter: Nikolay
> Priority: Critical
>
> Drill returns an incorrect result when the table has more than *832* rows.
> When I run the query *SELECT count(row_key) FROM hbase.`ns:events`*, it returns *833* (the correct result).
> If I run the query *SELECT count(\*) FROM hbase.`ns:events`*, it returns *832* (an incorrect result).
> Other types of queries can also return incorrect results. For example, *SELECT count(`ns:events`.CF.column1) FROM hbase.`ns:events`* returns *833* (correct), but the same query with a WHERE clause, *SELECT count(`ns:events`.CF.column1) FROM hbase.`ns:events` where row_key BETWEEN 'a' AND 'f'*, returns *832* (incorrect). The range from '*a*' to '*f*' is enough to cover all rows, since every key starts with 'b'.
> The strange behavior continued: after I added another 1000 rows, the query that had returned the incorrect count (*832*) began to return *831*.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)