You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Rodion Myronov (JIRA)" <ji...@apache.org> on 2018/07/05 07:28:00 UTC

[jira] [Commented] (KUDU-2463) Different results returned by group by on count() metric

    [ https://issues.apache.org/jira/browse/KUDU-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533342#comment-16533342 ] 

Rodion Myronov commented on KUDU-2463:
--------------------------------------

I've faced a very similar issue with my Impala/Kudu cluster returning different results for the same query with no real data changes at the background.

Setup:
 * CHD 5.14 (Kudu 1.6.0, Impala 2.11)
 * 3 Kudu tablet servers colocated with impalad processes.
 * 3 Kudu masters at the different nodes.
 * partitioned Kudu table (23 partitions, replication factor 3, 69 tablets in total)

Issue:
 * I run “select count(*) from my_kudu_table” via Impala (Hue) and get different numbers. To be specific, I get 3 different results in a random order, like this: 568142536, 568142536, 567932431, 568142536, 568210042, 568210042, 568142536 etc.
 * I’m sure there are no background data changes.
 * the same behavior is observed for several tables in that cluster

What has already been done:
 * logs checked (impalad, kudu tserver, kudu master). There are some errors related to (assumption) time synchronization across the cluster. Nothing really useful to explain this issue (as for my limited understanding)
 * Kudu and Impala services restarted. The issue is still present after restart.
 * "kudu cluster ksck -checksum_scan" executed. No issues found. 
 * checked if query result depends on impala coordinator node. No dependency.
 * "kudu fs check" executed. It reports some orphaned blocks at tablet servers. Not sure critical this might be.

In hope it's just Impala-level issue, I've also prepared a kudu-python script which calculates the number of rows in the table via API, with different replica selection modes. It has proven Kudu returns different results depending on replica used (in LEADER_ONLY and CLOSEST_REPLICA modes):

 
{noformat}
[datalake@ip-10-0-102-218 ~]$ python3.6 /tmp/get_row_count_fast.py
Scanning "impala::prep.pos_transaction" table
Scan mode is LEADER_ONLY
Number of rows: 568210042
Batches read from tablet servers:
{'ip-10-0-100-90.customer.dmp.internal': 6556, 'ip-10-0-100-105.customer.dmp.internal': 653, 'ip-10-0-100-52.customer.dmp.internal': 15552}

[datalake@ip-10-0-102-218 ~]$ python3.6 /tmp/get_row_count_fast.py
Scanning "impala::prep.pos_transaction" table
Scan mode is CLOSEST_REPLICA
Number of rows: 568142536
Batches read from tablet servers:
{'ip-10-0-100-90.customer.dmp.internal': 883, 'ip-10-0-100-105.customer.dmp.internal': 8107, 'ip-10-0-100-52.customer.dmp.internal': 13768}
{noformat}
 

 

Attachments:
 * checksum_scan.txt - the result of "kudu cluster csck -checksum_scan" for the table

 * fs_check_tablet_servers.txt - the result of "kudu fs check" for 3 tablet servers

 * fs_check_master_servers.txt - the result of "kudu fs check" for 3 master servers

 

> Different results returned by group by on count() metric
> --------------------------------------------------------
>
>                 Key: KUDU-2463
>                 URL: https://issues.apache.org/jira/browse/KUDU-2463
>             Project: Kudu
>          Issue Type: Bug
>          Components: impala
>    Affects Versions: 1.5.0
>            Reporter: Tomas Farkas
>            Priority: Critical
>             Fix For: n/a
>
>         Attachments: checksum_scan.txt, fs_check_master_servers.txt, fs_check_tablet_servers.txt
>
>
> Hi, 
> I have a static table in Kudu, no inserts/updates or deletes are running on the cluster. The query returns DIFFERENT result when I change the where condition on one of the primary key columns, which is in the group_by list.
> The created_date is part of the PK and is type of int.
> PK contains subscriber, time, date, identifier and created_date.
> I tried to check if the inserted count is equal to the HDFS table, and noticed on one day, that the count differs based on the where criteria!!
> {quote} 
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date >= 20180601 group by created_date;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date >= 20180601 group by created_date}}
>  {{Query submitted at: 2018-06-04 21:06:30 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=ce4e92eda5aaa02f:ea07aa4600000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180601 | 18076448 | -> 195k MORE!!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 4 row(s) in 0.37s}}
>  {{[10.197.0.164:21000] >}}
>  {{[10.197.0.164:21000] >}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date >= 20180601 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date >= 20180601 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:06:55 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=d541a9dda19e28e4:be4a2ca000000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180601 | 18076448 | -> 195k MORE!!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 4 row(s) in 1.14s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date >= 20180528 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date >= 20180528 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:07:12 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=774a261fb94ad2bb:aab28b8b00000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16877228 |}}
>  \{{| 20180601 | 17925671 | -> 44k MORE!! }}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 8 row(s) in 0.67s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:07:25 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=38483ad3ae5c8eb9:a538cb6300000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180525 | 22309857 |}}
>  {{| 20180526 | 15268520 |}}
>  {{| 20180527 | 14939691 |}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16903829 |}}
>  {{| 20180601 | 18047010 | -> 165k MORE!!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 11 row(s) in 0.85s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date = 20180601 group by created_date;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date = 20180601 group by created_date}}
>  {{Query submitted at: 2018-06-04 21:07:42 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=7343ba31f6b4c86f:621a7b8c00000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180601 | 17881253 | -> CORRECT ONE}}
>  {{+---------------+---------+}}
>  {{Fetched 1 row(s) in 0.27s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:12:02 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=4141df26117f35c3:9ab2f0700000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180525 | 22309857 |}}
>  {{| 20180526 | 15268520 |}}
>  {{| 20180527 | 14939691 |}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16903829 |}}
>  {{| 20180601 | 18047010 | -> AGAIN WRONG RESULT!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 11 row(s) in 1.04s}}{{}}{{}}
> Again, no other inserts/selects/updates or deletes were running between these statements on the cluster. 
>  
> I checked the explain, if there is a difference,but it looks ok. But the result is different!
>  
> {{[10.197.0.164:21000] > explain select created_date, count(*) from base.usage_kudu where created_date = 20180601 group by created_date;}}
>  {{Query: explain select created_date, count(*) from base.usage_kudu where created_date = 20180601 group by created_date}}
>  {{+--------------------------------------------------+}}
>  {{| Explain String |}}
>  {{+--------------------------------------------------+}}
>  {{| Max Per-Host Resource Reservation: Memory=3.94MB |}}
>  {{| Per-Host Resource Estimates: Memory=20.00MB |}}
>  {{| |}}
>  {{| PLAN-ROOT SINK |}}
>  {{| | |}}
>  {{| 04:EXCHANGE [UNPARTITIONED] |}}
>  {{| | |}}
>  {{| 03:AGGREGATE [FINALIZE] |}}
>  {{| | output: count:merge(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 02:EXCHANGE [HASH(created_date)] |}}
>  {{| | |}}
>  {{| 01:AGGREGATE [STREAMING] |}}
>  {{| | output: count(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 00:SCAN KUDU [base.usage_kudu] |}}
>  {{| kudu predicates: created_date = 20180601 |}}
>  {{+--------------------------------------------------+}}
>  {{Fetched 19 row(s) in 0.06s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date = 20180601 group by created_date;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date = 20180601 group by created_date}}
>  {{Query submitted at: 2018-06-04 21:17:21 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=c449aabea51e7456:612f096400000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180601 | 17881253 |}}
>  {{+---------------+---------+}}
>  {{Fetched 1 row(s) in 0.38s}}
>  {{[10.197.0.164:21000] > explain select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: explain select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1}}
>  {{+--------------------------------------------------+}}
>  {{| Explain String |}}
>  {{+--------------------------------------------------+}}
>  {{| Max Per-Host Resource Reservation: Memory=9.94MB |}}
>  {{| Per-Host Resource Estimates: Memory=26.00MB |}}
>  {{| |}}
>  {{| PLAN-ROOT SINK |}}
>  {{| | |}}
>  {{| 05:MERGING-EXCHANGE [UNPARTITIONED] |}}
>  {{| | order by: created_date ASC |}}
>  {{| | |}}
>  {{| 02:SORT |}}
>  {{| | order by: created_date ASC |}}
>  {{| | |}}
>  {{| 04:AGGREGATE [FINALIZE] |}}
>  {{| | output: count:merge(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 03:EXCHANGE [HASH(created_date)] |}}
>  {{| | |}}
>  {{| 01:AGGREGATE [STREAMING] |}}
>  {{| | output: count(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 00:SCAN KUDU [base.usage_kudu] |}}
>  {{| kudu predicates: created_date >= 20180525 |}}
>  {{+--------------------------------------------------+}}
>  {{Fetched 23 row(s) in 0.05s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where created_date >= 20180525 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:17:32 (Coordinator: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=bc4a36f2a7ad3280:c7b09a5100000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180525 | 22309857 |}}
>  {{| 20180526 | 15268520 |}}
>  {{| 20180527 | 14939691 |}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16903829 |}}
>  {{| 20180601 | 18047010 |}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 11 row(s) in 0.88s}}
>  
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)