Posted to issues@drill.apache.org by "James Turton (Jira)" <ji...@apache.org> on 2022/08/18 05:24:00 UTC

[jira] [Commented] (DRILL-8231) Wrong result in the COUNT function position.

    [ https://issues.apache.org/jira/browse/DRILL-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581151#comment-17581151 ] 

James Turton commented on DRILL-8231:
-------------------------------------

Still broken in Drill master. The COL6408 expression SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) can be replaced with simpler ones such as MAX(val11) or MAX(val12) and the bug still reproduces. So it looks like the problem arises when either of these two VARCHAR columns participates in the key used for the hash exchange. The plan for the MAX(val11) variant:


{code:java}
text  00-00    Screen
00-01      Project(COL6408=[$0], COL4452=[$1])
00-02        StreamAgg(group=[{}], COL6408=[MAX($1)], COL4452=[COUNT($0)])
00-03          UnionExchange
01-01            HashAgg(group=[{1}], COL6408=[MAX($0)])
01-02              Project(val11=[$0], val2=[$1])
01-03                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project(val11=[$0], val2=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0, 1301011:BIGINT)])
03-02                      Scan(table=[[dfs, tmp, /8231/data/*/log_15872_R_79_*.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/8231/data/02/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/10/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/05/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/07/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/09/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/03/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/04/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/08/log_15872_R_79_2022051819502000.parquet], ReadEntryWithPath [path=/tmp/8231/data/06/log_15872_R_79_2022051819502000.parquet]], selectionRoot=file:/tmp/8231/data, numFiles=9, numRowGroups=9, usedMetadataFile=false, usedMetastore=false, columns=[`val11`, `val2`]]]){code}
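Reading the plan above, HashToRandomExchange distributes rows on $0 (val11), but the HashAgg after it groups on {1} (val2) before the UnionExchange and the final COUNT($0). The toy Python model below is a hypothetical sketch of why such a mismatch could produce a wrong COUNT(DISTINCT). It is not Drill code. If rows are routed by one column and then de-duplicated on another, the same val2 value can survive in several fragments and be counted once per fragment. The toy_hash function, the two-fragment layout and the sample rows are invented for illustration. Drill itself uses hash32AsDouble and its own fragment assignment.

{code:python}
from collections import defaultdict


def toy_hash(value, fragments):
    # Hypothetical stand-in for Drill's hash32AsDouble: a deterministic sum of
    # character codes, used only to choose a fragment in this model.
    return sum(map(ord, str(value))) % fragments


def two_phase_distinct_count(rows, dist_key, group_key, fragments):
    # Exchange: route each row to a fragment by hashing dist_key,
    # mimicking HashToRandomExchange(dist0=[[...]]) in the plan.
    parts = defaultdict(list)
    for row in rows:
        parts[toy_hash(row[dist_key], fragments)].append(row)

    # Phase 1: each fragment de-duplicates group_key on its own,
    # mimicking the per-fragment HashAgg(group=[{1}]).
    local_groups = [{row[group_key] for row in part} for part in parts.values()]

    # Phase 2: union every fragment's groups and count them,
    # mimicking UnionExchange followed by COUNT($0) in the StreamAgg.
    return sum(len(groups) for groups in local_groups)


# Invented sample data: two distinct val2 values, each paired with two val11 values.
rows = [
    {"val11": "1", "val2": "a"},
    {"val11": "2", "val2": "a"},
    {"val11": "3", "val2": "b"},
    {"val11": "4", "val2": "b"},
]

# Distributing on the de-duplicated column keeps each val2 value in one fragment.
print(two_phase_distinct_count(rows, "val2", "val2", fragments=2))   # prints 2
# Distributing on val11 while grouping on val2 puts "a" and "b" in both fragments.
print(two_phase_distinct_count(rows, "val11", "val2", fragments=2))  # prints 4
{code}

Routing on the column being de-duplicated is what makes it safe to count the per-fragment groups directly. Whether Drill's planner breaks that invariant in exactly this way would still need to be confirmed in the planner code.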

> Wrong result in the COUNT function position.
> --------------------------------------------
>
>                 Key: DRILL-8231
>                 URL: https://issues.apache.org/jira/browse/DRILL-8231
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0, 1.19.0
>            Reporter: manabu nagamine
>            Priority: Major
>         Attachments: drill.zip
>
>
> Hi Team.
> We are using Drill 1.18.
> The COUNT value returned for COL4452 differs between the two queries below.
> The only difference between them is that the positions of COL4452 and COL6408 in the select list are swapped.
> {code:java}
> 1. 
> select COUNT(DISTINCT val2) COL4452, SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) COL6408 from dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1  and ( ( dir0 between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and LOG_DATE <= '2022-04-30 23:59:59.000000'); 
> 2.
> select SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) COL6408, COUNT(DISTINCT val2) COL4452 from dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1  and ( ( dir0 between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and LOG_DATE <= '2022-04-30 23:59:59.000000');{code}
> Checked against the actual data, the count from query 1 (with COL4452 first) is correct.
> I cannot work out what causes this.
> Can anybody help? Thanks in advance.
> I have attached the Parquet log file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)