Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2019/01/01 02:36:00 UTC
[jira] [Commented] (DRILL-6938) SQL get the wrong result after hashjoin and hashagg disabled
[ https://issues.apache.org/jira/browse/DRILL-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731501#comment-16731501 ]
Boaz Ben-Zvi commented on DRILL-6938:
-------------------------------------
I tried to reproduce this, but could not see the failure. I created a Parquet table with 500k rows containing the above values (plus a few other values), mixed at random. HIRE_DATE was of type DATE; the other two columns were VARCHAR.
I also forced the IN clause to be planned as a join:
{code}
alter session set `planner.in_subquery_threshold` = 2;
{code}
But this did not make a difference. The join was indeed implemented as a merge join, which does not yet support semi-join functionality; nevertheless, the query returned the expected result (even after adding duplicates to the IN list).
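For anyone else trying to reproduce, the session setup would look roughly like the sketch below. It is not taken verbatim from the report: it assumes the "two parameters" mentioned in the description are the standard planner options `planner.enable_hashjoin` and `planner.enable_hashagg`, and it uses the table name from the reporter's query.
{code}
-- Assumption: these are the two options the reporter disabled.
alter session set `planner.enable_hashjoin` = false;
alter session set `planner.enable_hashagg` = false;
-- Same setting as above: plan the 8-value IN list as a join.
alter session set `planner.in_subquery_threshold` = 2;
-- Inspect the plan to confirm which join and aggregate operators are chosen.
explain plan for
select b.MEM_ID, count(distinct b.DEP_NO)
from dfs.test.emp b
where b.DEP_NO <> '-'
and b.MEM_ID in ('68','412','852','117','657','816','135','751')
and b.HIRE_DATE > '2014-06-01'
group by b.MEM_ID
order by 1;
{code}
Running the query itself with and without these options and comparing row counts would then show whether the discrepancy (8 rows versus 3) appears on a given data set.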
> SQL get the wrong result after hashjoin and hashagg disabled
> ------------------------------------------------------------
>
> Key: DRILL-6938
> URL: https://issues.apache.org/jira/browse/DRILL-6938
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Dony Dong
> Assignee: Boaz Ben-Zvi
> Priority: Critical
>
> Hi Team
> After we disabled hashjoin and hashagg to fix an out-of-memory issue, we got the wrong result.
> With these two parameters enabled, we get 8 rows. After we disable them, the query returns only 3 rows. It seems some MEM_ID values were excluded before the group-by or in some other step.
> select b.MEM_ID,count(distinct b.DEP_NO)
> from dfs.test.emp b
> where b.DEP_NO<>'-'
> and b.MEM_ID in ('68','412','852','117','657','816','135','751')
> and b.HIRE_DATE>'2014-06-01'
> group by b.MEM_ID
> order by 1;