You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2015/04/23 22:14:39 UTC
[jira] [Closed] (DRILL-1897) Using 'avg' along with 'length'
results in wrong output when nulls are present
[ https://issues.apache.org/jira/browse/DRILL-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rahul Challapalli closed DRILL-1897.
------------------------------------
Verified and moved the below tests into the passing suite
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q10_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q11_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q12_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q13_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q14_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q15_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q9_DRILL-1897.q
> Using 'avg' along with 'length' results in wrong output when nulls are present
> ------------------------------------------------------------------------------
>
> Key: DRILL-1897
> URL: https://issues.apache.org/jira/browse/DRILL-1897
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Reporter: Rahul Challapalli
> Assignee: Mehant Baid
> Fix For: 0.9.0
>
> Attachments: DRILL-1897.patch
>
>
> git.commit.id.abbrev=9dfa4a1
> Dataset :
> {code}
> {
> "col1":1,
> "col2":"abc"
> }
> {
> "col1":1,
> "col2":null
> }
> {
> "col1":null,
> "col2":null
> }
> {code}
> The below query should return 3 instead of 1 (postgres correctly returns 3).
> Query :
> {code}
> select avg(length(col2)) from `a.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1.0 |
> +------------+
> {code}
> Plan for the above query :
> {code}
> 00-00 Screen
> 00-01 Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY])
> 00-02 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])
> 00-03 Project($f0=[length($0)])
> 00-04 Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/data-shapes/wide-columns/flat/json/a.json, numFiles=1, columns=[`col2`], files=[maprfs:/drill/testdata/data-shapes/wide-columns/flat/json/a.json]]])
> {code}
> The below query which does not involve a second function works as expected even though nulls are present
> {code}
> select avg(col1) from `a.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1.0 |
> +------------+
> {code}
> Let me know if you need any more information.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)