You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2015/04/23 22:14:39 UTC

[jira] [Closed] (DRILL-1897) Using 'avg' along with 'length' results in wrong output when nulls are present

     [ https://issues.apache.org/jira/browse/DRILL-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli closed DRILL-1897.
------------------------------------

Verified and moved the below tests into the passing suite

Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q10_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q11_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q12_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q13_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q14_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q15_DRILL-1897.q
Functional/Passing/data-shapes/wide-columns/5000/1000rows/parquet/q9_DRILL-1897.q

> Using 'avg' along with 'length' results in wrong output when nulls are present
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-1897
>                 URL: https://issues.apache.org/jira/browse/DRILL-1897
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>            Reporter: Rahul Challapalli
>            Assignee: Mehant Baid
>             Fix For: 0.9.0
>
>         Attachments: DRILL-1897.patch
>
>
> git.commit.id.abbrev=9dfa4a1
> Dataset :
> {code}
> {
>  "col1":1,
>  "col2":"abc"
> }
> {
>  "col1":1,
>  "col2":null
> }
> {
>  "col1":null,
>  "col2":null
> }
> {code}
> The below query should return 3 instead of 1 (postgres correctly returns 3).
> Query :
> {code}
> select avg(length(col2)) from `a.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1.0        |
> +------------+
> {code} 
> Plan for the above query :
> {code}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY])
> 00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])
> 00-03          Project($f0=[length($0)])
> 00-04            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/data-shapes/wide-columns/flat/json/a.json, numFiles=1, columns=[`col2`], files=[maprfs:/drill/testdata/data-shapes/wide-columns/flat/json/a.json]]])
> {code}
> The below query which does not involve a second function works as expected even though nulls are present
> {code}
>  select avg(col1) from `a.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1.0        |
> +------------+
> {code}
> Let me know if you need any more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)