Posted to dev@drill.apache.org by "Hanifi Gunes (JIRA)" <ji...@apache.org> on 2015/07/29 17:56:05 UTC

[jira] [Created] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

Hanifi Gunes created DRILL-3577:
-----------------------------------

             Summary: Counting nested fields on CTAS-created-parquet file/s reports inaccurate results
                 Key: DRILL-3577
                 URL: https://issues.apache.org/jira/browse/DRILL-3577
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.1.0
            Reporter: Hanifi Gunes
            Assignee: Mehant Baid
            Priority: Critical


I have not tried this at a smaller scale, nor on the JSON file directly, but the following steps seem to reproduce the issue.

1. Create an input file as follows
20K rows with the following:
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
200 rows with the following:
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
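
A minimal sketch of how such an input file could be generated, assuming "20K" means exactly 20,000 rows (which matches the 20200 records written in step 2) and that the 200 rows carrying the extra field come last. The function names and the `data.json` output path are illustrative and not part of the original report.

```python
import json

# Row counts taken from the report: 20K base rows, then 200 rows with an extra field.
BASE_ROWS = 20_000
EXTRA_ROWS = 200


def build_rows(base_rows=BASE_ROWS, extra_rows=EXTRA_ROWS):
    """Return the JSON lines described in step 1, one record per list entry."""
    base = {"some": "yes",
            "others": {"other": "true", "all": "false", "sometimes": "yes"}}
    # Same record, plus the nested field that only the last rows contain.
    extra = {"some": "yes",
             "others": {**base["others"], "additional": "last entries only"}}
    lines = [json.dumps(base, separators=(",", ":"))] * base_rows
    lines += [json.dumps(extra, separators=(",", ":"))] * extra_rows
    return lines


def write_input(path="data.json"):
    """Write the records to `path` (hypothetical file name) and return the row count."""
    lines = build_rows()
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    print(write_input())
```

The file has to be placed wherever the `dfs` storage plugin resolves `data.json`; that location depends on the local Drill configuration.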

2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}

This should report

{code}
Fragment Number of records written
0_0	20200
{code}

3. Count on nested fields via
{code:sql}
select count(t.others.additional) from dfs.`tmp`.`tp` t
-- OR
select count(t.others.other) from dfs.`tmp`.`tp` t
{code}

reports a count of 0, as follows

{code}
EXPR$0
0
{code}

By contrast, the following query
{code:sql}
select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not null
{code}

reports the expected count of 200

{code}
EXPR$0
200
{code}
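
For reference, the counts one would expect from step 3 can be derived from the input records themselves. The sketch below mimics the SQL rule that COUNT(expr) counts only non-null values; it does not touch Drill or Parquet. The helper name `count_non_null` is hypothetical, and 20,000 is again assumed for "20K".

```python
import json


def count_non_null(lines, path):
    """Mimic SQL COUNT(expr): count rows where the nested path is non-null."""
    total = 0
    for line in lines:
        value = json.loads(line)
        for key in path:
            # A missing key at any level is treated as NULL.
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        if value is not None:
            total += 1
    return total


base = '{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}'
extra = ('{"some":"yes","others":{"other":"true","all":"false",'
         '"sometimes":"yes","additional":"last entries only"}}')
rows = [base] * 20_000 + [extra] * 200

print(count_non_null(rows, ["others", "additional"]))  # prints 200
print(count_non_null(rows, ["others", "other"]))       # prints 20200
```

Under these assumptions, the two queries in step 3 should return 200 and 20200 respectively, rather than 0.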


