You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "benj (Jira)" <ji...@apache.org> on 2019/09/27 13:07:00 UTC

[jira] [Created] (DRILL-7390) kvgen/flatten doesn't produce same result from .json or .parquet

benj created DRILL-7390:
---------------------------

             Summary: kvgen/flatten doesn't produce same result from .json or .parquet
                 Key: DRILL-7390
                 URL: https://issues.apache.org/jira/browse/DRILL-7390
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types, Functions - Drill, Storage - JSON, Storage - Parquet
    Affects Versions: 1.16.0
            Reporter: benj
         Attachments: ANIMALS_json.tar.gz, ANIMALS_pqt.tar.gz

With a Parquet produce from JSON (_ANIMALS_json_ and _ANIMALS_pqt_ in attachment in tar.gz format)
{code:sql}
CREATE TABLE ....`ANIMALS_pqt` AS 
(SELECT * FROM ....`ANIMALS_json`);
{code}
Same request, using kvgen and flatten, applied on JSON and Parquet doesn't produce the same results
{code:sql}
SELECT count(*) FROM 
 (SELECT f FROM 
  (SELECT flatten(k) AS f FROM 
   (SELECT kvgen(animals) AS k FROM ....`ANIMALS_json`)) AS x)
=>
8482290

SELECT count(*) FROM 
 (SELECT f FROM 
  (SELECT flatten(k) AS f FROM 
   (SELECT kvgen(animals) AS k FROM ....`ANIMALS_pqt`)) AS x)
=>
929430
{code}
Or another example:
{code:sql}
SELECT count(*) FROM 
 (SELECT f FROM 
  (SELECT flatten(k) AS f FROM 
   (SELECT kvgen(animals) AS k FROM ....`ANIMALS_json`)) AS x WHERE x.f.key='Cat')
=>
121368

SELECT count(*) FROM 
 (SELECT f FROM 
  (SELECT flatten(k) AS f FROM 
   (SELECT kvgen(animals) AS k FROM ....`ANIMALS_pqt`)) AS x WHERE x.f.key='Cat')
=>
13470
{code}
The real result is the json one, as proved by:
{code:bash}
cat ANIMALS_json/*.json | grep -c "Cat"
121368
{code}
Please note that, here, It's appear the particular file _ANIMALS_pqt/1_0_0.parquet_ is not well computed but the other are correct:
{code:sql}
SELECT count(*) FROM (SELECT f FROM (SELECT flatten(k) AS f FROM (SELECT kvgen(animals) AS k FROM ....`ANIMALS_pqt/1_0_0.parquet`)) AS x WHERE x.f.key='Cat');
=> 107898
SELECT count(*) FROM (SELECT f FROM (SELECT flatten(k) AS f FROM (SELECT kvgen(animals) AS k FROM ....`ANIMALS_pqt/1_1_0.parquet`)) AS x WHERE x.f.key='Cat');
=> 2429
SELECT count(*) FROM (SELECT f FROM (SELECT flatten(k) AS f FROM (SELECT kvgen(animals) AS k FROM ....`ANIMALS_pqt/1_2_0.parquet`)) AS x WHERE x.f.key='Cat');
=> 5419
SELECT count(*) FROM (SELECT f FROM (SELECT flatten(k) AS f FROM (SELECT kvgen(animals) AS k FROM ....`ANIMALS_pqt/1_3_0.parquet`)) AS x WHERE x.f.key='Cat');
=> 5622
{code}
                   2429+5419+5622=13470  (result of request on ANIMALS_pqt)
 107898+2429+5419+5622=121368 (result of request on ANIMALS_json)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)