Posted to issues@drill.apache.org by "Hanifi Gunes (JIRA)" <ji...@apache.org> on 2015/07/29 17:37:05 UTC

[jira] [Comment Edited] (DRILL-3551) CTAS from complex Json source with schema change is not written (and hence not read back ) correctly

    [ https://issues.apache.org/jira/browse/DRILL-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646242#comment-14646242 ] 

Hanifi Gunes edited comment on DRILL-3551 at 7/29/15 3:36 PM:
--------------------------------------------------------------

-Got it. Let me see then.-

Executing
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}
succeeds with
{panel}
Fragment Number of records written
0_0	20200
{panel}

Then running 
{code:sql}
select * from dfs.`tmp`.`tp` t where t.others.additional is not null
{code}

yields

{code}
some	others
yes	{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
yes	{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
...
200 times
{code}

[~parthc] what do you think?

----

Interestingly, counting either the additional or the other field via
{code:sql}
select count(t.others.additional) from dfs.`tmp`.`tp` t;

-- or

select count(t.others.other) from dfs.`tmp`.`tp` t;
{code}
reports a count of 0, as follows
{code}
EXPR$0
0
{code}

While

{code:sql}
select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not null
{code}

reports the expected 200 rows

{code}
EXPR$0
200
{code}

I will file another JIRA for this count issue.
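
For reference, the counts one would expect can be worked out from the source data in the issue description, since count(expr) counts only rows where the expression is not null. The following is a minimal Python sketch, not Drill code: it assumes "20K" means exactly 20,000 rows (consistent with the 20200 records written above), and count_non_null is a hypothetical helper that mimics count() over a nested field.

```python
# Rebuild the rows from the issue description in memory.
# Assumption: "20K" means exactly 20,000 rows, plus the 200 extended rows.
base = {"some": "yes",
        "others": {"other": "true", "all": "false", "sometimes": "yes"}}
extra = {"some": "yes",
         "others": {**base["others"], "additional": "last entries only"}}
rows = [base] * 20000 + [extra] * 200


def count_non_null(rows, *path):
    """Hypothetical stand-in for SQL COUNT(expr) over a nested field:
    counts rows where the field at `path` is present and not null."""
    total = 0
    for row in rows:
        value = row
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            total += 1
    return total


print(count_non_null(rows, "others", "additional"))  # 200
print(count_non_null(rows, "others", "other"))       # 20200
```

Under these assumptions the two count queries should return 200 and 20200 respectively, not 0.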


was (Author: hgunes):
Got it. Let me see then.

> CTAS from complex Json source with schema change  is not written (and hence not read back ) correctly
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3551
>                 URL: https://issues.apache.org/jira/browse/DRILL-3551
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.1.0
>            Reporter: Parth Chandra
>            Assignee: Hanifi Gunes
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> The source data contains - 
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}   
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
> Creating a table and reading it back returns incorrect data - 
> CREATE TABLE testparquet as select * from `test.json`;
> SELECT * from testparquet;
> Yields 
> | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
> | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
> | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
> | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
> The "additional" field is missing in all records
> Parquet metadata for the created file does not have the 'additional' field 
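
To reproduce the issue, the source file described above can be regenerated with a short script. This is a minimal sketch under stated assumptions: write_sample is a hypothetical helper, "20K" is taken to mean exactly 20,000 rows, the 200 extended rows come last (per "last entries only"), and the output name data.json follows the query in the comment rather than test.json in the description.

```python
import json


def write_sample(path, plain_rows=20000, extended_rows=200):
    """Write one JSON object per line: plain rows first, then rows whose
    nested map carries the extra "additional" field. Returns rows written."""
    others = {"other": "true", "all": "false", "sometimes": "yes"}
    plain = {"some": "yes", "others": others}
    extended = {"some": "yes",
                "others": {**others, "additional": "last entries only"}}
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(plain_rows):
            out.write(json.dumps(plain, separators=(",", ":")) + "\n")
        for _ in range(extended_rows):
            out.write(json.dumps(extended, separators=(",", ":")) + "\n")
    return plain_rows + extended_rows


if __name__ == "__main__":
    # Assumed file name; place the file wherever the dfs workspace reads from.
    print(write_sample("data.json"))
```

The schema change appears only after the first 20,000 records, which is the ordering the issue describes.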



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)