Posted to dev@drill.apache.org by "Hanifi Gunes (JIRA)" <ji...@apache.org> on 2015/07/29 16:01:04 UTC
[jira] [Resolved] (DRILL-3551) CTAS from complex Json source with
schema change is not written (and hence not read back ) correctly
[ https://issues.apache.org/jira/browse/DRILL-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanifi Gunes resolved DRILL-3551.
---------------------------------
Resolution: Fixed
Tested on a small input file of 20 mixed records, with and without the additional field. It looks like the familiar field projection problem surfaces here, so this is quite likely fixed by DRILL-3476. Please re-open, attaching an input file, if it is not fixed.
> CTAS from complex Json source with schema change is not written (and hence not read back ) correctly
> -----------------------------------------------------------------------------------------------------
>
> Key: DRILL-3551
> URL: https://issues.apache.org/jira/browse/DRILL-3551
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 1.1.0
> Reporter: Parth Chandra
> Assignee: Hanifi Gunes
> Priority: Critical
> Fix For: 1.2.0
>
>
> The source data contains -
> 20K rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
> Creating a table and reading it back returns incorrect data -
> CREATE TABLE testparquet as select * from `test.json`;
> SELECT * from testparquet;
> Yields
> | yes | {"other":"true","all":"false","sometimes":"yes"} |
> | yes | {"other":"true","all":"false","sometimes":"yes"} |
> | yes | {"other":"true","all":"false","sometimes":"yes"} |
> | yes | {"other":"true","all":"false","sometimes":"yes"} |
> The "additional" field is missing in all records
> Parquet metadata for the created file does not have the 'additional' field
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
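For anyone wanting to attach an input file as requested above, the data set described in the report can be approximated with a short script. This is only a sketch under assumptions: the report does not say how the rows are ordered or laid out, so the code assumes one JSON object per line with the 200 "additional" rows placed last (suggested by the value "last entries only"). The function names and the default file name are hypothetical.

```python
import json

# Assumption: the base records come first and the records carrying the
# extra "additional" field come last; the report does not state the order.
def make_records(n_base=20_000, n_extra=200):
    """Yield records shaped like those quoted in the report."""
    base_others = {"other": "true", "all": "false", "sometimes": "yes"}
    for _ in range(n_base):
        yield {"some": "yes", "others": dict(base_others)}
    for _ in range(n_extra):
        yield {
            "some": "yes",
            "others": {**base_others, "additional": "last entries only"},
        }

# Assumption: one JSON object per line, no enclosing array.
def write_test_file(path="test.json", n_base=20_000, n_extra=200):
    """Write the generated records to `path`, one compact JSON object per line."""
    with open(path, "w") as f:
        for record in make_records(n_base, n_extra):
            f.write(json.dumps(record, separators=(",", ":")) + "\n")
```

Calling write_test_file() would produce a test.json that the CTAS statement in the report can be run against; shrinking n_base and n_extra gives the small-file variant.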
Re: [jira] [Resolved] (DRILL-3551) CTAS from complex Json source with
schema change is not written (and hence not read back ) correctly
Posted by Hanifi Gunes <hg...@maprtech.com>.
Just an FYI: I dropped a comment under the issue.
-H+
On Wed, Jul 29, 2015 at 5:40 PM, Hanifi Gunes <hg...@maprtech.com> wrote:
> Would you attach a sample input file manifesting the problem? My
> impression from the outset was that a field selection bug that we recently
> fixed might have caused this.
>
>
> Thanks.
> -Hanifi
Re: [jira] [Resolved] (DRILL-3551) CTAS from complex Json source with
schema change is not written (and hence not read back ) correctly
Posted by Hanifi Gunes <hg...@maprtech.com>.
Would you attach a sample input file manifesting the problem? My impression
from the outset was that a field selection bug that we recently fixed might
have caused this.
Thanks.
-Hanifi
On Wed, Jul 29, 2015 at 5:07 PM, Stefán Baxter <st...@activitystream.com>
wrote:
> Hi,
>
> I think that this problem only showed itself for large datasets where
> assumptions were being made after 1k records.
>
> Were you able to reproduce this with a smaller set?
>
> Regards,
> -Stefan
Re: [jira] [Resolved] (DRILL-3551) CTAS from complex Json source with
schema change is not written (and hence not read back ) correctly
Posted by Stefán Baxter <st...@activitystream.com>.
Hi,
I think that this problem only showed itself for large datasets where
assumptions were being made after 1k records.
Were you able to reproduce this with a smaller set?
Regards,
-Stefan
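One way to check whether a given input file would exercise the large-dataset case suggested above is to find the record at which each field first appears. The sketch below is an illustration only: it assumes one JSON object per line, the helper names are hypothetical, and the 1k figure is the hypothesis from this message rather than a documented Drill threshold.

```python
import json

def key_paths(value, prefix=""):
    """Return the set of dotted key paths in a (possibly nested) JSON object."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    return paths

def first_appearance(lines):
    """Map each key path to the 0-based index of the first record containing it.

    Blank lines are skipped and do not count as records.
    """
    first = {}
    records = (line for line in lines if line.strip())
    for index, line in enumerate(records):
        for path in key_paths(json.loads(line)):
            first.setdefault(path, index)
    return first
```

Passing an open file handle to first_appearance() returns a dictionary in which a late field such as others.additional maps to the index of the first record that contains it, which shows whether the schema change falls before or after the first thousand records.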