Posted to dev@drill.apache.org by Hsuan Yi Chu <hy...@maprtech.com> on 2015/05/11 01:03:18 UTC
Schema of the Recordbatch from scanning Values seems wrong
I tried a query which has a pattern:
column IN (1212 + 1, 1212 + 2, 1212)
For the tuple, Calcite makes a plan like
UnionAll(all=[true])
  UnionAll(all=[true])
    Project(EXPR$0=[+(1212, 1)])
      Values
    Project(EXPR$0=[+(1212, 2)])
      Values
  Values
And I found one surprising thing. At planning time, ValuesPrel has the
correct RelRecordType.
However, at execution time, the schema of the recordbatch from scanning Values
is [`ZERO`(BIGINT: OPTIONAL), `*`(BIGINT: OPTIONAL)].
I cannot make sense of the second column (i.e., *). Does it serve a
special purpose? Or is it a bug that we should try to fix? Its
existence causes some issues for UNION ALL.
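As a rough illustration of why an unexpected column could matter here: UNION ALL generally requires its inputs to expose the same number of columns. The sketch below is a toy check, not Drill code. The class and method names are hypothetical, and pairing a one-column input (EXPR$0) against the two-column scanned batch is an assumption made only for the example.

```java
import java.util.Arrays;
import java.util.List;

public class UnionAllSchemaSketch {

    // Hypothetical check: both UNION ALL inputs must have the same column count.
    public static boolean sameColumnCount(List<String> left, List<String> right) {
        return left.size() == right.size();
    }

    public static void main(String[] args) {
        // One column, as in the Project(EXPR$0=...) nodes of the plan.
        List<String> oneColumn = Arrays.asList("EXPR$0");
        // Two columns, as in the recordbatch schema reported at execution time.
        List<String> scanned = Arrays.asList("ZERO", "*");

        System.out.println(sameColumnCount(oneColumn, oneColumn));
        System.out.println(sameColumnCount(oneColumn, scanned));
    }
}
```

In this toy model the second comparison fails only because of the extra `*` column.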
Re: Schema of the Recordbatch from scanning Values seems wrong
Posted by Hsuan Yi Chu <hy...@maprtech.com>.
That makes sense. I also ran with that modification. Things work out.
I will run a unit test to see if it breaks anything.
On Sun, May 10, 2015 at 4:29 PM, Jacques Nadeau <ja...@apache.org> wrote:
> It looks like the issue is:
>
> writeListDataIfTyped and writeMapDataIfTyped don't set atLeastOneWrite
> to true when they write a value. As a result, our code thinks that we
> haven't added any columns, so we add a * column.
Re: Schema of the Recordbatch from scanning Values seems wrong
Posted by Jacques Nadeau <ja...@apache.org>.
It looks like the issue is:
writeListDataIfTyped and writeMapDataIfTyped don't set atLeastOneWrite
to true when they write a value. As a result, our code thinks that we
haven't added any columns, so we add a * column.
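The failure mode described above can be sketched with a toy model. This is not Drill's reader: ToyReader, writeTypedValue, finishBatch, and schemaAfterRead are hypothetical names, and only atLeastOneWrite and the `*` column come from the thread. A typed write path adds a column but, in the buggy variant, never sets the flag, so the end-of-batch check assumes nothing was written and appends `*`.

```java
import java.util.ArrayList;
import java.util.List;

public class StarColumnSketch {

    /** Toy stand-in for a reader that tracks whether it wrote anything. */
    static final class ToyReader {
        private final boolean typedWritesSetFlag; // false models the reported bug
        private final List<String> columns = new ArrayList<>();
        private boolean atLeastOneWrite = false;

        ToyReader(boolean typedWritesSetFlag) {
            this.typedWritesSetFlag = typedWritesSetFlag;
        }

        // Models a typed write path (in the spirit of writeMapDataIfTyped).
        void writeTypedValue(String column) {
            columns.add(column);
            if (typedWritesSetFlag) {
                atLeastOneWrite = true;
            }
        }

        // Models the end-of-batch check: if no write was recorded,
        // a placeholder * column is added.
        List<String> finishBatch() {
            if (!atLeastOneWrite) {
                columns.add("*");
            }
            return new ArrayList<>(columns);
        }
    }

    public static List<String> schemaAfterRead(boolean typedWritesSetFlag) {
        ToyReader reader = new ToyReader(typedWritesSetFlag);
        reader.writeTypedValue("ZERO");
        return reader.finishBatch();
    }

    public static void main(String[] args) {
        System.out.println("flag not set: " + schemaAfterRead(false)); // [ZERO, *]
        System.out.println("flag set:     " + schemaAfterRead(true));  // [ZERO]
    }
}
```

Under these assumptions, setting the flag in the typed write path is enough to keep the spurious column out of the schema.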
On Sun, May 10, 2015 at 4:25 PM, Jacques Nadeau <ja...@apache.org> wrote:
> Important, given its impact on expression-based IN queries.
Re: Schema of the Recordbatch from scanning Values seems wrong
Posted by Jacques Nadeau <ja...@apache.org>.
Important, given its impact on expression-based IN queries.
On Sun, May 10, 2015 at 4:21 PM, Hsuan Yi Chu <hy...@maprtech.com> wrote:
> I think so. The * column was added to the container
> after currentReader.next() was called.
>
> How do we prioritize this issue?
Re: Schema of the Recordbatch from scanning Values seems wrong
Posted by Hsuan Yi Chu <hy...@maprtech.com>.
I think so. The * column was added to the container
after currentReader.next() was called.
How do we prioritize this issue?
On Sun, May 10, 2015 at 4:09 PM, Jacques Nadeau <ja...@apache.org> wrote:
> I think it must be an issue in the ExtendedJsonReader (which is used by the
> ValuesRel). It is probably the same as:
>
> https://issues.apache.org/jira/browse/DRILL-2906
>
> Fixing 2906 should fix this. Can you try to determine the issue?
Re: Schema of the Recordbatch from scanning Values seems wrong
Posted by Jacques Nadeau <ja...@apache.org>.
I think it must be an issue in the ExtendedJsonReader (which is used by the
ValuesRel). It is probably the same as:
https://issues.apache.org/jira/browse/DRILL-2906
Fixing 2906 should fix this. Can you try to determine the issue?