Posted to dev@drill.apache.org by Hsuan Yi Chu <hy...@maprtech.com> on 2015/05/11 01:03:18 UTC

Schema of the Recordbatch from scanning Values seems wrong

I tried a query that contains the pattern:

column IN (1212 + 1, 1212 + 2, 1212)

For the tuple, Calcite produces a plan like:

UnionAll(all=[true])
  UnionAll(all=[true])
    Project(EXPR$0=[+(1212, 1)])
      Values
    Project(EXPR$0=[+(1212, 2)])
      Values
  Values

I found one surprising thing. At planning time, ValuesPrel has the
correct RelRecordType.

However, at execution time, the schema of the record batch produced by
scanning Values is [`ZERO`(BIGINT: OPTIONAL),  `*`(BIGINT: OPTIONAL)].

I cannot make sense of the second column (i.e., *). Does it serve a
special purpose? Or is it a bug which we should try to remove? Its
existence causes some issues for UNION ALL.
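
The thread does not say exactly how UNION ALL fails. One plausible
mechanism, assuming only the general SQL rule that both inputs of a
UNION ALL must have the same number of columns, is a column-count
mismatch between the Project branches (one column, EXPR$0) and the bare
Values branch (two columns). The class and method names below are
hypothetical and are not Drill code; this is only a sketch of that check.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Drill's implementation.
public class UnionAllSchemaCheck {

    // Returns true when both inputs expose the same number of columns,
    // which SQL requires of the two sides of a UNION ALL.
    public static boolean sameColumnCount(List<String> left, List<String> right) {
        return left.size() == right.size();
    }

    public static void main(String[] args) {
        // Column names taken from the plan and schema quoted in this message.
        List<String> projectSide = Arrays.asList("EXPR$0");
        List<String> observedValuesSide = Arrays.asList("ZERO", "*");
        List<String> expectedValuesSide = Arrays.asList("ZERO");

        System.out.println(sameColumnCount(projectSide, observedValuesSide)); // false
        System.out.println(sameColumnCount(projectSide, expectedValuesSide)); // true
    }
}
```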

Re: Schema of the Recordbatch from scanning Values seems wrong

Posted by Hsuan Yi Chu <hy...@maprtech.com>.

That makes sense. I also ran with that modification, and things work.

I will run a unit test to see if it breaks anything.

On Sun, May 10, 2015 at 4:29 PM, Jacques Nadeau <ja...@apache.org> wrote:

> It looks like the issue is:
>
> writeListDataIfTyped and writeMapDataIfTyped don't set atLeastOneWrite
> to true when they write a value.  As a result, our code thinks that we
> haven't added any columns, so we add a * column.

Re: Schema of the Recordbatch from scanning Values seems wrong

Posted by Jacques Nadeau <ja...@apache.org>.

It looks like the issue is:

writeListDataIfTyped and writeMapDataIfTyped don't set atLeastOneWrite
to true when they write a value.  As a result, our code thinks that we
haven't added any columns, so we add a * column.
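
A minimal sketch of the pattern described above, for readers who do not
have the reader code in front of them. Only the names atLeastOneWrite,
writeListDataIfTyped and writeMapDataIfTyped come from this thread;
everything else (the Container and Writer classes, writeDataIfTyped,
addPlaceholderIfEmpty, the boolean switch) is hypothetical and is not
Drill's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the flag bug, not Drill's implementation.
public class StarColumnSketch {

    /** Stand-in for a record batch container: just an ordered list of column names. */
    static final class Container {
        final List<String> columns = new ArrayList<>();

        void addColumn(String name) {
            if (!columns.contains(name)) {
                columns.add(name);
            }
        }
    }

    /** Stand-in for a reader that tracks whether it has written anything. */
    static final class Writer {
        private final Container container;
        private final boolean setFlagOnTypedWrite;
        private boolean atLeastOneWrite = false;

        Writer(Container container, boolean setFlagOnTypedWrite) {
            this.container = container;
            this.setFlagOnTypedWrite = setFlagOnTypedWrite;
        }

        // Plays the role of writeListDataIfTyped / writeMapDataIfTyped:
        // it writes a column, but the buggy variant never sets the flag.
        void writeDataIfTyped(String column) {
            container.addColumn(column);
            if (setFlagOnTypedWrite) {
                atLeastOneWrite = true;
            }
        }

        // Runs after reading: if the flag says nothing was written,
        // a placeholder "*" column is added.
        void addPlaceholderIfEmpty() {
            if (!atLeastOneWrite) {
                container.addColumn("*");
            }
        }
    }

    /** Reads one batch containing a single column named ZERO and returns the schema. */
    public static List<String> readOneBatch(boolean setFlagOnTypedWrite) {
        Container container = new Container();
        Writer writer = new Writer(container, setFlagOnTypedWrite);
        writer.writeDataIfTyped("ZERO");
        writer.addPlaceholderIfEmpty();
        return container.columns;
    }

    public static void main(String[] args) {
        System.out.println(readOneBatch(false)); // [ZERO, *]  flag never set
        System.out.println(readOneBatch(true));  // [ZERO]     flag set on write
    }
}
```

With the flag left unset, the sketch yields the same two-column shape
reported at the start of the thread; setting it on every write leaves
only the ZERO column.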

On Sun, May 10, 2015 at 4:25 PM, Jacques Nadeau <ja...@apache.org> wrote:

> It is important, given its impact on expression-based IN queries.

Re: Schema of the Recordbatch from scanning Values seems wrong

Posted by Jacques Nadeau <ja...@apache.org>.

It is important, given its impact on expression-based IN queries.

On Sun, May 10, 2015 at 4:21 PM, Hsuan Yi Chu <hy...@maprtech.com> wrote:

> I think so. The * column had been added to the container
> after currentReader.next() was called.
>
> How do we prioritize this issue?

Re: Schema of the Recordbatch from scanning Values seems wrong

Posted by Hsuan Yi Chu <hy...@maprtech.com>.

I think so. The * column had been added to the container
after currentReader.next() was called.

How do we prioritize this issue?

On Sun, May 10, 2015 at 4:09 PM, Jacques Nadeau <ja...@apache.org> wrote:

> I think it must be an issue in the ExtendedJsonReader (which is used by the
> ValuesRel).  It is probably the same as:
>
> https://issues.apache.org/jira/browse/DRILL-2906
>
> Fixing 2906 should fix this.  Can you try to determine the issue?

Re: Schema of the Recordbatch from scanning Values seems wrong

Posted by Jacques Nadeau <ja...@apache.org>.

I think it must be an issue in the ExtendedJsonReader (which is used by the
ValuesRel).  It is probably the same as:

https://issues.apache.org/jira/browse/DRILL-2906

Fixing 2906 should fix this.  Can you try to determine the issue?

On Sun, May 10, 2015 at 4:03 PM, Hsuan Yi Chu <hy...@maprtech.com> wrote:

> I tried a query which has a pattern:
>
> column IN (1212 + 1, 1212 + 2, 1212)
>
> For the tuple, Calcite makes a plan like
>
> UnionAll(all=[true])
>   UnionAll(all=[true])
>     Project(EXPR$0=[+(1212, 1)])
>       Values
>     Project(EXPR$0=[+(1212, 2)])
>       Values
>   Values
>
> I found one surprising thing. At planning time, ValuesPrel has the
> correct RelRecordType.
>
> However, at execution, the schema of the recordbatch from scanning Values
> is [`ZERO`(BIGINT: OPTIONAL),  `*`(BIGINT: OPTIONAL)].
>
> I cannot make sense of the second column (i.e., *). Does it serve a
> special purpose? Or is it a bug which we should try to remove? Its
> existence causes some issues for UNION ALL.
>