Posted to dev@calcite.apache.org by Rui Wang <ru...@google.com.INVALID> on 2018/09/04 19:52:30 UTC

[Discuss] Make flattening on Struct/Row optional

Hi Community,

While trying to support the Row type in Apache Beam SQL on top of Calcite, I
realized that the logic that flattens Rows causes the structure information
of a Row to be lost after projections. There is a use case where users want
to mix the Beam programming model with Beam SQL to process a dataset. The
following is an example of that use case:

dataset.apply(something user defined)
            .apply(SELECT ...)
            .apply(something user defined)

As you can see, after the SQL statement is applied, the data structure
should be preserved for further processing.
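To make the loss of structure concrete, here is a minimal Java sketch. It is not Calcite's RelStructuredTypeFlattener. It assumes a toy model in which a row type is an ordered map from field name to either a scalar type name or a nested row type, and it assumes dotted paths such as address.city as the names of flattened fields (Calcite's own naming may differ). After flattening, a downstream transform sees only leaf columns, not the address ROW itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Calcite code: a row type is an ordered map from
 * field name to either a scalar type name (String) or a nested row type (Map).
 * Flattening replaces every nested row by its leaf fields.
 */
final class FlattenSketch {

  /** Returns the leaf columns of rowType, in depth-first order, as dotted paths. */
  static List<String> flatten(Map<String, Object> rowType) {
    List<String> out = new ArrayList<>();
    flattenInto("", rowType, out);
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void flattenInto(String prefix, Map<String, Object> rowType, List<String> out) {
    for (Map.Entry<String, Object> field : rowType.entrySet()) {
      String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
      if (field.getValue() instanceof Map) {
        // Nested ROW: emit its leaves instead of the ROW itself.
        flattenInto(path, (Map<String, Object>) field.getValue(), out);
      } else {
        out.add(path);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> address = new LinkedHashMap<>();
    address.put("city", "VARCHAR");
    address.put("zip", "INTEGER");
    Map<String, Object> person = new LinkedHashMap<>();
    person.put("name", "VARCHAR");
    person.put("address", address);
    System.out.println(flatten(person)); // [name, address.city, address.zip]
  }
}
```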

The most straightforward approach, to me, is to make Struct flattening
optional, so that I could disable it and the Row structure would be
preserved. Is it feasible to make this happen? What would happen if Calcite
simply did not flatten Structs in the flattener? (I tried disabling it but
got exceptions in the optimizer. I wasn't sure whether that was a minor thing
to fix, or whether Struct flattening was a design choice, in which case the
impact of the change would be huge.)

Alternatively, if there is a way to keep enough information to reconstruct
the Row after projections, that might be OK as well. Does this idea exist in
Calcite? If it does not, how does it compare with disabling Struct
flattening?
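One way to picture the reconstruction idea is to record the nested row type before flattening and use it to re-nest the flat values afterwards. The Java sketch below is hypothetical: restructure and the map-based type model are illustrative names, not Calcite APIs, and it assumes the leaf values arrive in the same depth-first order in which flattening emitted them.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not a Calcite API: rebuilds a nested row value from
 * flat leaf values, using the nested row type recorded before flattening.
 */
final class RestructureSketch {

  /** Re-nests flatValues, which must be in depth-first leaf order. */
  static Map<String, Object> restructure(Map<String, Object> rowType, List<?> flatValues) {
    Iterator<?> leaves = flatValues.iterator();
    Map<String, Object> row = build(rowType, leaves);
    if (leaves.hasNext()) {
      throw new IllegalArgumentException("more flat values than leaf fields");
    }
    return row;
  }

  @SuppressWarnings("unchecked")
  private static Map<String, Object> build(Map<String, Object> rowType, Iterator<?> leaves) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : rowType.entrySet()) {
      if (field.getValue() instanceof Map) {
        // Nested ROW type: consume its leaves and wrap them up again.
        row.put(field.getKey(), build((Map<String, Object>) field.getValue(), leaves));
      } else if (leaves.hasNext()) {
        row.put(field.getKey(), leaves.next());
      } else {
        throw new IllegalArgumentException("fewer flat values than leaf fields");
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Map<String, Object> address = new LinkedHashMap<>();
    address.put("city", "VARCHAR");
    address.put("zip", "INTEGER");
    Map<String, Object> person = new LinkedHashMap<>();
    person.put("name", "VARCHAR");
    person.put("address", address);

    // Flat values as they might look after a projection over the flattened row.
    System.out.println(restructure(person, List.of("Ada", "London", 12345)));
    // {name=Ada, address={city=London, zip=12345}}
  }
}
```

The design keeps the type, not the values: rows stay flat while they are processed, and the structure is restored only where a consumer needs it.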

Thanks,
Rui

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Igor Guzenko <ih...@gmail.com>.

Hi Rui,

I'm glad that the fix was useful.

Thanks,
Igor


On Thu, Dec 12, 2019 at 8:16 PM Rui Wang <am...@apache.org> wrote:

> Absolutely. Thanks Igor for the contribution! :)
>
>
> -Rui

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Rui Wang <am...@apache.org>.

Absolutely. Thanks Igor for the contribution! :)


-Rui

On Wed, Dec 11, 2019 at 10:54 PM Stamatis Zampetakis <za...@gmail.com>
wrote:

> So basically thanks to Igor :)

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Stamatis Zampetakis <za...@gmail.com>.

So basically thanks to Igor :)

On Wed, Dec 11, 2019 at 9:56 PM Rui Wang <am...@apache.org> wrote:

> Thanks for Stamatis's suggestion. Indeed, a recent effort in [1] enhanced
> the support for reconstructing ROW in the top SELECT, which is supposed to
> solve the problem.
>
> [1]: https://jira.apache.org/jira/browse/CALCITE-3138

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Rui Wang <am...@apache.org>.

Thanks for Stamatis's suggestion. Indeed, a recent effort in [1] enhanced the
support for reconstructing ROW in the top SELECT, which is supposed to solve
the problem.

[1]: https://jira.apache.org/jira/browse/CALCITE-3138

On Mon, Dec 9, 2019 at 3:21 PM Rui Wang <am...@apache.org> wrote:

> Hello,
>
> Sorry for the long delay on this thread. Recently I have again heard requests
> about how to deal with STRUCT in BeamSQL without flattening it. I also
> realized that Flink has already disabled it in their codebase[1]. I tried
> removing STRUCT flattening and running the unit tests of Calcite core to see
> how many tests break: it was 25, which wasn't that bad. So I would like to
> pick up this effort again.
>
> Before I do, I just want to ask whether the Calcite community supports this
> effort (or thinks it is a good idea).
>
> My current execution plan is the following:
> 1. Add a new flag to FrameworkConfig that specifies whether to flatten STRUCT.
> It is enabled by default.
> 2. With the struct flattener disabled, add more tests for STRUCT support in
> general, for example STRUCT in projections, join conditions, filters, etc.
> If something breaks, try to fix it.
> 3. Check the 25 failed tests above and see why they fail when the struct
> flattener is gone. Duplicate those tests with the necessary fixes to make
> sure they pass without STRUCT flattening.
>
>
> [1]:
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166
>
>
> -Rui

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Rui Wang <am...@apache.org>.

Hello,

Sorry for the long delay on this thread. Recently I have again heard requests
about how to deal with STRUCT in BeamSQL without flattening it. I also
realized that Flink has already disabled it in their codebase[1]. I tried
removing STRUCT flattening and running the unit tests of Calcite core to see
how many tests break: it was 25, which wasn't that bad. So I would like to
pick up this effort again.

Before I do, I just want to ask whether the Calcite community supports this
effort (or thinks it is a good idea).

My current execution plan is the following:
1. Add a new flag to FrameworkConfig that specifies whether to flatten STRUCT.
It is enabled by default.
2. With the struct flattener disabled, add more tests for STRUCT support in
general, for example STRUCT in projections, join conditions, filters, etc.
If something breaks, try to fix it.
3. Check the 25 failed tests above and see why they fail when the struct
flattener is gone. Duplicate those tests with the necessary fixes to make
sure they pass without STRUCT flattening.
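Step 1 of the plan amounts to an opt-out switch whose default keeps today's behaviour. The Java sketch below illustrates only that shape: PlannerOptions, withFlattenStructs and outputFields are hypothetical names, not part of Calcite's FrameworkConfig, and a toy map-based row type stands in for Calcite's real row types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlattenFlagSketch {

  /** Hypothetical immutable options holder; not Calcite's FrameworkConfig. */
  static final class PlannerOptions {
    final boolean flattenStructs;

    private PlannerOptions(boolean flattenStructs) {
      this.flattenStructs = flattenStructs;
    }

    /** Flattening stays on by default, matching the current behaviour. */
    static PlannerOptions defaults() {
      return new PlannerOptions(true);
    }

    PlannerOptions withFlattenStructs(boolean flattenStructs) {
      return new PlannerOptions(flattenStructs);
    }
  }

  /** Returns the output field names a downstream consumer would see. */
  static List<String> outputFields(Map<String, Object> rowType, PlannerOptions options) {
    List<String> out = new ArrayList<>();
    if (options.flattenStructs) {
      flattenInto("", rowType, out);
    } else {
      out.addAll(rowType.keySet()); // nested ROW fields are kept as single fields
    }
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void flattenInto(String prefix, Map<String, Object> rowType, List<String> out) {
    for (Map.Entry<String, Object> field : rowType.entrySet()) {
      String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
      if (field.getValue() instanceof Map) {
        flattenInto(path, (Map<String, Object>) field.getValue(), out);
      } else {
        out.add(path);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> address = new LinkedHashMap<>();
    address.put("city", "VARCHAR");
    address.put("zip", "INTEGER");
    Map<String, Object> person = new LinkedHashMap<>();
    person.put("name", "VARCHAR");
    person.put("address", address);

    PlannerOptions defaults = PlannerOptions.defaults();
    System.out.println(outputFields(person, defaults)); // [name, address.city, address.zip]
    System.out.println(outputFields(person, defaults.withFlattenStructs(false))); // [name, address]
  }
}
```

Keeping the default at true means existing callers are unaffected; only callers that opt out, such as a Beam SQL pipeline that must hand nested Rows to a downstream transform, see unflattened output.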


[1]:
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166


-Rui

On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde <jh...@apache.org> wrote:

> It might not be minor, but it’s worth a try. At optimization time we treat
> all fields as fields, regardless of whether they have complex types (maps,
> arrays, multisets, records), so there should not be too many problems. The
> flattening was mainly for the benefit of the runtime.

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Julian Hyde <jh...@apache.org>.

It might not be minor, but it’s worth a try. At optimization time we treat all fields as fields, regardless of whether they have complex types (maps, arrays, multisets, records), so there should not be too many problems. The flattening was mainly for the benefit of the runtime.
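As a loose illustration of this point (plain Java, not Calcite code): a projection that picks fields by ordinal copies whatever value sits in each field, so a nested record passes through as one opaque value and needs no flattening. The project method and the Object[] row representation are assumptions made for this sketch.

```java
import java.util.Arrays;

final class ProjectSketch {

  /** Copies the fields at the given ordinals; a nested row is copied as one value. */
  static Object[] project(Object[] row, int... ordinals) {
    Object[] out = new Object[ordinals.length];
    for (int i = 0; i < ordinals.length; i++) {
      out[i] = row[ordinals[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    Object[] address = {"London", 12345};
    Object[] person = {"Ada", address};
    // Reorder the fields; the nested address row is treated like any other field.
    Object[] projected = project(person, 1, 0);
    System.out.println(Arrays.deepToString(projected)); // [[London, 12345], Ada]
  }
}
```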


> On Sep 5, 2018, at 11:32 AM, Rui Wang <ru...@google.com.INVALID> wrote:
> 
> Thanks for your helpful response! It seems like disabling the flattening
> will at least affect some rules in optimization. It might not be a minor
> change.
> 
> 
> -Rui
> 
> On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
> 
>> Hi Rui,
>> 
>> Disabling flattening in some cases seems reasonable.
>> 
>> If I am not mistaken, even in the existing code it is not used all the time
>> so it makes sense to become configurable.
>> For example, Calcite prepared statements (CalcitePrepareImpl) are using the
>> flattener only for DDL operations that create materialized views (and this
>> is because this code at some point passes through PlannerImpl).
>> On the other hand, any query that uses the Planner will also pass through
>> the flattener.
>> 
>> Disabling the flattener does not mean that all rules will work without
>> problems. The Javadoc of the RelStructuredTypeFlattener at some point says
>> "This approach has the benefit that real optimizer and codegen rules never
>> have to deal with structured types.". Due to this, it is very likely that
>> some rules were written based on the fact that there are no structured
>> types.
>> 
>> Best,
>> Stamatis
>> 
>> 
>> On Wed, Sep 5, 2018 at 9:48 AM Julian Hyde <jh...@apache.org>
>> wrote:
>> 
>>> Flattening was introduced mainly because the original engine used flat
>>> column-oriented storage. Now we have several ways of executing,
>>> including generating Java code.
>>> 
>>> Adding a mode to disable flattening might make sense.
>>

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Rui Wang <ru...@google.com.INVALID>.

Thanks for your helpful response! It seems like disabling the flattening
will at least affect some rules in optimization. It might not be a minor
change.


-Rui

On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hi Rui,
>
> Disabling flattening in some cases seems reasonable.
>
> If I am not mistaken, even in the existing code it is not used all the time
> so it makes sense to become configurable.
> For example, Calcite prepared statements (CalcitePrepareImpl) are using the
> flattener only for DDL operations that create materialized views (and this
> is because this code at some point passes through PlannerImpl).
> On the other hand, any query that uses the Planner will also pass through
> the flattener.
>
> Disabling the flattener does not mean that all rules will work without
> problems. The Javadoc of the RelStructuredTypeFlattener at some point says
> "This approach has the benefit that real optimizer and codegen rules never
> have to deal with structured types.". Due to this, it is very likely that
> some rules were written based on the fact that there are no structured
> types.
>
> Best,
> Stamatis
>
>
> On Wed, Sep 5, 2018 at 9:48 AM Julian Hyde <jh...@apache.org>
> wrote:
>
> > Flattening was introduced mainly because the original engine used flat
> > column-oriented storage. Now we have several ways of executing,
> > including generating Java code.
> >
> > Adding a mode to disable flattening might make sense.
> >
>

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hi Rui,

Disabling flattening in some cases seems reasonable.

If I am not mistaken, even in the existing code it is not used all the time
so it makes sense to become configurable.
For example, Calcite prepared statements (CalcitePrepareImpl) are using the
flattener only for DDL operations that create materialized views (and this
is because this code at some point passes through PlannerImpl).
On the other hand, any query that uses the Planner will also pass through
the flattener.

Disabling the flattener does not mean that all rules will work without
problems. The Javadoc of the RelStructuredTypeFlattener at some point says
"This approach has the benefit that real optimizer and codegen rules never
have to deal with structured types.". Due to this, it is very likely that
some rules were written based on the fact that there are no structured
types.
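
The risk that some rules assume flat inputs can be made concrete with field positions. In a fully flattened row, top-level field i and flat column i coincide only when every field before it is scalar; once a ROW field is kept intact, the two numberings diverge, and a rule that silently relies on one of them may pick the wrong column. The sketch below is a hypothetical illustration, not a description of any specific Calcite rule: it uses a made-up `Type` class (not Calcite's `RelDataType`) and invented helpers `flatWidth` and `flatOffsets` to compute where each top-level field would start after full flattening.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy type model (not Calcite's RelDataType): a type is either a scalar
 * (fields == null) or a row of sub-fields. In this toy, a row with zero
 * fields is indistinguishable from a scalar.
 */
public class FlatOffsets {

    static final class Type {
        final String name;
        final List<Type> fields; // null for a scalar

        Type(String name, Type... fields) {
            this.name = name;
            this.fields = fields.length == 0 ? null : Arrays.asList(fields);
        }

        boolean isRow() {
            return fields != null;
        }
    }

    /** Number of flat columns a type occupies once fully flattened. */
    static int flatWidth(Type t) {
        if (!t.isRow()) {
            return 1;
        }
        int width = 0;
        for (Type f : t.fields) {
            width += flatWidth(f);
        }
        return width;
    }

    /** For each top-level field of {@code row} (must be a row), its first flat column. */
    static int[] flatOffsets(Type row) {
        int[] offsets = new int[row.fields.size()];
        int next = 0;
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = next;
            next += flatWidth(row.fields.get(i));
        }
        return offsets;
    }

    public static void main(String[] args) {
        // ROW(id INT, address ROW(city VARCHAR, zip INT), name VARCHAR)
        Type row = new Type("t",
            new Type("id"),
            new Type("address", new Type("city"), new Type("zip")),
            new Type("name"));
        // Top-level field 2 ("name") lands in flat column 3, not column 2.
        System.out.println(Arrays.toString(flatOffsets(row))); // [0, 1, 3]
        System.out.println(flatWidth(row));                     // 4
    }
}
```
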

Best,
Stamatis

On Wed, Sep 5, 2018 at 9:48 AM Julian Hyde <jh...@apache.org>
wrote:

> Flattening was introduced mainly because the original engine used flat
> column-oriented storage. Now we have several ways of executing,
> including generating Java code.
>
> Adding a mode to disable flattening might make sense.
>

Re: [Discuss] Make flattening on Struct/Row optional

Posted by Julian Hyde <jh...@apache.org>.

Flattening was introduced mainly because the original engine used flat
column-oriented storage. Now we have several ways of executing,
including generating Java code.

Adding a mode to disable flattening might make sense.
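
To picture what flattening does for a flat, column-oriented runtime, and how the Row could be rebuilt afterwards (the second option in the question quoted below), here is a minimal plain-Java sketch. It is a toy under stated assumptions, not Calcite's `RelStructuredTypeFlattener`: nested rows are nested `Object[]`, the original row doubles as the "shape" template for reconstruction, and the `flattenStructs` flag of the invented `prepare` method merely stands in for the kind of mode suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy sketch (not Calcite code): flatten a nested row into flat columns and
 * rebuild it from a template with the same nesting. Nested rows are Object[].
 */
public class RowFlattening {

    /** Appends the leaf values of {@code row} to {@code out}, depth first. */
    static void flatten(Object[] row, List<Object> out) {
        for (Object v : row) {
            if (v instanceof Object[]) {
                flatten((Object[]) v, out);
            } else {
                out.add(v);
            }
        }
    }

    /** Rebuilds a nested row from {@code flat}; only the nesting of {@code shape} is used. */
    static Object[] unflatten(Object[] shape, List<Object> flat) {
        int[] pos = {0}; // cursor into flat, shared across recursive calls
        return rebuild(shape, flat, pos);
    }

    private static Object[] rebuild(Object[] shape, List<Object> flat, int[] pos) {
        Object[] out = new Object[shape.length];
        for (int i = 0; i < shape.length; i++) {
            if (shape[i] instanceof Object[]) {
                out[i] = rebuild((Object[]) shape[i], flat, pos);
            } else {
                out[i] = flat.get(pos[0]++);
            }
        }
        return out;
    }

    /** Hypothetical switch standing in for "a mode to disable flattening". */
    static Object[] prepare(Object[] row, boolean flattenStructs) {
        if (!flattenStructs) {
            return row; // structure preserved as-is
        }
        List<Object> flat = new ArrayList<>();
        flatten(row, flat);
        return flat.toArray();
    }

    public static void main(String[] args) {
        Object[] row = {1, new Object[] {"Paris", 75001}, "Ann"};
        Object[] flat = prepare(row, true);
        System.out.println(Arrays.toString(flat)); // [1, Paris, 75001, Ann]
        Object[] rebuilt = unflatten(row, Arrays.asList(flat));
        System.out.println(Arrays.deepToString(rebuilt)); // [1, [Paris, 75001], Ann]
        System.out.println(prepare(row, false) == row); // true
    }
}
```

Because the toy treats every `Object[]` value as a nested row, a genuine array-typed value would be mistaken for a struct; a real implementation would drive both directions from the row type rather than from the values.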
On Tue, Sep 4, 2018 at 12:52 PM Rui Wang <ru...@google.com.invalid> wrote:
>
> Hi Community,
>
> While trying to support Row type in Apache Beam SQL on top of Calcite, I
> realized flattening Row logic will make structure information of Row lost
> after Projections. There is a use case where users want to mix Beam
> programming model with Beam SQL together to process a dataset. The
> following is an example of the use case:
>
> dataset.apply(something user defined)
>             .apply(SELECT ...)
>             .apply(something user defined)
>
> As you can see, after the SQL statement is applied, the data structure
> should be preserved for further processing.
>
> The most straightforward way to me is to make Struct flattening optional so
> I could choose to disable it and the Row structure is preserved. Can I ask
> if it is feasible to make it happen? What could happen if Calcite just
> doesn't flatten Struct in flattener? (I tried to disable it but had
> exceptions in optimizer. I wasn't sure if that were some minor thing to fix
> or Struct flattening was a design choice so the impact of change was huge)
>
> Additionally, if there is a way to keep the information that I can use to
> reconstruct the Row after projections, it might be ok as well. Does this
> idea exist in Calcite? If it does not exist, how is this idea compared with
> disabling Struct flattening?
>
> Thanks,
> Rui