Posted to dev@flink.apache.org by Francesco Guardiani <fr...@ververica.com> on 2021/12/08 17:02:30 UTC

[DISCUSS][FLINK-24427] Hide Scala from table planner

Hi all,
In case you haven't seen it, last week I posted this document in the issue
comments to explain how we're proceeding to hide Scala from the table planner:
https://docs.google.com/document/d/12yDUCnvcwU2mODBKTHQ1xhfOq1ujYUrXltiN_rbhT34/edit?usp=sharing

There is a section I added yesterday that is particularly relevant,
because it explains the impact on the distribution. I strongly encourage
people to look at it.

Once we perform all the changes, I'm gonna announce them on the user
mailing list as well, together with the package name changes already
brought in by #17897 <https://github.com/apache/flink/pull/17897> to
flink-parquet and flink-orc.

Thanks,
FG

-- 

Francesco Guardiani | Software Engineer

francesco@ververica.com


<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH

Registered at Amtsgericht Charlottenburg: HRB 158244 B

Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
Jinwei (Kevin) Zhang

Re: [DISCUSS][FLINK-24427] Hide Scala from table planner

Posted by Francesco Guardiani <fr...@ververica.com>.
> When would the follow ups (mentioned under out of
> scope) be done?

For PyFlink, I think we can reasonably get it done soon-ish. What needs to
be done is for PyFlink to start using the new type system and then the
planner needs to expose a way to plug in rules and to use the code
generator. But we're not talking about this release cycle for sure, and of
course that depends on the bandwidth PyFlink contributors can allocate to
it.

For tests, we'll see what we can do in this release cycle, but this
requires an iterative process consisting of fixing and even rewriting
existing test utilities, and it will definitely span the next release and
probably even the one after. On the other hand, the tests issue is relevant
only to those who develop formats and connectors and rely on internal,
undocumented test utilities; other users can just use the new
planner-loader module for developing tests.

For Hive, I can't really say because, as far as I investigated, the leaks
are too deep and it requires a significant amount of work to isolate them.

> For the old type system for UDFs: naive question as I am not involved in
> SQL much, is there an agreed upon deprecation/removal plan for the legacy
> type system yet?

IIRC the legacy type system was deprecated a year ago, and I'm not aware of
any component in particular, except PyFlink, that relies heavily on it. It
hasn't been removed yet only because it's so deep in the codebase that
removing it is an effort on its own, and no one in the community has had
time to get to it :)

> I am asking because the intermediate state of the uber JAR
> described in the document seems a bit messy and I fear that users will
> stumble across that.

I wouldn't say it's messy. On the contrary, what we have right now is messy,
as it's just one big jar with everything, including APIs, Scala, runtime
and planner. Splitting it into different jars is way nicer, as you have the
APIs, the runtime and the planner each in its own jar, as three different
components. The result is:

* You can swap planner and planner-loader depending on whether you need
planner internals or not
* You can use the Scala version you want
* Potentially, your task manager deployments can remove the planner from
the classpath, as only runtime and APIs are required to run the actual job

I think what's really important here is that we communicate this change to
the users through the mailing lists, the release notes and documentation
updates.

Because we're going to continue to ship the old planner jar together with
the new planner-loader jar, I suggest we start using the planner-loader as
the default, as described in the doc. During the RC, if we see that the new
planner-loader is unstable, we can swap the default planner in "/lib" back
to the old jar before the 1.15 release.

FG


Re: [DISCUSS][FLINK-24427] Hide Scala from table planner

Posted by Konstantin Knauf <kn...@apache.org>.
Hi Francesco,

Thanks for this summary. When would the follow ups (mentioned under out of
scope) be done? I am asking because the intermediate state of the uber JAR
described in the document seems a bit messy and I fear that users will
stumble across that.

For the old type system for UDFs: naive question as I am not involved in
SQL much, is there an agreed upon deprecation/removal plan for the legacy
type system yet?

Cheers,

Konstantin



-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk