Posted to dev@flink.apache.org by Xingbo Huang <hx...@gmail.com> on 2021/03/12 06:30:55 UTC

[DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Hi everyone,

Since release-1.11 introduced Cython support in PyFlink, we release 7 packages
(for different platforms and Python versions) to PyPI for each release. Each
package is more than 200MB because we need to bundle the jar files into it.
The total project size on PyPI grows very fast, and we frequently need to ask
PyPI for more project space. Please refer to
[https://github.com/pypa/pypi-support/issues/831] for more details.

The root cause of this problem is that we bundle the jar files in each
package. This is actually unnecessary if we extract the jar files into a
separate package dedicated to holding them.

I’d like to propose splitting the PyFlink package into two packages: the
original apache-flink and apache-flink-libraries (any suggestions for the
name?). The package apache-flink-libraries contains only jar files, and there
is only one apache-flink-libraries package for each release. The package
apache-flink depends on apache-flink-libraries; users still only need to
install apache-flink, and nothing changes for them compared with before. We
still need to release multiple wheel packages of apache-flink. However, they
will be very small, as they no longer contain the jar files.

Looking forward to your feedback.

Best,

Xingbo
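A minimal sketch of how the dependency described above could be declared, assuming both packages are built with setuptools and share one version number. The version string and the two helper functions are hypothetical, for illustration only; the dictionaries they return would be passed to `setuptools.setup(**...)`.

```python
"""Hypothetical sketch of the setup() arguments for the proposed split.

Assumption: both packages are built with setuptools and released with the
same version number; all values below are illustrative only.
"""

FLINK_VERSION = "1.13.0"  # assumed release version, for illustration


def libraries_setup_kwargs(version: str) -> dict:
    """Arguments for the jar-only package: one wheel per release."""
    return {
        "name": "apache-flink-libraries",
        "version": version,
        # Only data files (the jars) are shipped, so there are no
        # compiled extensions and no runtime dependencies.
        "install_requires": [],
    }


def flink_setup_kwargs(version: str) -> dict:
    """Arguments for the platform-specific package that users install."""
    return {
        "name": "apache-flink",
        "version": version,
        # Pin the exact version so the Python code and the jars it drives
        # always come from the same Flink release.
        "install_requires": [f"apache-flink-libraries=={version}"],
    }


if __name__ == "__main__":
    print(flink_setup_kwargs(FLINK_VERSION)["install_requires"])
    # prints ['apache-flink-libraries==1.13.0']
```

Pinning with `==` rather than `>=` is one way to keep the Python code and the jars from drifting apart; a looser constraint would allow mixing jars and Python code from different releases.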

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Xingbo Huang <hx...@gmail.com>.
Thanks for the feedback everyone. I will proceed if there is no objection.

Best,
Xingbo

On Mon, Mar 22, 2021 at 5:30 PM Till Rohrmann <tr...@apache.org> wrote:

> If there is no other way, then I would say let's go with splitting the
> modules. This is already better than keeping the Flink binaries bundled
> with every Python/platform package.
>
> Cheers,
> Till

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Till Rohrmann <tr...@apache.org>.
If there is no other way, then I would say let's go with splitting the
modules. This is already better than keeping the Flink binaries bundled
with every Python/platform package.

Cheers,
Till

On Mon, Mar 22, 2021 at 8:28 AM Xingbo Huang <hx...@gmail.com> wrote:

> When we **pip install** a wheel package, pip just unpacks the wheel package
> and installs its dependencies [1]. There is no way to download things from
> an external website during installation. This differs from a source
> package, where we could download something in setup.py. This is explained
> in detail in [2]. So I'm afraid that splitting the package is the only
> solution we have if we want to reduce the package size of PyFlink.
>
> [1] https://www.python.org/dev/peps/pep-0427/
> [2] https://realpython.com/python-wheels/#advantages-of-python-wheels
>
> Best,
> Xingbo

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Xingbo Huang <hx...@gmail.com>.
When we **pip install** a wheel package, pip just unpacks the wheel package
and installs its dependencies [1]. There is no way to download things from
an external website during installation. This differs from a source
package, where we could download something in setup.py. This is explained
in detail in [2]. So I'm afraid that splitting the package is the only
solution we have if we want to reduce the package size of PyFlink.

[1] https://www.python.org/dev/peps/pep-0427/
[2] https://realpython.com/python-wheels/#advantages-of-python-wheels

Best,
Xingbo
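To illustrate the point about wheels, the sketch below builds a toy archive shaped like a wheel (a real wheel also carries `WHEEL` and `RECORD` files) and reads the `Requires-Dist` entries from its `.dist-info/METADATA`, which is the information an installer uses to pull in dependencies. The package names and version are assumptions for illustration; nothing inside the archive is executed.

```python
"""Sketch: a wheel is a zip archive whose METADATA lists its dependencies.

Installing a wheel (PEP 427) means unpacking the archive and installing the
distributions named in its Requires-Dist headers; no setup.py is run.
The archive built here is a toy; names and versions are assumptions.
"""
import io
import zipfile
from email.parser import Parser


def build_toy_wheel() -> bytes:
    """Create a minimal in-memory archive laid out like a wheel."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pyflink/__init__.py", "")
        zf.writestr(
            "apache_flink-1.13.0.dist-info/METADATA",
            "Metadata-Version: 2.1\n"
            "Name: apache-flink\n"
            "Version: 1.13.0\n"
            "Requires-Dist: apache-flink-libraries==1.13.0\n",
        )
    return buf.getvalue()


def declared_dependencies(wheel_bytes: bytes) -> list:
    """Read the Requires-Dist entries from the wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        meta_name = next(
            n for n in zf.namelist() if n.endswith(".dist-info/METADATA")
        )
        metadata = Parser().parsestr(zf.read(meta_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []


if __name__ == "__main__":
    print(declared_dependencies(build_toy_wheel()))
    # prints ['apache-flink-libraries==1.13.0']
```

Because the dependency is only declared in metadata, the jar-only package has to exist as a regular distribution on the index for the installer to fetch it.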

On Fri, Mar 19, 2021 at 6:32 PM Till Rohrmann <tr...@apache.org> wrote:

> I think that we should try to reduce the size of the packages by either
> splitting them or by having another means to retrieve the Java binaries.
>
> Cheers,
> Till
>
> On Fri, Mar 19, 2021 at 2:58 AM Xingbo Huang <hx...@gmail.com> wrote:
>
> > Hi Till,
> >
> > The package size of tensorflow [1] is also very big (about 300MB+).
> > However, TensorFlow does not try to solve the problem; instead, it
> > expands the space limit on PyPI whenever the project space is full. We
> > could also choose this option. At our current release frequency, we would
> > probably need to apply for a 15GB expansion every year. There are not many
> > similar cases, so there is no standard solution to refer to. However,
> > splitting a project into multiple packages is quite common. For example,
> > Apache Airflow prepares a separate release package for each provider [2].
> >
> > So I currently have two solutions in mind that could work:
> >
> > 1. Keep the current solution and expand the space limit on PyPI
> > whenever the space is full.
> >
> > 2. Split into two packages to reduce the wheel package size.
> >
> > [1] https://pypi.org/project/tensorflow/#files
> > [2] https://pypi.org/search/?q=apache-airflow-*&o=
> >
> > Best,
> > Xingbo
> >
> > On Wed, Mar 17, 2021 at 9:22 PM Till Rohrmann <tr...@apache.org> wrote:
> >
> > > How do other projects solve this problem?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Mar 17, 2021 at 3:45 AM Xingbo Huang <hx...@gmail.com> wrote:
> > >
> > > > Hi Chesnay,
> > > >
> > > > Yes, in most cases we can indeed download the required jars in
> > > > `setup.py`, which is also the solution I originally thought of for
> > > > reducing the size of the wheel packages. However, I'm afraid it will
> > > > not work when the external network is not accessible, which is very
> > > > common in production clusters.
> > > >
> > > > Best,
> > > > Xingbo
> > > >
> > > > On Tue, Mar 16, 2021 at 8:32 PM Chesnay Schepler <ch...@apache.org> wrote:
> > > >
> > > > > This proposed apache-flink-libraries package would just contain the
> > > > > binary, right? And it would effectively be unusable to the Python
> > > > > audience on its own.
> > > > >
> > > > > Essentially we are just abusing PyPI for shipping a Java binary. Is
> > > > > there no way for us to download the jars when the Python package is
> > > > > being installed? (e.g., in setup.py)
> > > > >
> > > > > On 3/16/2021 1:23 PM, Dian Fu wrote:
> > > > > > Yes, the .whl files of PyFlink will also be about 3MB if we split
> > > > > > the package. Currently the package is big because we bundle the jar
> > > > > > files in it.
> > > > > >
> > > > > >> On Mar 16, 2021, at 8:13 PM, Chesnay Schepler <ch...@apache.org> wrote:
> > > > > >>
> > > > > >> The key difference is that the Beam .whl files are 3MB, i.e. 60x
> > > > > >> smaller.
> > > > > >>
> > > > > >> On 3/16/2021 1:06 PM, Dian Fu wrote:
> > > > > >>> Hi Chesnay,
> > > > > >>>
> > > > > >>> We will publish binary packages separately for:
> > > > > >>> 1) Python 3.5 / 3.6 / 3.7 / 3.8 (since 1.12) separately
> > > > > >>> 2) Linux / Mac separately
> > > > > >>>
> > > > > >>> Besides, there is also a source package, which is used when none
> > > > > >>> of the above binary packages is usable, e.g. for Windows users.
> > > > > >>>
> > > > > >>> PS: publishing multiple binary packages is very common in the
> > > > > >>> Python world, e.g. Beam published 22 packages in 2.28 [1] and
> > > > > >>> Pandas published 16 packages in 1.2.3 [2]. We could also publish
> > > > > >>> more packages if we split the packages, as the cost of adding
> > > > > >>> another package will be very small.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Dian
> > > > > >>>
> > > > > >>> [1] https://pypi.org/project/apache-beam/#files
> > > > > >>> [2] https://pypi.org/project/pandas/#files
> > > > > >>>
> > > > > >>>
> > > > > >>> Hi Xintong,
> > > > > >>>
> > > > > >>> Yes, you are right that there are 9 packages in 1.12, as we added
> > > > > >>> Python 3.8 support in 1.12.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Dian
> > > > > >>>
> > > > > >>>> On Mar 16, 2021, at 7:45 PM, Xintong Song <to...@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>> And it's not only uploaded to PyPI, but to the ASF mirrors as well.
> > > > > >>>>
> > > > > >>>> https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/
> > > > > >>>>
> > > > > >>>> Thank you~
> > > > > >>>>
> > > > > >>>> Xintong Song
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> Actually, I think it's 9 packages, not 7.
> > > > > >>>>>
> > > > > >>>>> Check here for the 1.12.2 packages.
> > > > > >>>>> https://pypi.org/project/apache-flink/#files
> > > > > >>>>>
> > > > > >>>>> Thank you~
> > > > > >>>>>
> > > > > >>>>> Xintong Song
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <chesnay@apache.org> wrote:
> > > > > >>>>>
> > > > > >>>>>> Am I reading this correctly that we publish 7 different artifacts
> > > > > >>>>>> just for Python?
> > > > > >>>>>> What does the release matrix look like?
> > > > > >>>>>>
> > > > > >>>>>> On 3/16/2021 3:45 AM, Dian Fu wrote:
> > > > > >>>>>>> Hi Xingbo,
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks a lot for bringing up this discussion. Actually, the size
> > > > > >>>>>>> limit already became an issue when releasing 1.11.3 and 1.12.1.
> > > > > >>>>>>> It blocked us from publishing the PyFlink packages to PyPI during
> > > > > >>>>>>> the release, as there was not enough space left (PS: we already
> > > > > >>>>>>> published the packages after the size limit was increased).
> > > > > >>>>>>> Considering that the total package size is about 1.5GB (220MB * 7)
> > > > > >>>>>>> for each release, it makes sense to split the PyFlink package. It
> > > > > >>>>>>> could reduce the total package size to about 250MB (3MB * 7 +
> > > > > >>>>>>> 220MB) for each release. We wouldn't need to increase the size
> > > > > >>>>>>> limit again in the next few years, as we currently still have
> > > > > >>>>>>> about 7.5GB of space left.
> > > > > >>>>>>> So +1 from my side.
> > > > > >>>>>>>
> > > > > >>>>>>> Regards,
> > > > > >>>>>>> Dian
>
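Dian's size estimate, quoted above, can be reproduced with a short back-of-the-envelope script. This is only a sketch under stated assumptions. The 220MB and 3MB sizes are the approximations used in this thread. The single jar-only package is assumed to be about 220MB. Sizes are treated as decimal (1GB = 1000MB). The 9-package case reflects the 1.12 release with Python 3.8 support.

```python
# Back-of-the-envelope PyPI storage estimate for one PyFlink release.
# All sizes are the approximations used in this thread, not measurements.
BUNDLED_PACKAGE_MB = 220  # package that bundles the jar files
SLIM_PACKAGE_MB = 3       # package without the jar files
LIBRARIES_MB = 220        # assumption: one jar-only package per release


def release_size_mb(num_packages: int, split: bool) -> int:
    """Approximate total upload size of one release, in MB."""
    if split:
        # Small per-platform packages plus a single jar-only package.
        return num_packages * SLIM_PACKAGE_MB + LIBRARIES_MB
    # Every package carries its own copy of the jar files.
    return num_packages * BUNDLED_PACKAGE_MB


def releases_until_full(free_mb: int, num_packages: int, split: bool) -> int:
    """Number of whole releases that fit into the remaining project quota."""
    return free_mb // release_size_mb(num_packages, split)


if __name__ == "__main__":
    for n in (7, 9):  # 7 packages in 1.11, 9 in 1.12
        bundled = release_size_mb(n, split=False)
        slim = release_size_mb(n, split=True)
        print(f"{n} packages: bundled {bundled} MB, split {slim} MB")
    # prints:
    # 7 packages: bundled 1540 MB, split 241 MB
    # 9 packages: bundled 1980 MB, split 247 MB
```

With the roughly 7.5 GB of free space mentioned above, `releases_until_full(7500, 9, split=False)` gives 3 releases. The split layout gives 30.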

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Till Rohrmann <tr...@apache.org>.
I think that we should try to reduce the size of the packages by either
splitting them or by having another means to retrieve the Java binaries.

Cheers,
Till

On Fri, Mar 19, 2021 at 2:58 AM Xingbo Huang <hx...@gmail.com> wrote:

> Hi Till,
>
> TensorFlow's packages [1] are also very large (300MB+). However, TensorFlow
> does not try to solve the problem; instead, it has its PyPI space limit
> expanded whenever the project space is full. We could also choose this
> option. At our current release frequency, we would probably need to apply
> for a 15GB expansion every year. There are not many similar cases, so
> there is no standard solution to refer to. However, splitting a project
> into multiple packages is quite common. For example, Apache Airflow
> publishes a separate release package for each provider [2].
>
> So there are currently two solutions in my mind that could work:
>
> 1. Keep the current solution and expand the space limit in PyPI
> whenever the space is full.
>
> 2. Split into two packages to reduce the wheel package size.
>
> [1] https://pypi.org/project/tensorflow/#files
> [2] https://pypi.org/search/?q=apache-airflow-*&o=
>
> Best,
> Xingbo

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Xingbo Huang <hx...@gmail.com>.
Hi Till,

TensorFlow's packages [1] are also very large (300MB+). However, TensorFlow
does not try to solve the problem; instead, it has its PyPI space limit
expanded whenever the project space is full. We could also choose this
option. At our current release frequency, we would probably need to apply
for a 15GB expansion every year. There are not many similar cases, so
there is no standard solution to refer to. However, splitting a project
into multiple packages is quite common. For example, Apache Airflow
publishes a separate release package for each provider [2].

So there are currently two solutions in my mind that could work:

1. Keep the current solution and expand the space limit in PyPI
whenever the space is full.

2. Split into two packages to reduce the wheel package size.

[1] https://pypi.org/project/tensorflow/#files
[2] https://pypi.org/search/?q=apache-airflow-*&o=

Best,
Xingbo

On Wed, Mar 17, 2021 at 9:22 PM Till Rohrmann <tr...@apache.org> wrote:

> How do other projects solve this problem?
>
> Cheers,
> Till

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Till Rohrmann <tr...@apache.org>.
How do other projects solve this problem?

Cheers,
Till

On Wed, Mar 17, 2021 at 3:45 AM Xingbo Huang <hx...@gmail.com> wrote:

> Hi Chesnay,
>
> Yes, in most cases we could indeed download the required jars in `setup.py`;
> that was also the first solution I thought of for reducing the size of the
> wheel packages. However, I'm afraid it will not work when the external
> network is not accessible, which is very common in production clusters.
>
> Best,
> Xingbo

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Xingbo Huang <hx...@gmail.com>.
Hi Chesnay,

Yes, in most cases we could indeed download the required jars in `setup.py`;
that was also the first solution I thought of for reducing the size of the
wheel packages. However, I'm afraid it will not work when the external
network is not accessible, which is very common in production clusters.

Best,
Xingbo
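Compare this with downloading the jars in `setup.py`. Under the split proposal, the jars become an ordinary pip dependency of apache-flink. They can therefore be served from the same index or local wheel mirror as every other dependency. The following is a minimal sketch that assumes a hypothetical helper name and an exact version pin; neither is taken from PyFlink's actual `setup.py`, and the version string is only an example:

```python
# Hypothetical sketch: metadata for a slim apache-flink package that pulls in
# the jar-only apache-flink-libraries package as a normal dependency.
# The helper name and the exact-pin policy are illustrative assumptions.
from typing import Dict, List, Optional


def slim_package_metadata(version: str,
                          other_requires: Optional[List[str]] = None) -> Dict[str, object]:
    """Build keyword arguments that could be passed to setuptools.setup()."""
    requires = list(other_requires or [])
    # Pin the jar package to the exact same release so the Python code and
    # the Java jars always come from the same Flink version.
    requires.append(f"apache-flink-libraries=={version}")
    return {
        "name": "apache-flink",
        "version": version,
        "install_requires": requires,
    }


# Example (hypothetical version string):
# setuptools.setup(**slim_package_metadata("1.13.0"))
```

Because apache-flink-libraries would then be a regular package, an air-gapped cluster could install both packages from a directory of pre-downloaded wheels without network access at install time. pip supports this through its `--no-index --find-links <dir>` options.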

On Tue, Mar 16, 2021 at 8:32 PM Chesnay Schepler <ch...@apache.org> wrote:

> This proposed apache-flink-libraries package would just contain the
> binary, right? And it would effectively be unusable to the Python audience
> on its own.
>
> Essentially we are just abusing PyPI to ship a Java binary. Is there no
> way for us to download the jars when the Python package is being
> installed (e.g., in setup.py)?

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Chesnay Schepler <ch...@apache.org>.
This proposed apache-flink-libraries package would just contain the
binary, right? And it would effectively be unusable to the Python audience
on its own.

Essentially we are just abusing PyPI to ship a Java binary. Is there no
way for us to download the jars when the Python package is being
installed (e.g., in setup.py)?

On 3/16/2021 1:23 PM, Dian Fu wrote:
> Yes, the PyFlink .whl files will also be about 3MB each if we split the package. Currently the package is big because we bundle the jar files in it.
>
>> On Mar 16, 2021, at 8:13 PM, Chesnay Schepler <ch...@apache.org> wrote:
>>
>> Key difference being that the Beam .whl files are 3MB, i.e. 60x smaller.
>>
>> On 3/16/2021 1:06 PM, Dian Fu wrote:
>>> Hi Chesnay,
>>>
>>> We will publish binary packages separately for:
>>> 1) Python 3.5 / 3.6 / 3.7 / 3.8 (since 1.12) separately
>>> 2) Linux / Mac separately
>>>
>>> Besides, there is also a source package which is used when none of the above binary packages is usable, e.g. for Window users.
>>>
>>> PS: publishing multiple binary packages is very common in Python world, e.g. Beam published 22 packages in 2.28, Pandas published 16 packages in 1.2.3 [2]. We could also publishing more packages if we splitting the packages as the cost of adding another package will be very small.
>>>
>>> Regards,
>>> Dian
>>>
>>> [1] https://pypi.org/project/apache-beam/#files <https://pypi.org/project/apache-beam/#files> <https://pypi.org/project/apache-beam/#files <https://pypi.org/project/apache-beam/#files>>
>>> [2] https://pypi.org/project/pandas/#files
>>>
>>>
>>> Hi Xintong,
>>>
>>> Yes, you are right that there is 9 packages in 1.12 as we added Python 3.8 support in 1.12.
>>>
>>> Regards,
>>> Dian
>>>
>>>> 2021年3月16日 下午7:45,Xintong Song <to...@gmail.com> 写道:
>>>>
>>>> And it's not only uploaded to PyPI, but the ASF mirrors as well.
>>>>
>>>> https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>>
>>>> On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <to...@gmail.com> wrote:
>>>>
>>>>> Actually, I think it's 9 packages, not 7.
>>>>>
>>>>> Check here for the 1.12.2 packages.
>>>>> https://pypi.org/project/apache-flink/#files
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <ch...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Am I reading this correctly that we publish 7 different artifacts just
>>>>>> for python?
>>>>>> What does the release matrix look like?
>>>>>>
>>>>>> On 3/16/2021 3:45 AM, Dian Fu wrote:
>>>>>>> Hi Xingbo,
>>>>>>>
>>>>>>>
>>>>>>> Thanks a lot for bringing up this discussion. Actually the size limit
>>>>>> already becomes an issue during releasing 1.11.3 and 1.12.1. It blocks us
>>>>>> to publish PyFlink packages to PyPI during the release as there is no
>>>>>> enough space left (PS: already published the packages after increasing the
>>>>>> size limit).
>>>>>>> Considering that the total package size are about 1.5GB (220MB * 7) for
>>>>>> each release, it makes sense to split the PyFlink package. It could reduce
>>>>>> the total package size to about 250MB (3MB * 7 + 220 MB) for each release.
>>>>>> We don’t need to increase the size limit any more in the next few years as
>>>>>> currently we still have about 7.5 GB space left.
>>>>>>> So +1 from my side.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dian
>>>>>>>
>>>>>>>> 2021年3月12日 下午2:30,Xingbo Huang <hx...@gmail.com> 写道:
>>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Since release-1.11, pyflink has introduced cython support and we will
>>>>>>>> release 7 packages (for different platforms and Python versions) to
>>>>>> PyPI
>>>>>>>> for each release and the size of each package is more than 200MB as we
>>>>>> need
>>>>>>>> to bundle the jar files into the package. The entire project space in
>>>>>> PyPI
>>>>>>>> grows very fast, and we need to apply to PyPI for more project space
>>>>>>>> frequently. Please refer to [
>>>>>> https://github.com/pypa/pypi-support/issues/831]
>>>>>>>> for more details.
>>>>>>>>
>>>>>>>> The root cause to this problem is that we bundled the jar files in each
>>>>>>>> package. This is actually unnecessary if we could extract the jar files
>>>>>>>> into a separate package which is dedicated to hold the jar files.
>>>>>>>>
>>>>>>>> I’d like to propose to split the pyflink package into two packages: the
>>>>>>>> original apache-flink  and apache-flink-libraries (Any suggestions for
>>>>>> the
>>>>>>>> name?). The package apache-flink-libraries only contains jar files and
>>>>>>>> there is only one apache-flink-libraries package for each release. The
>>>>>>>> package apache-flink depends on apache-flink-libraries and for users,
>>>>>> they
>>>>>>>> still only need to install apache-flink and there is nothing different
>>>>>> from
>>>>>>>> before. We still need to release multiple wheel packages of
>>>>>> apache-flink.
>>>>>>>> However, the size will be very small as it doesn't contain the jar
>>>>>> files
>>>>>>>> any more.
>>>>>>>>
>>>>>>>> Looking forward to your feedback.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Xingbo
>


Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Dian Fu <di...@gmail.com>.
Yes, the PyFlink .whl files will also be about 3 MB each if we split the package. The packages are currently large because the jar files are bundled in them.

> On Mar 16, 2021, at 8:13 PM, Chesnay Schepler <ch...@apache.org> wrote:
> 
> The key difference is that the Beam .whl files are 3 MB, i.e. roughly 60x smaller.


Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Chesnay Schepler <ch...@apache.org>.
The key difference is that the Beam .whl files are 3 MB, i.e. roughly
60x smaller.

On 3/16/2021 1:06 PM, Dian Fu wrote:
> Hi Chesnay,
>
> We publish separate binary packages for each combination of:
> 1) Python 3.5 / 3.6 / 3.7 / 3.8 (3.8 since 1.12)
> 2) Linux / Mac
>
> Besides these, there is also a source package, which is used when none of the binary packages above is usable, e.g. for Windows users.
>
> PS: Publishing multiple binary packages is very common in the Python world, e.g. Beam published 22 packages for 2.28 [1] and Pandas published 16 packages for 1.2.3 [2]. If we split the package, we could also publish more packages, as the cost of adding another one would be very small.
>
> Regards,
> Dian
>
> [1] https://pypi.org/project/apache-beam/#files
> [2] https://pypi.org/project/pandas/#files
>
>
> Hi Xintong,
>
> Yes, you are right: there are 9 packages in 1.12, as we added Python 3.8 support in 1.12.
>
> Regards,
> Dian


Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Dian Fu <di...@gmail.com>.
Hi Chesnay,

We publish separate binary packages for each combination of:
1) Python 3.5 / 3.6 / 3.7 / 3.8 (3.8 since 1.12)
2) Linux / Mac

Besides these, there is also a source package, which is used when none of the binary packages above is usable, e.g. for Windows users.

PS: Publishing multiple binary packages is very common in the Python world, e.g. Beam published 22 packages for 2.28 [1] and Pandas published 16 packages for 1.2.3 [2]. If we split the package, we could also publish more packages, as the cost of adding another one would be very small.

Regards,
Dian

[1] https://pypi.org/project/apache-beam/#files
[2] https://pypi.org/project/pandas/#files


Hi Xintong,

Yes, you are right: there are 9 packages in 1.12, as we added Python 3.8 support in 1.12.
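The package counts in this exchange (7 for 1.11, 9 for 1.12) follow directly from the release matrix: one wheel per (Python version, platform) pair plus one source package. A small sketch of that arithmetic; the artifact labels are placeholders rather than real wheel file names, and 1.11 is assumed to cover the three versions other than 3.8.

```python
from itertools import product

# Axes of the PyFlink release matrix as described in this thread.
PYTHON_VERSIONS = ["3.5", "3.6", "3.7", "3.8"]  # 3.8 added in 1.12
PLATFORMS = ["linux", "mac"]


def release_artifacts(python_versions, platforms):
    """One binary wheel per (Python version, platform) pair, plus one source package.

    The names returned are placeholder labels, not real wheel file names.
    """
    wheels = [f"wheel-py{py}-{plat}" for py, plat in product(python_versions, platforms)]
    return wheels + ["sdist"]


if __name__ == "__main__":
    # 1.11 (assumed to cover 3.5-3.7): 3 * 2 + 1 = 7 artifacts
    print(len(release_artifacts(PYTHON_VERSIONS[:3], PLATFORMS)))
    # 1.12 (3.5-3.8): 4 * 2 + 1 = 9 artifacts
    print(len(release_artifacts(PYTHON_VERSIONS, PLATFORMS)))
```

Each new Python version therefore adds one wheel per platform, which is why the per-wheel size matters so much for the total.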

Regards,
Dian

> On Mar 16, 2021, at 7:45 PM, Xintong Song <to...@gmail.com> wrote:
> 
> And they are uploaded not only to PyPI but also to the ASF mirrors.
> 
> https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/
> 
> Thank you~
> 
> Xintong Song


Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Xintong Song <to...@gmail.com>.
And they are uploaded not only to PyPI but also to the ASF mirrors.

https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/

Thank you~

Xintong Song



On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <to...@gmail.com> wrote:

> Actually, I think it's 9 packages, not 7.
>
> Check here for the 1.12.2 packages.
> https://pypi.org/project/apache-flink/#files
>
> Thank you~
>
> Xintong Song

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Xintong Song <to...@gmail.com>.
Actually, I think it's 9 packages, not 7.

Check here for the 1.12.2 packages.
https://pypi.org/project/apache-flink/#files

Thank you~

Xintong Song



On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <ch...@apache.org> wrote:

> Am I reading this correctly that we publish 7 different artifacts just
> for python?
> What does the release matrix look like?

Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Chesnay Schepler <ch...@apache.org>.
Am I reading this correctly that we publish 7 different artifacts just 
for python?
What does the release matrix look like?

On 3/16/2021 3:45 AM, Dian Fu wrote:
> Hi Xingbo,
>
>
> Thanks a lot for bringing up this discussion. The size limit already became an issue during the 1.11.3 and 1.12.1 releases: it blocked us from publishing the PyFlink packages to PyPI because there was not enough space left (PS: the packages have since been published after the size limit was increased).
>
>
> Considering that the total package size is about 1.5 GB (220 MB * 7) per release, it makes sense to split the PyFlink package. This could reduce the total package size to about 250 MB (3 MB * 7 + 220 MB) per release. We won't need to increase the size limit again in the next few years, as we currently still have about 7.5 GB of space left.
>
> So +1 from my side.
>
> Regards,
> Dian



Re: [DISCUSS] Split PyFlink packages into two packages: apache-flink and apache-flink-libraries

Posted by Dian Fu <di...@gmail.com>.
Hi Xingbo,


Thanks a lot for bringing up this discussion. The size limit already became an issue during the 1.11.3 and 1.12.1 releases: it blocked us from publishing the PyFlink packages to PyPI because there was not enough space left (PS: the packages have since been published after the size limit was increased).


Considering that the total package size is about 1.5 GB (220 MB * 7) per release, it makes sense to split the PyFlink package. This could reduce the total package size to about 250 MB (3 MB * 7 + 220 MB) per release. We won't need to increase the size limit again in the next few years, as we currently still have about 7.5 GB of space left.
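The estimate above can be checked with a few lines of arithmetic. This back-of-the-envelope sketch uses only the thread's figures (220 MB bundled package, 3 MB slim package, 7.5 GB remaining), assumes the single jar-only package stays about as large as today's bundle, and takes 1 GB = 1000 MB; it also covers the 9-package count for 1.12 that comes up later in the thread.

```python
# Back-of-the-envelope PyPI usage per release, using the figures quoted in this thread.
BUNDLED_PACKAGE_MB = 220  # one package with the jars bundled in
SLIM_PACKAGE_MB = 3       # one package without the jars
LIBRARIES_MB = 220        # the single jar-only package (assumed about as large as today's bundle)


def per_release_mb(num_packages: int, split: bool) -> int:
    """Total size of one release's Python packages, in MB."""
    if split:
        return num_packages * SLIM_PACKAGE_MB + LIBRARIES_MB
    return num_packages * BUNDLED_PACKAGE_MB


def releases_that_fit(budget_mb: int, release_mb: int) -> int:
    """How many whole releases fit into the remaining project space."""
    return budget_mb // release_mb


if __name__ == "__main__":
    remaining_mb = 7500  # "about 7.5 GB space left", taking 1 GB = 1000 MB
    for n in (7, 9):
        before, after = per_release_mb(n, split=False), per_release_mb(n, split=True)
        # columns: packages, MB bundled, MB split, releases that fit (bundled, split)
        print(n, before, after, releases_that_fit(remaining_mb, before),
              releases_that_fit(remaining_mb, after))
    # prints "7 1540 241 4 31", then "9 1980 247 3 30"
```

Under these assumptions the remaining space holds only 3 to 4 more releases with bundled jars, but roughly 30 with the split, which is what makes the "next few years" claim plausible.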

So +1 from my side.

Regards,
Dian

> On Mar 12, 2021, at 2:30 PM, Xingbo Huang <hx...@gmail.com> wrote:
> 
> Hi everyone,
> 
> Since release-1.11, PyFlink has introduced Cython support, and we release
> 7 packages (for different platforms and Python versions) to PyPI for each
> release. Each package is more than 200 MB, as we need to bundle the jar
> files into it. The project's total space on PyPI grows very fast, and we
> frequently need to apply to PyPI for more project space. Please refer to
> [https://github.com/pypa/pypi-support/issues/831] for more details.
> 
> The root cause of this problem is that we bundle the jar files in each
> package. This is actually unnecessary if we extract the jar files into a
> separate package dedicated to holding them.
> 
> I'd like to propose splitting the PyFlink package into two packages: the
> original apache-flink and apache-flink-libraries (any suggestions for the
> name?). The apache-flink-libraries package only contains jar files, and
> there is only one apache-flink-libraries package per release. The
> apache-flink package depends on apache-flink-libraries, so users still only
> need to install apache-flink and nothing changes for them. We still need to
> release multiple wheel packages of apache-flink; however, they will be very
> small as they no longer contain the jar files.
> 
> Looking forward to your feedback.
> 
> Best,
> 
> Xingbo
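A minimal sketch of the dependency structure in the proposal, assuming apache-flink pins apache-flink-libraries to exactly its own version so that the two packages always move together. The exact-pin style and both helper names are illustrative assumptions, not the actual PyFlink build configuration; in a real setup.py the requirement string would go into `install_requires`. The runtime check uses `importlib.metadata`, which requires Python 3.8 or later.

```python
from importlib import metadata


def libraries_requirement(flink_version: str) -> str:
    """Requirement string apache-flink could declare in install_requires (assumed exact pin)."""
    return f"apache-flink-libraries=={flink_version}"


def installed_versions_match(flink_dist: str = "apache-flink",
                             libs_dist: str = "apache-flink-libraries") -> bool:
    """Return True only if both distributions are installed with the same version."""
    try:
        return metadata.version(flink_dist) == metadata.version(libs_dist)
    except metadata.PackageNotFoundError:
        # Either package missing counts as a mismatch.
        return False


if __name__ == "__main__":
    print(libraries_requirement("1.12.2"))
    print(installed_versions_match())
```

With an exact pin, `pip install apache-flink` pulls in the matching jar-only package automatically, which is what keeps the change invisible to users.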