Posted to dev@flink.apache.org by jincheng sun <su...@gmail.com> on 2019/08/01 07:27:33 UTC

Re: [DISCUSS] Publish the PyFlink into PyPI

Thanks for your confirmation, Till!
Publishing PyFlink to PyPI is very important for our users, so I
have initiated a voting thread.

Best,
Jincheng
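
PS: For anyone trying the package, the FLINK_HOME behavior discussed further
down in this thread (an explicitly set FLINK_HOME overrides the jars bundled
in the pip-installed package) can be sketched roughly like this. This is only
an illustrative sketch; the function and directory names are hypothetical,
not the actual pyflink code:

```python
import os
from pathlib import Path


def resolve_flink_home(bundled_dist: str) -> Path:
    """Illustrative lookup only (not the real pyflink implementation):
    prefer an explicitly set FLINK_HOME, otherwise fall back to the Flink
    distribution bundled inside the pip-installed package."""
    env_home = os.environ.get("FLINK_HOME")
    if env_home:
        # The user pointed us at their own Flink distribution.
        return Path(env_home)
    # No FLINK_HOME set: use the distribution shipped inside the package.
    return Path(bundled_dist)
```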

Till Rohrmann <tr...@apache.org> wrote on Mon, Jul 29, 2019 at 3:01 PM:

> Sounds good to me. Thanks for driving this discussion.
>
> Cheers,
> Till
>
> On Mon, Jul 29, 2019 at 9:24 AM jincheng sun <su...@gmail.com>
> wrote:
>
> > Yes Till, I think you are correct that we should make sure that the
> > published Flink Python API cannot be arbitrarily deleted.
> >
> > So, it seems that our current consensus is:
> >
> > 1. Should we publish PyFlink to PyPI --> YES
> > 2. PyPI Project Name ---> apache-flink
> > 3. How to handle Scala_2.11 and Scala_2.12 ---> We only release one
> > binary, built with the default Scala version, the same as Flink's
> > default config.
> > 4. PyPI account for release --> Create an account such as 'pyflink' as
> > owner (only the PMC can manage it) and add the release managers'
> > accounts as maintainers of the project. Release managers publish the
> > package to PyPI using their own accounts but cannot delete releases.
> >
> > So, if there are no other comments, I think we should initiate a voting
> > thread.
> >
> > What do you think?
> >
> > Best, Jincheng
> >
> >
> > Till Rohrmann <tr...@apache.org> wrote on Wed, Jul 24, 2019 at 1:17 PM:
> >
> > > Sorry for chiming in so late. I would be in favor of option #2.
> > >
> > > I guess that the PMC would need to give the credentials to the release
> > > manager for option #1. Hence, the PMC could also add the release
> > > manager as a maintainer, which makes sure that only the PMC can delete
> > > artifacts.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jul 24, 2019 at 12:33 PM jincheng sun <sunjincheng121@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for all of your replies!
> > > >
> > > > Hi Stephan, thanks for the reply and for pointing out the details we
> > > > need to pay attention to, such as the README and trademark
> > > > compliance. Regarding the PyPI account for release, #1 carries the
> > > > risk that our release packages can be deleted by anyone who knows
> > > > the password of the account, and in this case the PMC would have no
> > > > means to correct problems. So, I think #2 is pretty safe for the
> > > > Flink community.
> > > >
> > > > Hi Jeff & Dian, thanks for sharing your thoughts. The Python API is
> > > > just a language entry point. I think the choice of which binaries to
> > > > include in the release should be consistent with the Java release
> > > > policy. So, currently we do not add the Hadoop or connector JARs to
> > > > the release package.
> > > >
> > > > Hi Chesnay, I agree that we should ship the usual binary in the
> > > > future, if the Java side has already made that decision.
> > > >
> > > > So, our current consensus is:
> > > > 1. Should we publish PyFlink to PyPI --> YES
> > > > 2. PyPI Project Name ---> apache-flink
> > > > 3. How to handle Scala_2.11 and Scala_2.12 ---> We only release one
> > > > binary, built with the default Scala version, the same as Flink's
> > > > default config.
> > > >
> > > > We still need to discuss how to manage the PyPI account for release:
> > > > --------
> > > > 1) Create an account such as 'pyflink' as the owner, share it with
> > > > all the release managers, and then release managers can publish the
> > > > package to PyPI using this account.
> > > > 2) Create an account such as 'pyflink' as owner (only the PMC can
> > > > manage it) and add the release managers' accounts as maintainers of
> > > > the project. Release managers publish the package to PyPI using
> > > > their own accounts.
> > > > --------
> > > > Stephan likes #1 but wants the PMC to be able to correct problems
> > > > (which sounds like #2). Can you confirm that, @Stephan?
> > > > Chesnay and I prefer #2.
> > > >
> > > > Best, Jincheng
> > > >
> > > > Chesnay Schepler <ch...@apache.org> wrote on Wed, Jul 24, 2019 at 3:57 PM:
> > > >
> > > > > If we ship a binary, we should ship the binary we usually ship, not
> > > > > some highly customized version.
> > > > >
> > > > > On 24/07/2019 05:19, Dian Fu wrote:
> > > > > > Hi Stephan & Jeff,
> > > > > >
> > > > > > Thanks a lot for sharing your thoughts!
> > > > > >
> > > > > > Regarding the bundled jars, currently only the jars in the Flink
> > > > > > binary distribution are packaged in the pyflink package. It may
> > > > > > be a good idea to also bundle other jars such as
> > > > > > flink-hadoop-compatibility. We may also need to consider whether
> > > > > > to bundle the format jars such as flink-avro, flink-json and
> > > > > > flink-csv, and the connector jars such as flink-connector-kafka.
> > > > > >
> > > > > > If FLINK_HOME is set, the binary distribution specified by
> > > > > > FLINK_HOME will be used instead.
> > > > > >
> > > > > > Regards,
> > > > > > Dian
> > > > > >
> > > > > >> On Jul 24, 2019, at 9:47 AM, Jeff Zhang <zj...@gmail.com> wrote:
> > > > > >>
> > > > > >> +1 for publishing pyflink to pypi.
> > > > > >>
> > > > > >> Regarding the included jars, I just want to make sure which
> > > > > >> Flink binary distribution we would ship with pyflink, since we
> > > > > >> have multiple Flink binary distributions (with/without Hadoop).
> > > > > >> Personally, I prefer to use the Hadoop-included binary
> > > > > >> distribution.
> > > > > >>
> > > > > >> And I just want to confirm whether it is possible for users to
> > > > > >> use a different Flink binary distribution as long as they set
> > > > > >> the env variable FLINK_HOME.
> > > > > >>
> > > > > >> Besides that, I hope that there will be bi-directional link
> > > > > >> references between the Flink docs and the PyPI docs.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:07 AM:
> > > > > >>
> > > > > >>> Hi!
> > > > > >>>
> > > > > >>> Sorry for the late involvement. Here are some thoughts from my
> > > side:
> > > > > >>>
> > > > > >>> Definitely +1 to publishing to PyPI, even if it is a binary
> > > > > >>> release. Community growth into other communities is great, and
> > > > > >>> if this is the natural way to reach developers in the Python
> > > > > >>> community, let's do it. This is not about our convenience, but
> > > > > >>> about reaching users.
> > > > > >>>
> > > > > >>> I think the way to look at this is that this is a convenience
> > > > > >>> distribution channel, courtesy of the Flink community. It is
> > > > > >>> not an Apache release, and we make this clear in the README.
> > > > > >>> Of course, this doesn't mean we don't try to uphold similar
> > > > > >>> standards as for our official releases (like proper license
> > > > > >>> information).
> > > > > >>>
> > > > > >>> Concerning credential sharing, I would be fine with either
> > > > > >>> option. The PMC doesn't own it (it is an initiative by some
> > > > > >>> community members), but the PMC needs to ensure trademark
> > > > > >>> compliance, so slight preference for option #1 (the PMC would
> > > > > >>> have means to correct problems).
> > > > > >>>
> > > > > >>> I believe there is no need to differentiate between Scala
> > > > > >>> versions, because this is merely a convenience thing for pure
> > > > > >>> Python users. Users that mix Python and Scala (and thus depend
> > > > > >>> on specific Scala versions) can still download from Apache or
> > > > > >>> build it themselves.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Stephan
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <sunjincheng121@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Hi All,
> > > > > >>>>
> > > > > >>>> Thanks for the feedback @Chesnay Schepler <chesnay@apache.org>
> > > > > >>>> @Dian!
> > > > > >>>>
> > > > > >>>> Using `apache-flink` for the project name also makes sense to
> > > > > >>>> me, since we should always keep in mind that Flink is owned
> > > > > >>>> by Apache. (Beam also uses this pattern, `apache-beam`, for
> > > > > >>>> its Python API.)
> > > > > >>>>
> > > > > >>>> Regarding releasing the Python API with the Java JARs, I
> > > > > >>>> think the guiding principle is the convenience of the user.
> > > > > >>>> So, thanks for the explanation @Dian!
> > > > > >>>>
> > > > > >>>> And you're right @Chesnay Schepler <ch...@apache.org>, we
> > > > > >>>> can't make a hasty decision and we need more people's
> > > > > >>>> opinions!
> > > > > >>>>
> > > > > >>>> So, I would appreciate it if anyone could give us feedback
> > > > > >>>> and suggestions!
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Jincheng
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Chesnay Schepler <ch...@apache.org> wrote on Wed, Jul 3, 2019 at 8:46 PM:
> > > > > >>>>
> > > > > >>>>> So this would not be a source release then, but a full-blown
> > > > > >>>>> binary release.
> > > > > >>>>>
> > > > > >>>>> Maybe it is just me, but I find it a bit suspect to ship an
> > > > > >>>>> entire Java application via PyPI, just because there's a
> > > > > >>>>> Python API for it.
> > > > > >>>>>
> > > > > >>>>> We definitely need input from more people here.
> > > > > >>>>>
> > > > > >>>>> On 03/07/2019 14:09, Dian Fu wrote:
> > > > > >>>>>> Hi Chesnay,
> > > > > >>>>>>
> > > > > >>>>>> Thanks a lot for the suggestions.
> > > > > >>>>>>
> > > > > >>>>>> Regarding “distributing java/scala code to PyPI”:
> > > > > >>>>>> The Python Table API is just a wrapper of the Java Table
> > > > > >>>>>> API, and without the Java/Scala code, two steps are needed
> > > > > >>>>>> to set up an environment to execute a Python Table API
> > > > > >>>>>> program:
> > > > > >>>>>> 1) Install pyflink using "pip install apache-flink"
> > > > > >>>>>> 2) Download the Flink distribution and set FLINK_HOME to it.
> > > > > >>>>>> Besides, users have to make sure that the manually installed
> > > > > >>>>>> Flink is compatible with the pip-installed pyflink.
> > > > > >>>>>> Bundling the Java/Scala code inside the Python package will
> > > > > >>>>>> eliminate step 2) and make it simpler for users to install
> > > > > >>>>>> pyflink. There was a short discussion
> > > > > >>>>>> <https://issues.apache.org/jira/browse/SPARK-1267> on this
> > > > > >>>>>> in the Spark community and they finally decided to package
> > > > > >>>>>> the Java/Scala code in the Python package. (BTW, PySpark
> > > > > >>>>>> only bundles the jars of Scala 2.11.)
> > > > > >>>>>> Regards,
> > > > > >>>>>> Dian
> > > > > >>>>>>
> > > > > >>>>>> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ch...@apache.org> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> The existing artifact in the pyflink project was neither
> > > > > >>>>>>> released by the Flink project / anyone affiliated with it
> > > > > >>>>>>> nor approved by the Flink PMC.
> > > > > >>>>>>> As such, if we were to use this account I believe we
> > > > > >>>>>>> should delete it so as not to mislead users into thinking
> > > > > >>>>>>> this is in any way an Apache-provided distribution. Since
> > > > > >>>>>>> this goes against the user's wishes, I would be in favor
> > > > > >>>>>>> of creating a separate account and giving back control
> > > > > >>>>>>> over the pyflink account.
> > > > > >>>>>>> My take on the raised points:
> > > > > >>>>>>> 1.1) "apache-flink"
> > > > > >>>>>>> 1.2) option 2
> > > > > >>>>>>> 2) Given that we only distribute Python code there should
> > > > > >>>>>>> be no reason to differentiate between Scala versions. We
> > > > > >>>>>>> should not be distributing any Java/Scala code and/or
> > > > > >>>>>>> modules to PyPI. Currently, I'm a bit confused about this
> > > > > >>>>>>> question and wonder what exactly we are trying to publish
> > > > > >>>>>>> here.
> > > > > >>>>>>> 3) This should be treated as any other source release;
> > > > > >>>>>>> i.e., it needs a LICENSE and NOTICE file, signatures and a
> > > > > >>>>>>> PMC vote. My suggestion would be to make this part of our
> > > > > >>>>>>> normal release process. There will be _one_ source release
> > > > > >>>>>>> on dist.apache.org encompassing everything, and a separate
> > > > > >>>>>>> Python-focused source release that we push to PyPI. The
> > > > > >>>>>>> LICENSE and NOTICE contained in the Python source release
> > > > > >>>>>>> must also be present in the source release of Flink; so
> > > > > >>>>>>> basically the Python source release is just the contents
> > > > > >>>>>>> of the flink-python module plus the Maven pom.xml, with no
> > > > > >>>>>>> other special sauce added during the release process.
> > > > > >>>>>>> On 02/07/2019 05:42, jincheng sun wrote:
> > > > > >>>>>>>> Hi all,
> > > > > >>>>>>>>
> > > > > >>>>>>>> With the effort of FLIP-38 [1], the Python Table API
> > > > > >>>>>>>> (without UDF support for now) will be supported in the
> > > > > >>>>>>>> coming release 1.9.
> > > > > >>>>>>>> As described in "Build PyFlink" [2], if users want to use
> > > > > >>>>>>>> the Python Table API, they can manually install it using
> > > > > >>>>>>>> the command:
> > > > > >>>>>>>> "cd flink-python && python3 setup.py sdist && pip install
> > > > > >>>>>>>> dist/*.tar.gz".
> > > > > >>>>>>>> This is non-trivial for users, and it would be better if
> > > > > >>>>>>>> we could follow the Python way and publish PyFlink to
> > > > > >>>>>>>> PyPI, a repository of software for the Python programming
> > > > > >>>>>>>> language. Then users could use the standard Python
> > > > > >>>>>>>> package manager "pip" to install PyFlink: "pip install
> > > > > >>>>>>>> pyflink". So, there are some topics that need to be
> > > > > >>>>>>>> discussed:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1. How to publish PyFlink to PyPI
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1.1 Project Name
> > > > > >>>>>>>>       We need to decide which PyPI project name to use,
> > > > > >>>>>>>> for example apache-flink, pyflink, etc.
> > > > > >>>>>>>>
> > > > > >>>>>>>>      Regarding the name "pyflink", it has already been
> > > > > >>>>>>>> registered by @ueqt, and there is already a package '1.0'
> > > > > >>>>>>>> released under this project which is based on
> > > > > >>>>>>>> flink-libraries/flink-python.
> > > > > >>>>>>>>
> > > > > >>>>>>>>     @ueqt has kindly agreed to give this project back to
> > > > > >>>>>>>> the community. And he has requested that the released
> > > > > >>>>>>>> package '1.0' not be removed, as it is already used in
> > > > > >>>>>>>> their company.
> > > > > >>>>>>>>
> > > > > >>>>>>>>      So we need to decide whether to use the name
> > > > > >>>>>>>> 'pyflink'. If yes, we need to figure out how to deal with
> > > > > >>>>>>>> the package '1.0' under this project.
> > > > > >>>>>>>>      From my point of view, "pyflink" is the better name
> > > > > >>>>>>>> for our project, and we can keep the 1.0 release; maybe
> > > > > >>>>>>>> more people want to use it.
> > > > > >>>>>>>> 1.2 PyPI account for release
> > > > > >>>>>>>>      We also need to decide which account to use to
> > > > > >>>>>>>> publish packages to PyPI.
> > > > > >>>>>>>>      There are two permission levels in PyPI: owner and
> > > > > >>>>>>>> maintainer:
> > > > > >>>>>>>>
> > > > > >>>>>>>>      1) The owner can upload releases, and delete files,
> > > > > >>>>>>>> releases or the entire project.
> > > > > >>>>>>>>      2) The maintainer can also upload releases. However,
> > > > > >>>>>>>> they cannot delete files, releases, or the project.
> > > > > >>>>>>>>
> > > > > >>>>>>>>      So there are two options in my mind:
> > > > > >>>>>>>>
> > > > > >>>>>>>>      1) Create an account such as 'pyflink' as the owner,
> > > > > >>>>>>>> share it with all the release managers, and then release
> > > > > >>>>>>>> managers can publish the package to PyPI using this
> > > > > >>>>>>>> account.
> > > > > >>>>>>>>      2) Create an account such as 'pyflink' as owner
> > > > > >>>>>>>> (only the PMC can manage it) and add the release
> > > > > >>>>>>>> managers' accounts as maintainers of the project.
> > > > > >>>>>>>> Release managers publish the package to PyPI using their
> > > > > >>>>>>>> own accounts.
> > > > > >>>>>>>>      As far as I know, PySpark takes option 1) and Apache
> > > > > >>>>>>>> Beam takes option 2).
> > > > > >>>>>>>>      From my point of view, I prefer option 2), as it is
> > > > > >>>>>>>> safer: it eliminates the risk of accidentally deleting
> > > > > >>>>>>>> old releases and at the same time keeps a trace of who is
> > > > > >>>>>>>> operating.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2. How to handle Scala_2.11 and Scala_2.12
> > > > > >>>>>>>>
> > > > > >>>>>>>> The PyFlink package bundles the jars in the package. As
> > > > > >>>>>>>> we know, there are two versions of the jars for each
> > > > > >>>>>>>> module: one for Scala 2.11 and the other for Scala 2.12.
> > > > > >>>>>>>> So theoretically there will be two PyFlink packages. We
> > > > > >>>>>>>> need to decide which one to publish to PyPI, or both. If
> > > > > >>>>>>>> both packages are published to PyPI, we may need two
> > > > > >>>>>>>> projects, such as pyflink_211 and pyflink_212, and maybe
> > > > > >>>>>>>> more in the future, such as pyflink_213.
> > > > > >>>>>>>>      (BTW, I think we should bring up a discussion about
> > > > > >>>>>>>> dropping Scala_2.11 in the Flink 1.10 release, as 2.13
> > > > > >>>>>>>> has been available since early June.)
> > > > > >>>>>>>>
> > > > > >>>>>>>>      From my point of view, for now we can release only
> > > > > >>>>>>>> the Scala_2.11 version, as Scala_2.11 is our default
> > > > > >>>>>>>> version in Flink.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3. Legal problems of publishing to PyPI
> > > > > >>>>>>>>
> > > > > >>>>>>>> As @Chesnay Schepler <ch...@apache.org> pointed out in
> > > > > >>>>>>>> FLINK-13011 [3], publishing PyFlink to PyPI means that we
> > > > > >>>>>>>> will publish binaries to a distribution channel not owned
> > > > > >>>>>>>> by Apache. We need to figure out if there are legal
> > > > > >>>>>>>> problems. From my point of view, there are none, as a few
> > > > > >>>>>>>> Apache projects such as Spark and Beam have already done
> > > > > >>>>>>>> it. Frankly speaking, I am not familiar with this problem
> > > > > >>>>>>>> and welcome any feedback from somebody more familiar
> > > > > >>>>>>>> with it.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Great thanks to @ueqt for being willing to dedicate the
> > > > > >>>>>>>> PyPI project name `pyflink` to the Apache Flink
> > > > > >>>>>>>> community!!!
> > > > > >>>>>>>> Great thanks to @Dian for the offline effort!!!
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Jincheng
> > > > > >>>>>>>>
> > > > > >>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
> > > > > >>>>>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
> > > > > >>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-13011
> > > > > >>>>>>>>
> > > > > >>>>>
> > > > > >>
> > > > > >> --
> > > > > >> Best Regards
> > > > > >>
> > > > > >> Jeff Zhang