Posted to dev@beam.apache.org by Sergio Fernández <wi...@apache.org> on 2016/08/12 16:49:12 UTC

Integrate Python SDK in the Maven build

Hi everybody,

A few weeks ago we started the discussion about integrating the Python SDK
into the main Maven build [1]. With support from Ahmet Altay, we now have a
PR [2] that I think is ready to be merged into the python-sdk branch.

What's the feeling of the project? Do you think it is valuable to have an
integrated build?

That leads us to the next aspect: versioning. Initially I had the idea of
using the same version as the Maven artifact. After talking with Silviu
Calinoiu, we set that aside for the moment. But I think at some point we
should also discuss whether the project wants to align the versions [3].

Thanks for the feedback.

Cheers,

[1] https://issues.apache.org/jira/browse/BEAM-378
[2] https://github.com/apache/incubator-beam/pull/537
[3] https://issues.apache.org/jira/browse/BEAM-547

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Re: Integrate Python SDK in the Maven build

Posted by Sergio Fernández <wi...@apache.org>.
Thanks, Ahmet. I think PR #855 plus your explanation gives a pretty good
overview of the change that needs to be discussed within the podling.

I'll be offline for three weeks, but I'd like to catch up on this and
other topics when I'm back from vacation.

Cheers,

On Fri, Aug 19, 2016 at 8:49 PM, Ahmet Altay <al...@google.com.invalid>
wrote:

> Thank you Sergio. I agree, let's first focus on where do we want to keep
> the
> version information.
>
> I would like to summarize the available options to help with the
> discussion.
> We have two options to store the version information:
>
> 1. In version.py file
>
> Version is kept in a global variable (__version__) in the version.py file.
> setup.py directly refers to this global variable. Note that setup.py needs
> to
> access this information before package is installed so it cannot easily
> rely on
> imports, method calls etc. Other parts of the code also do exactly the same
> thing to access this information through the global variable.
>
> Pros:
> - Unified access pattern between setup.py and other files.
> - Simple Pythonic way to manage information.
>
> Cons:
> - Another version field that needs to be maintained together with the
> version
> in the pom.xml
>
> Changes Required:
> - Minimal, since this is the current method in use. We will need to remove
> some
> unused methods from the version.py. Those were added in anticipation of a
> change.
>
> 2. In pom.xml file
>
> Version is kept inside the pom.xml file. This is probably most familiar to
> the
> people used to Maven. version.py file has methods to parse pom.xml file to
> extract version information. Then it does some string modifications on the
> version information to make it PEP440 compliant [1]. setup.py cannot call
> these method directly. The file has a main method (a self executable) that
> will initiate the version parsing. setup.py executes the file first to get
> the
> version. Other methods inside the package directly calls the get_version()
> method.
>
> Pros:
> - Single version maintained in the pom.xml file.
>
> Cons:
> - Not PEP440 compliant version out of the box.
> - Different access pattern for setup.py and other files.
> - Requires packaging of a single pom.xml file (without any of its parent
> pom.xml files) in the distributed Python package.
> - Requires fragile file parsing to access version.
>
> Change required:
> - Small change required, I prepared this PR to illustrate what is needed:
> https://github.com/apache/incubator-beam/pull/855
>
> I prefer the first option as it is the more Pythonic way of addressing this
> problem. What do you think?
>
> Thank you,
> Ahmet
>
> [1] https://www.python.org/dev/peps/pep-0440/
>
>
> On Tue, Aug 16, 2016 at 6:04 AM, Sergio Fernández <wi...@apache.org>
> wrote:
>
> > Hi Ahmet,
> >
> > On Fri, Aug 12, 2016 at 8:55 PM, Ahmet Altay <al...@google.com.invalid>
> > wrote:
> >
> > > Hi Sergio,
> > >
> > > Thank you for the integration PR. This will be very useful. As an
> > immediate
> > > benefit Python SDK will have precommit coverage through Jenkins. There
> > was
> > > already existing coverage with Travis, nevertheless this will give us
> > > additional testing. It is also an important step towards maturing the
> > > python-
> > > sdk branch to be merged into the master branch. My opinion is to get
> this
> > > bit
> > > in and benefit from it now.
> > >
> > > Related to versioning bit. I believe Silviu wanted to get the Maven
> > > integration first. It is now a good time to have this discussion.
> >
> >
> > Sure, and I agreed on that, step by step is always better. I just wanted
> to
> > bring the discussion to dev@, where I expect people has a broader
> opinion
> > where to go in the mid-long term.
> >
> >
> > > There
> > > are two components to this discussion:
> > >
> > > (1) Reading the version information from the pom.xml. This should
> happen
> > > regardless of what versioning strategy we choose. We can continue this
> > > discussion on how to implement that in BEAM-547.
> > >
> > > (2) What version to use for Python SDK? This is a better forum to
> answer
> > > this question.
> > >
> > > As a context, there was an earlier community discussion on this [1][2]:
> > >
> > > > Finally, we propose the standard approach where the entire source
> code
> > > lives
> > > > in each branch and is released concurrently. We’d like to avoid the
> > case
> > > > where individual modules are released on different cadences and are
> > being
> > > > maintained in different branches of the main repository. This is
> > > beneficial
> > > > because we don’t need to worry as much about versioning some of the
> > > surfaces
> > > > between components. However, for components that have a stable
> surface
> > or
> > > > across languages, we can relax this, as appropriate. Additionally,
> this
> > > can
> > > > be relaxed for hotfixes and different SDK languages.
> > >
> > > Based on the above quote by default we should use the same version,
> > however
> > > it
> > > is also possible to relax that requirement for a different SDK.
> > >
> >
> > Interesting...
> >
> >
> > I propose that Python SDK to be versioned differently for two reasons:
> > >
> > > (1) Python SDK does not have all the features yet and it is likely that
> > it
> > > will
> > > play catch up for a while. During this time it would be confusing to
> the
> > > users. A user might assume that a released Python SDK with version X to
> > > have
> > > all the features that are available in Java SDK version X.
> > > (2) In the future, it is possible that different user communities might
> > > embrace
> > > different SDKs. Having different versions would give the flexibility to
> > the
> > > SDK
> > > developers to prioritize feature request differently and potentially
> have
> > > non-synchronized release schedules.
> > >
> >
> > I agree, SDKs has a different lifecycle that APIs, so that need to be
> > covered by the versioning strategy.
> >
> > Not sure if it has been discussed. But commonly to version APIs and SDKs
> > sometimes people uses the following schema:
> >
> > - version X.Y identifies the API
> > - version X.Y.Z identified the SDK compatible with API X.Y
> >
> > Anyway, as we did with the code PR, I'd prefer to keep the versioning for
> > later discussion. Firstly I'd focus on agreeing, or not, in using the
> Maven
> > build also for the Python SDK.
> >
> > Cheers,
> >
> > --
> > Sergio Fernández
> > Partner Technology Manager
> > Redlink GmbH
> > m: +43 6602747925
> > e: sergio.fernandez@redlink.co
> > w: http://redlink.co
> >
>



-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Re: Integrate Python SDK in the Maven build

Posted by Ahmet Altay <al...@google.com.INVALID>.
Thank you, Sergio. I agree, let's first focus on where we want to keep the
version information.

I would like to summarize the available options to help with the discussion.
We have two options to store the version information:

1. In the version.py file

The version is kept in a global variable (__version__) in the version.py file.
setup.py refers directly to this global variable. Note that setup.py needs to
access this information before the package is installed, so it cannot easily
rely on imports, method calls, etc. Other parts of the code access this
information in exactly the same way, through the global variable.

Pros:
- Unified access pattern between setup.py and other files.
- Simple Pythonic way to manage information.

Cons:
- Another version field that needs to be maintained together with the version
  in the pom.xml.

Changes required:
- Minimal, since this is the current method in use. We will need to remove some
  unused methods from version.py. Those were added in anticipation of a
  change.
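
To make option 1 concrete, here is a minimal sketch, not taken from the PR. It assumes version.py contains a single `__version__ = "..."` assignment and that setup.py reads the file as text rather than importing the package; the helper name `read_version`, the regular expression, and the sample version string are illustrative assumptions.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical contents of version.py: a single module-level global.
VERSION_PY_SOURCE = '__version__ = "0.3.0.dev0"\n'

# Matches a line such as: __version__ = "1.2.3"
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(version_file):
    """Return the __version__ string from a file without importing the package."""
    text = Path(version_file).read_text(encoding="utf-8")
    match = _VERSION_RE.search(text)
    if match is None:
        raise RuntimeError("no __version__ assignment found in %s" % version_file)
    return match.group(1)


def demo():
    """Write a throwaway version.py and read the version back, as setup.py might."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "version.py"
        path.write_text(VERSION_PY_SOURCE, encoding="utf-8")
        return read_version(path)
```

Calling `demo()` returns `"0.3.0.dev0"`. Code inside the installed package would not need the helper at all; it could simply import `__version__` from the version module.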

2. In the pom.xml file

The version is kept inside the pom.xml file. This is probably most familiar to
people used to Maven. The version.py file has methods that parse the pom.xml
file to extract the version information. It then does some string
modifications on the version to make it PEP 440 compliant [1]. setup.py cannot
call these methods directly. The file has a main method (it is
self-executable) that initiates the version parsing. setup.py executes the
file first to get the version. Other code inside the package calls the
get_version() method directly.

Pros:
- Single version maintained in the pom.xml file.

Cons:
- The version is not PEP 440 compliant out of the box.
- Different access pattern for setup.py and other files.
- Requires packaging a single pom.xml file (without any of its parent
  pom.xml files) in the distributed Python package.
- Requires fragile file parsing to access the version.

Changes required:
- Small. I prepared this PR to illustrate what is needed:
  https://github.com/apache/incubator-beam/pull/855
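
For comparison, option 2 could look roughly like the sketch below. This is not the code from the PR: the sample pom.xml, the function names, and in particular the mapping rules (a `-SNAPSHOT` suffix becomes `.dev0`, any other qualifier becomes a PEP 440 local version label) are assumptions chosen only to show the kind of parsing and string rewriting involved.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical, trimmed-down pom.xml used only for this illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <artifactId>example-sdk-python</artifactId>
  <version>0.3.0-incubating-SNAPSHOT</version>
</project>
"""


def maven_version_from_pom(pom_text):
    """Return the text of the top-level <version> element of a pom.xml."""
    root = ET.fromstring(pom_text)
    node = root.find("m:version", POM_NS)  # direct child only, not <parent>
    if node is None or not (node.text or "").strip():
        raise ValueError("pom.xml has no top-level <version> element")
    return node.text.strip()


def to_pep440(maven_version):
    """Rewrite a Maven version as a PEP 440 version (assumed mapping)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)((?:-[A-Za-z0-9]+)*)", maven_version)
    if match is None:
        raise ValueError("unsupported Maven version: %r" % maven_version)
    release, qualifier_part = match.groups()
    qualifiers = [q for q in qualifier_part.split("-") if q]
    is_snapshot = "SNAPSHOT" in qualifiers
    labels = [q.lower() for q in qualifiers if q != "SNAPSHOT"]
    version = release + (".dev0" if is_snapshot else "")
    if labels:
        # Remaining qualifiers are kept as a local version label.
        version += "+" + ".".join(labels)
    return version
```

With the sample above, `to_pep440(maven_version_from_pom(SAMPLE_POM))` yields `"0.3.0.dev0+incubating"`. The sketch also shows why the parsing is fragile: a module that inherits its version from a parent pom.xml has no top-level `<version>` element, so the lookup fails unless the parent file is shipped and parsed as well.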

I prefer the first option as it is the more Pythonic way of addressing this
problem. What do you think?

Thank you,
Ahmet

[1] https://www.python.org/dev/peps/pep-0440/


On Tue, Aug 16, 2016 at 6:04 AM, Sergio Fernández <wi...@apache.org> wrote:

> Hi Ahmet,
>
> On Fri, Aug 12, 2016 at 8:55 PM, Ahmet Altay <al...@google.com.invalid>
> wrote:
>
> > Hi Sergio,
> >
> > Thank you for the integration PR. This will be very useful. As an
> immediate
> > benefit Python SDK will have precommit coverage through Jenkins. There
> was
> > already existing coverage with Travis, nevertheless this will give us
> > additional testing. It is also an important step towards maturing the
> > python-
> > sdk branch to be merged into the master branch. My opinion is to get this
> > bit
> > in and benefit from it now.
> >
> > Related to versioning bit. I believe Silviu wanted to get the Maven
> > integration first. It is now a good time to have this discussion.
>
>
> Sure, and I agreed on that, step by step is always better. I just wanted to
> bring the discussion to dev@, where I expect people has a broader opinion
> where to go in the mid-long term.
>
>
> > There
> > are two components to this discussion:
> >
> > (1) Reading the version information from the pom.xml. This should happen
> > regardless of what versioning strategy we choose. We can continue this
> > discussion on how to implement that in BEAM-547.
> >
> > (2) What version to use for Python SDK? This is a better forum to answer
> > this question.
> >
> > As a context, there was an earlier community discussion on this [1][2]:
> >
> > > Finally, we propose the standard approach where the entire source code
> > lives
> > > in each branch and is released concurrently. We’d like to avoid the
> case
> > > where individual modules are released on different cadences and are
> being
> > > maintained in different branches of the main repository. This is
> > beneficial
> > > because we don’t need to worry as much about versioning some of the
> > surfaces
> > > between components. However, for components that have a stable surface
> or
> > > across languages, we can relax this, as appropriate. Additionally, this
> > can
> > > be relaxed for hotfixes and different SDK languages.
> >
> > Based on the above quote by default we should use the same version,
> however
> > it
> > is also possible to relax that requirement for a different SDK.
> >
>
> Interesting...
>
>
> I propose that Python SDK to be versioned differently for two reasons:
> >
> > (1) Python SDK does not have all the features yet and it is likely that
> it
> > will
> > play catch up for a while. During this time it would be confusing to the
> > users. A user might assume that a released Python SDK with version X to
> > have
> > all the features that are available in Java SDK version X.
> > (2) In the future, it is possible that different user communities might
> > embrace
> > different SDKs. Having different versions would give the flexibility to
> the
> > SDK
> > developers to prioritize feature request differently and potentially have
> > non-synchronized release schedules.
> >
>
> I agree, SDKs has a different lifecycle that APIs, so that need to be
> covered by the versioning strategy.
>
> Not sure if it has been discussed. But commonly to version APIs and SDKs
> sometimes people uses the following schema:
>
> - version X.Y identifies the API
> - version X.Y.Z identified the SDK compatible with API X.Y
>
> Anyway, as we did with the code PR, I'd prefer to keep the versioning for
> later discussion. Firstly I'd focus on agreeing, or not, in using the Maven
> build also for the Python SDK.
>
> Cheers,
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernandez@redlink.co
> w: http://redlink.co
>

Re: Integrate Python SDK in the Maven build

Posted by Sergio Fernández <wi...@apache.org>.
Hi Ahmet,

On Fri, Aug 12, 2016 at 8:55 PM, Ahmet Altay <al...@google.com.invalid>
wrote:

> Hi Sergio,
>
> Thank you for the integration PR. This will be very useful. As an immediate
> benefit Python SDK will have precommit coverage through Jenkins. There was
> already existing coverage with Travis, nevertheless this will give us
> additional testing. It is also an important step towards maturing the
> python-sdk branch to be merged into the master branch. My opinion is to get
> this bit in and benefit from it now.
>
> Related to versioning bit. I believe Silviu wanted to get the Maven
> integration first. It is now a good time to have this discussion.


Sure, and I agreed on that; step by step is always better. I just wanted to
bring the discussion to dev@, where I expect people have a broader opinion
on where to go in the mid-to-long term.


> There
> are two components to this discussion:
>
> (1) Reading the version information from the pom.xml. This should happen
> regardless of what versioning strategy we choose. We can continue this
> discussion on how to implement that in BEAM-547.
>
> (2) What version to use for Python SDK? This is a better forum to answer
> this question.
>
> As a context, there was an earlier community discussion on this [1][2]:
>
> > Finally, we propose the standard approach where the entire source code
> > lives in each branch and is released concurrently. We’d like to avoid the
> > case where individual modules are released on different cadences and are
> > being maintained in different branches of the main repository. This is
> > beneficial because we don’t need to worry as much about versioning some of
> > the surfaces between components. However, for components that have a stable
> > surface or across languages, we can relax this, as appropriate.
> > Additionally, this can be relaxed for hotfixes and different SDK languages.
>
> Based on the above quote by default we should use the same version, however
> it is also possible to relax that requirement for a different SDK.
>

Interesting...


> I propose that Python SDK to be versioned differently for two reasons:
>
> (1) Python SDK does not have all the features yet and it is likely that it
> will play catch up for a while. During this time it would be confusing to the
> users. A user might assume that a released Python SDK with version X to have
> all the features that are available in Java SDK version X.
> (2) In the future, it is possible that different user communities might
> embrace different SDKs. Having different versions would give the flexibility
> to the SDK developers to prioritize feature request differently and
> potentially have non-synchronized release schedules.
>

I agree, SDKs have a different lifecycle than APIs, so that needs to be
covered by the versioning strategy.

Not sure if it has been discussed, but a common scheme for versioning APIs
and SDKs is the following:

- version X.Y identifies the API
- version X.Y.Z identifies the SDK compatible with API X.Y
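
As a small illustration of that scheme, the sketch below derives the API version from an SDK version and checks compatibility. The function names and the strict three-number format are assumptions made for the example; nothing like this exists in the code base.

```python
def api_version_of(sdk_version):
    """Return the API version "X.Y" that an SDK version "X.Y.Z" targets."""
    parts = sdk_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(
            "expected an SDK version of the form X.Y.Z, got %r" % sdk_version
        )
    return ".".join(parts[:2])


def is_compatible(sdk_version, api_version):
    """True if the SDK release targets exactly the given API version."""
    return api_version_of(sdk_version) == api_version
```

Under this scheme, SDK releases 0.3.0 and 0.3.2 in any language both target API 0.3, so each SDK can ship fixes on its own schedule while the first two numbers still state which API it implements.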

Anyway, as we did with the code PR, I'd prefer to keep the versioning for a
later discussion. First I'd focus on agreeing, or not, on using the Maven
build for the Python SDK as well.

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Re: Integrate Python SDK in the Maven build

Posted by Lukasz Cwik <lc...@google.com.INVALID>.
While an SDK is heavily in development, I can see why we would want to have
different versions.

I also see that once an SDK matures, there is a lot of benefit in knowing
that these SDKs are effectively equivalent beyond the choice of language.

Once an SDK is mature, will it be *required* or *recommended* that new
features that are no longer experimental be developed in all SDKs?


On Fri, Aug 12, 2016 at 11:55 AM, Ahmet Altay <al...@google.com.invalid>
wrote:

> Hi Sergio,
>
> Thank you for the integration PR. This will be very useful. As an immediate
> benefit Python SDK will have precommit coverage through Jenkins. There was
> already existing coverage with Travis, nevertheless this will give us
> additional testing. It is also an important step towards maturing the
> python-
> sdk branch to be merged into the master branch. My opinion is to get this
> bit
> in and benefit from it now.
>
> Related to versioning bit. I believe Silviu wanted to get the Maven
> integration first. It is now a good time to have this discussion. There
> are two components to this discussion:
>
> (1) Reading the version information from the pom.xml. This should happen
> regardless of what versioning strategy we choose. We can continue this
> discussion on how to implement that in BEAM-547.
>
> (2) What version to use for Python SDK? This is a better forum to answer
> this question.
>
> As a context, there was an earlier community discussion on this [1][2]:
>
> > Finally, we propose the standard approach where the entire source code
> lives
> > in each branch and is released concurrently. We’d like to avoid the case
> > where individual modules are released on different cadences and are being
> > maintained in different branches of the main repository. This is
> beneficial
> > because we don’t need to worry as much about versioning some of the
> surfaces
> > between components. However, for components that have a stable surface or
> > across languages, we can relax this, as appropriate. Additionally, this
> can
> > be relaxed for hotfixes and different SDK languages.
>
> Based on the above quote by default we should use the same version, however
> it
> is also possible to relax that requirement for a different SDK.
>
> I propose that Python SDK to be versioned differently for two reasons:
>
> (1) Python SDK does not have all the features yet and it is likely that it
> will
> play catch up for a while. During this time it would be confusing to the
> users. A user might assume that a released Python SDK with version X to
> have
> all the features that are available in Java SDK version X.
> (2) In the future, it is possible that different user communities might
> embrace
> different SDKs. Having different versions would give the flexibility to the
> SDK
> developers to prioritize feature request differently and potentially have
> non-synchronized release schedules.
>
> Thanks,
> Ahmet
>
> [1]
> https://lists.apache.org/thread.html/3b201b523701df077bee1a916a8af8dbaf3b11c28aa83015f71dad93@1455032762@%3Cdev.beam.apache.org%3E
> [2]
> https://docs.google.com/document/d/1mTeZED33Famq25XedbKeDlGIJRvtzCXjSfwH9NKQYUE/edit?usp=sharing
>
> On Fri, Aug 12, 2016 at 9:49 AM, Sergio Fernández <wi...@apache.org>
> wrote:
>
> > Hi everybody,
> >
> > few weeks ago we started the discussion about integrating the Python SDK
> in
> > the main Maven buid [1]. With the support from Ahmet Altay, we got a PR
> [2]
> > that I think should be ready to be merged into the python-sdk branch.
> >
> > What's the feeling of the project? Do you think is valuable to have an
> > integrated build?
> >
> > That leads us to the next aspect: versioning. Initially I had the idea of
> > using the same version that the Maven artifact. After talking with Silviu
> > Calinoiu we kept that aside for the moment. But I think at some point we
> > should also discuss if the project wants to align the versions [3].
> >
> > Thanks for the feedback.
> >
> > Cheers,
> >
> > [1] https://issues.apache.org/jira/browse/BEAM-378
> > [2] https://github.com/apache/incubator-beam/pull/537
> > [3] https://issues.apache.org/jira/browse/BEAM-547
> >
> > --
> > Sergio Fernández
> > Partner Technology Manager
> > Redlink GmbH
> > m: +43 6602747925
> > e: sergio.fernandez@redlink.co
> > w: http://redlink.co
> >
>

Re: Integrate Python SDK in the Maven build

Posted by Ahmet Altay <al...@google.com.INVALID>.
Hi Sergio,

Thank you for the integration PR. This will be very useful. As an immediate
benefit, the Python SDK will have precommit coverage through Jenkins. There
was already coverage with Travis; nevertheless, this will give us additional
testing. It is also an important step towards maturing the python-sdk branch
so that it can be merged into the master branch. My opinion is to get this
bit in and benefit from it now.

Regarding versioning: I believe Silviu wanted to get the Maven integration
in first. Now is a good time to have this discussion. There are two
components to it:

(1) Reading the version information from the pom.xml. This should happen
regardless of what versioning strategy we choose. We can continue this
discussion on how to implement that in BEAM-547.

(2) What version to use for Python SDK? This is a better forum to answer
this question.

For context, there was an earlier community discussion on this [1][2]:

> Finally, we propose the standard approach where the entire source code
> lives in each branch and is released concurrently. We’d like to avoid the
> case where individual modules are released on different cadences and are
> being maintained in different branches of the main repository. This is
> beneficial because we don’t need to worry as much about versioning some of
> the surfaces between components. However, for components that have a stable
> surface or across languages, we can relax this, as appropriate.
> Additionally, this can be relaxed for hotfixes and different SDK languages.

Based on the above quote, by default we should use the same version; however,
it is also possible to relax that requirement for a different SDK.

I propose that the Python SDK be versioned differently, for two reasons:

(1) The Python SDK does not have all the features yet, and it is likely that
it will play catch-up for a while. During this time, sharing a version would
be confusing to users: a user might assume that a released Python SDK with
version X has all the features that are available in Java SDK version X.
(2) In the future, it is possible that different user communities might
embrace different SDKs. Having different versions would give SDK developers
the flexibility to prioritize feature requests differently and potentially
have non-synchronized release schedules.

Thanks,
Ahmet

[1]
https://lists.apache.org/thread.html/3b201b523701df077bee1a916a8af8dbaf3b11c28aa83015f71dad93@1455032762@%3Cdev.beam.apache.org%3E
[2]
https://docs.google.com/document/d/1mTeZED33Famq25XedbKeDlGIJRvtzCXjSfwH9NKQYUE/edit?usp=sharing

On Fri, Aug 12, 2016 at 9:49 AM, Sergio Fernández <wi...@apache.org> wrote:

> Hi everybody,
>
> few weeks ago we started the discussion about integrating the Python SDK in
> the main Maven buid [1]. With the support from Ahmet Altay, we got a PR [2]
> that I think should be ready to be merged into the python-sdk branch.
>
> What's the feeling of the project? Do you think is valuable to have an
> integrated build?
>
> That leads us to the next aspect: versioning. Initially I had the idea of
> using the same version that the Maven artifact. After talking with Silviu
> Calinoiu we kept that aside for the moment. But I think at some point we
> should also discuss if the project wants to align the versions [3].
>
> Thanks for the feedback.
>
> Cheers,
>
> [1] https://issues.apache.org/jira/browse/BEAM-378
> [2] https://github.com/apache/incubator-beam/pull/537
> [3] https://issues.apache.org/jira/browse/BEAM-547
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernandez@redlink.co
> w: http://redlink.co
>