Posted to dev@pulsar.apache.org by Matteo Merli <mm...@apache.org> on 2022/09/19 23:25:20 UTC

[DISCUSS] PIP-209: Separate C++/Python clients to own repositories

https://github.com/apache/pulsar/issues/17724



## Motivation

The Pulsar C++ client code base lives in the same main repository as the rest of the Pulsar project.

While this was the right decision at the time, keeping the C++ client there now carries
considerable overhead.

### Issues with the current approach

The Pulsar repository has grown a lot in size and number of active developers.

 1. The frequency of changes in various parts of the codebase has increased to the
    point where the CI resources consumed are very significant.

    Every change in the Java code triggers the CI jobs for the C++ client, and every
    change in the C++ client does the same.

    During a CI job we build the C++ client multiple times:
     1. For the C++ and Python client tests
     2. To build the Python wheels included in the Pulsar Docker images (to support
        Pulsar Functions)

 2. The release process for Pulsar has become very complex and requires building a
    large number of binaries for the C++ and Python clients. This has become too
    much of a burden during each Pulsar release.


## Goal

Decouple the development of C++ and Python client libraries from the development
of the core components of Pulsar in Java.


## Changes

### Repositories

 1. Move the C++ client code to a new repository: `github.com/apache/pulsar-client-cpp`
 2. Move the Python client code to a new repository: `github.com/apache/pulsar-client-python`

The change will be done without losing any history, by extracting a sub-directory into
a new Git repository:

```
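# Run in a fresh clone of apache/pulsar: keeps only pulsar-client-cpp/ (with its history) as the new root
git filter-repo --subdirectory-filter pulsar-client-cpp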
```
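A minimal sketch of how both extractions could be scripted, assuming each split starts from its own fresh clone of `apache/pulsar` (git filter-repo rewrites history in place) and that the Python wrapper lives in `pulsar-client-cpp/python`, the sub-directory of the C++ client code mentioned in the "Python --> C++" section. `SPLITS` and `split_commands` are hypothetical names; the sketch only builds and prints the commands rather than running them.

```python
import shlex

# Hypothetical mapping: new repository name -> sub-directory of apache/pulsar whose
# history it keeps. The Python path assumes the wrapper sits in pulsar-client-cpp/python.
SPLITS = {
    "pulsar-client-cpp": "pulsar-client-cpp",
    "pulsar-client-python": "pulsar-client-cpp/python",
}

SOURCE_URL = "https://github.com/apache/pulsar.git"


def split_commands(target: str, subdir: str) -> list[list[str]]:
    """Commands that extract `subdir` (with its history) into a fresh clone named `target`."""
    return [
        # git filter-repo expects a fresh clone, so each split gets its own clone
        ["git", "clone", SOURCE_URL, target],
        # Keep only history touching `subdir` and make that directory the new root
        ["git", "-C", target, "filter-repo", "--subdirectory-filter", subdir],
    ]


if __name__ == "__main__":
    for target, subdir in SPLITS.items():
        for argv in split_commands(target, subdir):
            print(shlex.join(argv))
```

Note that with this layout the C++ extraction would still contain the `python/` directory; removing it from the C++ repository would be a separate filtering step.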

### Release process

The release process will be split into multiple parts:

 1. The main Pulsar release will contain only the Java parts (server distribution
    and Java client library)
 2. The C++ client will have its own release schedule and versioning
 3. The Python client will have its own release schedule and versioning

#### Versioning

The C++ and Python clients will each have their own independent versioning.

To avoid breaking anything or causing further confusion, the new versions need to be
higher than the current version (2.11.x).

The suggestion is to start the new releases for both C++ and Python from 3.0.0.
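To see why the new line needs to start above 2.11.x, here is a minimal sketch of how a numeric version comparison orders releases: a tool that picks the "highest" version would treat a fresh 1.0.0 as older than 2.11.x. The helpers `parse_version` and `is_newer` are hypothetical and handle only plain `X.Y.Z` versions; real Python packaging tools follow PEP 440 (for example through the `packaging` library) and also understand pre-release suffixes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain release version such as '2.11.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, current: str) -> bool:
    """True if `candidate` sorts after `current` when compared numerically."""
    return parse_version(candidate) > parse_version(current)


if __name__ == "__main__":
    print(is_newer("3.0.0", "2.11.0"))   # True: 3.0.0 is an upgrade over the 2.11.x line
    print(is_newer("1.0.0", "2.11.0"))   # False: restarting at 1.0.0 would look like a downgrade
    print(is_newer("2.11.0", "2.9.5"))   # True numerically, although "2.11.0" < "2.9.5" as strings
```

The last case is why versions are compared as integer tuples rather than as strings.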


#### Existing branches

Existing Pulsar branches will keep the C++ client in the main repository, and it will
continue to receive bug fixes in its current location.

The different location of the new C++ code will make the cherry-picking process
slightly more painful in the short term, though it will even out in the long term.


### Project dependencies

#### C++/Python --> Pulsar

Both the C++ and Python unit/integration tests are designed to run against a
standalone Pulsar broker instance. Currently, they use the `master` code, which is
built in order to run the tests.

After the split, the unit tests will use a Pulsar Docker image instead. We can use a
few different images to test for compatibility:
 1. Latest stable (e.g. 2.10.1)
 2. Nightly (a Pulsar Docker image published every day from the master branch)
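A minimal sketch of such a test matrix, assuming the `apachepulsar/pulsar` image published on Docker Hub, which starts a standalone instance with `bin/pulsar standalone` and serves the binary protocol on port 6650 and HTTP on port 8080. The nightly tag name is a hypothetical placeholder, since the proposal does not name one, and `standalone_argv` is an illustrative helper that only builds the `docker run` command.

```python
import shlex

IMAGE = "apachepulsar/pulsar"

# Tags to test against. 2.10.1 is the "latest stable" example above; the nightly
# tag name is a hypothetical placeholder.
TEST_MATRIX = {
    "latest-stable": "2.10.1",
    "nightly": "nightly",  # hypothetical tag
}


def standalone_argv(tag: str, container_name: str) -> list[str]:
    """docker argv that starts a standalone broker exposing ports 6650 (binary) and 8080 (HTTP)."""
    return [
        "docker", "run", "--detach", "--name", container_name,
        "-p", "6650:6650", "-p", "8080:8080",
        f"{IMAGE}:{tag}",
        "bin/pulsar", "standalone",
    ]


if __name__ == "__main__":
    for label, tag in TEST_MATRIX.items():
        print(shlex.join(standalone_argv(tag, f"pulsar-{label}")))
```

A CI job would also need to wait until the broker accepts connections before starting the client tests.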

#### Pulsar --> Python

To create a Pulsar image, we currently build the Python client wheel file and then
install it at build time.

Instead, we are going to include a wheel file for a version of the Python client
that has already been released.

#### Python --> C++

The Python client library is just a wrapper on top of the C++ client. Today the two
are built together: the Python wrapper code resides in a sub-directory of the C++
client code and is compiled using the same CMake build script.

By separating the Python client into a different repository, we are going to
depend on an already released version of the C++ client.


#### Automated documentation in the website

On the Pulsar website we auto-generate the C++ documentation with the Doxygen tool
and the Python documentation with pdoc.

Instead of fetching only the main repository code, the website build job should
also fetch the new repositories to run the tooling.
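A minimal sketch of the extra fetch step, assuming the C++ repository ends up as `pulsar-client-cpp` (GitHub repository names cannot contain "+") and that each repository is checked out at its own ref, since the clients will be versioned independently. The `DOC_SOURCES` mapping, the `main` refs, and the helper names are hypothetical; the sketch only builds shallow `git clone` commands and does not invoke Doxygen or pdoc.

```python
import shlex

# Hypothetical mapping: repository URL -> (doc generator, ref to document).
DOC_SOURCES = {
    "https://github.com/apache/pulsar-client-cpp.git": ("doxygen", "main"),
    "https://github.com/apache/pulsar-client-python.git": ("pdoc", "main"),
}


def checkout_dir(url: str) -> str:
    """Local directory name for a repository URL, e.g. 'pulsar-client-cpp'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[: -len(".git")] if name.endswith(".git") else name


def fetch_argv(url: str, ref: str) -> list[str]:
    """Shallow clone of a single tag or branch, enough to run a doc generator on it."""
    return ["git", "clone", "--depth", "1", "--branch", ref, url, checkout_dir(url)]


if __name__ == "__main__":
    for url, (tool, ref) in DOC_SOURCES.items():
        print(f"{shlex.join(fetch_argv(url, ref))}  # then run {tool}")
```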






--
Matteo Merli
<mm...@apache.org>

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Dave Fisher <wa...@apache.org>.
Hi -

This is great and I agree with the plan except for one missing discussion.

The work needs to include how to create documentation in the new repositories and move it to the pulsar-site repository. (This might be better as a new PIP, which could also cover the Go and Node clients that are already in separate repositories.)

Best,
Dave

Sent from my iPhone

> On Sep 19, 2022, at 4:25 PM, Matteo Merli <mm...@apache.org> wrote:


Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Dave Fisher <wa...@comcast.net>.

> On Sep 19, 2022, at 7:29 PM, Baodi Shi <ba...@icloud.com.invalid> wrote:
> 
> Hi, @merlimat. Looks good to me. 
> 
>> Both C++ and Python clients will continue with their own individual versioning.
>> 
>> In order to not break anything or cause more confusion, we would need to use
>> a new version that is bigger than the current version (2.11.x).
>> 
>> The suggestion is to start the new releases for both C++ and Python from 3.0.0.
> 
> In the future, do we need to maintain a compatibility list? How should users choose the appropriate client version to match the broker?

I think that this is a question of maintaining documentation for clients, including these Apache Pulsar-provided clients:

- Java
- C++
- Python
- Node JS
- Go
- Spring Reactive

There could be a YAML schema defining the main Java client's features, and each client could declare the portion it covers. A compatibility matrix could then be built in the pulsar-site build.
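The matrix idea could look roughly like this. A minimal sketch, assuming the Java client's feature list acts as the reference and each client declares the subset it covers; a plain dict stands in for the parsed YAML so only the standard library is needed. Every feature and client entry is a placeholder, not real Pulsar support data, and `build_matrix` and `render_markdown` are hypothetical helpers.

```python
# Placeholder data standing in for a parsed YAML file.
REFERENCE_FEATURES = ["feature-a", "feature-b", "feature-c"]  # defined by the Java client
CLIENT_COVERAGE = {
    "C++": ["feature-a", "feature-b"],
    "Python": ["feature-a"],
}


def build_matrix(features: list[str], coverage: dict[str, list[str]]) -> dict[str, dict[str, bool]]:
    """Map each client to {feature: supported?} over the reference feature list."""
    matrix = {}
    for client, covered in coverage.items():
        declared = set(covered)
        matrix[client] = {feature: feature in declared for feature in features}
    return matrix


def render_markdown(matrix: dict[str, dict[str, bool]], features: list[str]) -> str:
    """Render the matrix as a Markdown table, as a site build might publish it."""
    lines = [
        "| Client | " + " | ".join(features) + " |",
        "|---" * (len(features) + 1) + "|",
    ]
    for client, support in matrix.items():
        cells = ["yes" if support[f] else "no" for f in features]
        lines.append(f"| {client} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(build_matrix(REFERENCE_FEATURES, CLIENT_COVERAGE), REFERENCE_FEATURES))
```

Keeping the Java list as the single reference means a client only has to declare what it covers, and missing entries show up as gaps automatically.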

Maybe Swagger files make more sense; again, I see a parallel PIP.

Regards,
Dave


Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Dave Fisher <wa...@comcast.net>.
+1 (binding)

> On Sep 26, 2022, at 2:38 AM, Enrico Olivelli <eo...@gmail.com> wrote:


Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Qiang Huang <qi...@gmail.com>.
+1

On Mon, Sep 26, 2022 at 23:42, Zixuan Liu <no...@gmail.com> wrote:



-- 
BR,
Qiang Huang

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Zixuan Liu <no...@gmail.com>.
+1

Thanks,
Zixuan

On Mon, Sep 26, 2022 at 17:52, Anon Hxy <an...@gmail.com> wrote:


Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Anon Hxy <an...@gmail.com>.
+1 LGTM

Thanks,
Xiaoyu Hou

On Mon, Sep 26, 2022 at 17:39, Enrico Olivelli <eo...@gmail.com> wrote:


Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Enrico Olivelli <eo...@gmail.com>.
+100 to this !

Thanks for this proposal

Enrico

On Mon, Sep 26, 2022 at 10:04, Zike Yang <zi...@apache.org> wrote:

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Zike Yang <zi...@apache.org>.
+1. Looks good to me.

Do we need to move the `Apache Pulsar / Multi Clients` GitHub project
to the corresponding repos?

Thanks,
Zike Yang



On Fri, Sep 23, 2022 at 7:22 AM Matteo Merli <ma...@gmail.com> wrote:

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Matteo Merli <ma...@gmail.com>.
--
Matteo Merli
<ma...@gmail.com>

On Tue, Sep 20, 2022 at 8:14 PM Michael Marshall <mm...@apache.org> wrote:
>
> Great proposal, thanks Matteo.
>
> I think I agree with splitting out the client into two repos. One
> issue is that new C++ features will lag in the python client because
> the C++ client will first need to be released. Quick releases will
> likely help there.

Yes, decoupling Java, C++ and Python releases will make each of them
much easier.

We'll be able to do patch releases with a tenth of the manual work involved.

> > The client <--> broker compatibility is in general always guaranteed.
>
> I think we should make this more visible in our Pulsar documentation.
> It's a fantastic feature, and I get the sense that it is not well
> known.

Agree, it's something that still surprises a lot of people.

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Michael Marshall <mm...@apache.org>.
Great proposal, thanks Matteo.

I think I agree with splitting out the client into two repos. One
issue is that new C++ features will lag in the Python client because
the C++ client will first need to be released. Quick releases will
likely help there.

> Instead of just fetching the main repo code, the website build job should be
> also fetching the new repos to run the tooling.

The Python documentation is generated by downloading the source
tarball for a release [0] and then running pdoc in a Docker
container. The only change for the Python docs will be to update where
the source tarball is downloaded from and to ensure the release
process is documented.

I've wanted to do something similar with the C++ docs, but I haven't
had a chance.

> The client <--> broker compatibility is in general always guaranteed.

I think we should make this more visible in our Pulsar documentation.
It's a fantastic feature, and I get the sense that it is not well
known.

Thanks,
Michael

[0] https://github.com/apache/pulsar-site/tree/main/site2/tools/api/python


On Tue, Sep 20, 2022 at 11:46 AM Matteo Merli <ma...@gmail.com> wrote:

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Matteo Merli <ma...@gmail.com>.
--
Matteo Merli
<ma...@gmail.com>

On Mon, Sep 19, 2022 at 7:30 PM Baodi Shi <ba...@icloud.com.invalid> wrote:
> > The suggestion is to start the new releases for both C++ and Python from 3.0.0.
>
> In the future, do we need to maintain a compatibility list? How should users choose the appropriate client version to match the broker?

That would be very similar to other languages as well. The client <-->
broker compatibility is in general always guaranteed.

The only thing to be aware of concerns particular features, e.g. if I want to use
feature X, I need to make sure that both the broker and the client are at versions
that support it.

Today, because features reach clients other than Java with some lag, using a 2.8.4
C++ client does not guarantee that a feature supported by the 2.8.4 broker is
available in that client.
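The per-feature check described above can be sketched as follows, assuming a hypothetical table of minimum broker and client versions for each feature (the feature name and numbers are placeholders). It deliberately does not compare the client version with the broker version, because after the split the two version lines are independent.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain release version such as '2.8.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical feature table: minimum broker and client versions per feature.
MIN_VERSIONS = {
    "feature-x": {"broker": "2.8.0", "client": "3.1.0"},
}


def can_use(feature: str, broker_version: str, client_version: str) -> bool:
    """A feature works only if BOTH the broker and the client are new enough for it."""
    required = MIN_VERSIONS[feature]
    return (
        parse_version(broker_version) >= parse_version(required["broker"])
        and parse_version(client_version) >= parse_version(required["client"])
    )


if __name__ == "__main__":
    print(can_use("feature-x", "2.10.1", "3.0.0"))  # False: client too old
    print(can_use("feature-x", "2.10.1", "3.1.0"))  # True
    print(can_use("feature-x", "2.7.0", "3.1.0"))   # False: broker too old
```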

> > The different location of the new C++ code will make the cherry-picking process
> > slightly more painful in the short term, though it will even out in long term.
>
> Do the currently existing issues need to move to the new repository?

Good question. We could use a script that creates issues in the new
repo linking back to the old repo issue (everything that's currently
tagged with the 'cpp' label).
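A minimal sketch of the script idea above, assuming the old issues are already loaded into memory. The payloads use the `title` and `body` fields accepted by GitHub's REST endpoint for creating an issue (`POST /repos/{owner}/{repo}/issues`); listing the labelled issues, authenticating, and actually sending the requests are left out. The `Issue` type, `migration_payloads`, and the sample data are hypothetical.

```python
from dataclasses import dataclass

OLD_REPO = "apache/pulsar"


@dataclass
class Issue:
    number: int
    title: str
    labels: list[str]


def migration_payloads(issues: list[Issue], label: str = "cpp") -> list[dict[str, str]]:
    """For each old issue carrying `label`, build a new issue that links back to it."""
    payloads = []
    for issue in issues:
        if label not in issue.labels:
            continue
        payloads.append({
            "title": issue.title,
            "body": f"Moved from https://github.com/{OLD_REPO}/issues/{issue.number}",
        })
    return payloads


if __name__ == "__main__":
    sample = [  # hypothetical issues, not real apache/pulsar data
        Issue(101, "Example C++ issue", ["cpp"]),
        Issue(102, "Example broker issue", ["java"]),
    ]
    for payload in migration_payloads(sample):
        print(payload["title"], "->", payload["body"])
```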

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Baodi Shi <ba...@icloud.com.INVALID>.
Hi, @merlimat. Looks good to me. 

> Both C++ and Python clients will continue with their own individual versioning.
> 
> In order to not break anything or cause more confusion, we would need to use
> a new version that is bigger than the current version (2.11.x).
> 
> The suggestion is to start the new releases for both C++ and Python from 3.0.0.

In the future, do we need to maintain a compatibility list? How should users choose the appropriate client version to match the broker?

> 
> #### Existing branches
> 
> Existing branches of Pulsar, where the C++ client will still be in the same main
> the repository and will be receiving bug fixes in their current location.
> 
> The different location of the new C++ code will make the cherry-picking process
> slightly more painful in the short term, though it will even out in long term.

Do the currently existing issues need to move to the new repository?

Thanks,
Baodi Shi

> On Sep 20, 2022, at 07:25, Matteo Merli <mm...@apache.org> wrote:
> 
> https://github.com/apache/pulsar/issues/17724
> 
> 
> 
> ## Motivation
> 
> Pulsar C++ code base is in the same main repository for the Pulsar project.
> 
> While the decision was the right one at the time, there is a
> considerable overhead
> in keeping the C++ client in its current position.
> 
> ### Issues with the current approach
> 
> The Pulsar repository has grown a lot in size and number of active developers.
> 
> 1. The frequency of changes in various parts of the codebase has increased to a
>    point where the amount of resources dedicated to CI is very significant.
> 
>    Every change in Java code will trigger the CI jobs for the C++
> client and every
>    change in the C++ client will do the same.
> 
>    During a CI job we are building the C++ client multiple times:
>     1. For C++ and Python client tests
>     2. To build Python wheels to be included in the pulsar Docker
> images (for supporting
>        Pulsar functions)
> 
> 2. The release process for Pulsar has become very complex and
> requires building a
>    large number of binaries for C++ and Python clients. This has
> become too much
>    of a burden during the course of a Pulsar release.
> 
> 
> ## Goal
> 
> Decouple the development of C++ and Python client libraries from the development
> of the core components of Pulsar in Java.
> 
> 
> ## Changes
> 
> ### Repositories
> 
> 1. Move the C++ client code to a new repository
> `github.com/apache/pulsar-client-c++`
> 2. Move the Python client code to a new repository
> `github.com/apache/pulsar-client-python`
> 
> The change will be done without losing any history, extracting a
> sub-directory into
> a new Git repository.
> 
> ```
> git filter-repo --subdirectory-filter  pulsar-client-cpp
> ```
> 
> ### Release process
> 
> The release process will be split in multiple parts:
> 
> 1. the main Pulsar release will only contain the Java parts (server
> distribution
>    and Java client library)
> 2. The C++ client will have its own release schedule and versioning
> 3. The Python client will have its own release schedule and versioning
> 
> #### Versioning
> 
> Both C++ and Python clients will continue with their own individual versioning.
> 
> In order to not break anything or cause more confusion, we would need to use
> a new version that is bigger than the current version (2.11.x).
> 
> The suggestion is to start the new releases for both C++ and Python from 3.0.0.
> 
> 
> #### Existing branches
> 
> Existing branches of Pulsar, where the C++ client will still be in the same main
> the repository and will be receiving bug fixes in their current location.
> 
> The different location of the new C++ code will make the cherry-picking process
> slightly more painful in the short term, though it will even out in long term.
> 
> 
> ### Projects dependencies
> 
> #### C++/Python --> Pulsar
> 
> Both C++ and Python unit/integration tests are designed to run against
> a standalone
> instance of Pulsar broker. In the current form, they're using the `master` code
> that is built to run the tests.
> 
> After the split, the unit tests will use a Docker image of Pulsar. We
> can use a few
> different images to test for compatibility
> 1. Latest stable (eg: 2.10.1)
> 2. Nightly (Pulsar Docker image published every day from master branch)
> 
> #### Pulsar --> Python
> 
> To create a Pulsar image, we are now building the Python client wheel
> file and then
> installing it at build time.
> 
> Instead, we are going to include a wheel file for a version of the Python client
> that has been already released.
> 
> #### Python --> C++
> 
> The Python client library is just a wrapper on top of the C++ client.
> Today these
> are built together, with Python wrapper code residing in a
> sub-directory of C++ client
> code, and compiled using the same CMake build script.
> 
> By separating the Python client into a different repository, we are going to
> depend on an already released version of the C++ client.
> 
> 
> #### Automated documentation in the website
> 
> On the Pulsar website we are auto-generating C++ documentation with the Doxygen
> tool and the Python one with Pdoc.
> 
> Instead of just fetching the main repo code, the website build job should be
> also fetching the new repos to run the tooling.
> 
> 
> 
> 
> 
> 
> --
> Matteo Merli
> <mm...@apache.org>


Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by tison <wa...@gmail.com>.
I agree that we need to work on a more comprehensive documentation-building
process. Perhaps we can start by removing old code:
https://github.com/apache/pulsar-site/pull/215. I'd say it needs a code
bash to sort out the current logic: it's somewhat brittle, depending on relative
paths, coupled shell scripts, etc. And yes, that is better handled in a separate
proposal.

Best,
tison.


On Tue, Sep 20, 2022 at 16:35, tison <wa...@gmail.com> wrote:

> Hi Yunze,
>
> > Just wondering if there is a way to retain the git history in the
> > pulsar-client-cpp directory?
>
> Matteo's proposal already says:
>
> > git filter-repo --subdirectory-filter  pulsar-client-cpp
>
> So you will retain the git history.
>
> Best,
> tison.
>
>
> On Tue, Sep 20, 2022 at 16:27, Yunze Xu <yz...@streamnative.io.invalid> wrote:
>
>> LGTM. I also listed the related files outside the pulsar-client-cpp
>> directory recently:
>>
>> - pulsar-common/src/main/proto/PulsarApi.proto: the Pulsar binary
>>   proto file
>> - src/gen-pulsar-version-macro.py: generate the internal version info
>> - pulsar-client/src/test/proto/*.proto: test the protobuf native
>>   schema feature
>>
>> Handling those should not be a complicated job. Just wondering if there is
>> a way to retain the git history in the pulsar-client-cpp directory?
>>
>> Thanks,
>> Yunze
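Regarding the files Yunze listed outside `pulsar-client-cpp`: a hedged sketch, assuming git-filter-repo's documented equivalence of `--subdirectory-filter DIR` with `--path DIR/ --path-rename DIR/:`, of how one extra `--path` could carry such a file (with its history) into the new repository at its original location. It runs on a throwaway repository with placeholder contents; the function name `demo_keep_extra_paths` is illustrative only, and whether the real split keeps these files this way is not settled here.

```shell
#!/usr/bin/env bash
# Sketch only: make pulsar-client-cpp/ the new root while also keeping one
# file that lives outside it (PulsarApi.proto). Works in a throwaway repo
# under mktemp; requires git and git-filter-repo on PATH. File contents are
# placeholders; the directory and file names mirror the thread.

demo_keep_extra_paths() (
  set -e
  work="$(mktemp -d)"
  git init -q "$work/pulsar"
  cd "$work/pulsar"
  git config user.email "demo@example.com"
  git config user.name "Demo"

  # One commit containing the C++ client, the shared proto, and Java code.
  mkdir -p pulsar-client-cpp/lib pulsar-common/src/main/proto pulsar-broker
  echo "client" > pulsar-client-cpp/lib/ClientImpl.cc
  echo "syntax = \"proto2\";" > pulsar-common/src/main/proto/PulsarApi.proto
  echo "broker" > pulsar-broker/Broker.java
  git add .
  git commit -q -m "initial import"

  # Same as --subdirectory-filter pulsar-client-cpp, plus one extra --path.
  # --force is needed because this repo is not a fresh clone.
  git filter-repo --force \
    --path pulsar-client-cpp/ \
    --path pulsar-common/src/main/proto/PulsarApi.proto \
    --path-rename pulsar-client-cpp/: >/dev/null

  # List the files that survived in the rewritten HEAD commit.
  git ls-tree -r --name-only HEAD
)

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  demo_keep_extra_paths
fi
```

If it behaves as intended, the C++ sources end up at the root (`lib/ClientImpl.cc`), the proto file keeps its original path, and the Java file is dropped.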

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Matteo Merli <ma...@gmail.com>.
On Tue, Sep 20, 2022 at 9:32 AM Yunze Xu <yz...@streamnative.io.invalid> wrote:
>
> Hi Tison & Matteo,
>
> I noticed a comment here:
> https://github.com/apache/pulsar/pull/17733#discussion_r975552686
>
> Currently, Python Functions rely on the Python client. I see there are
> still some Python Function examples in the pulsar-functions/python-examples
> directory. How should we deal with these examples? And should we still
> include Python Functions in the Docker images?

Yes, I did add a section for this in the proposal. Instead of building the
C++/Python client when building the Pulsar Docker image, we are going to
install a Python wheel file that is already released and available.

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
Hi Tison & Matteo,

I noticed a comment here:
https://github.com/apache/pulsar/pull/17733#discussion_r975552686

Currently, Python Functions rely on the Python client. I see there are
still some Python Function examples in the pulsar-functions/python-examples
directory. How should we deal with these examples? And should we still
include Python Functions in the Docker images?

Thanks,
Yunze






Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by tison <wa...@gmail.com>.
One more thing to mention here:

Currently, the Pulsar Docker image bundles the C++ and Python clients. From
my perspective, the image is mainly used as a server, so perhaps we can stop
bundling these clients.

Best,
tison.



Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
Hi Tison,

Sorry, I just missed that. Thanks for the reminder.

Thanks,
Yunze






Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by tison <wa...@gmail.com>.
Hi Yunze,

> Just wondering if there is a way to retain the git history in the
> pulsar-client-cpp directory?

Matteo's proposal already says:

> git filter-repo --subdirectory-filter  pulsar-client-cpp

So you will retain the git history.

Best,
tison.
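For illustration, a minimal sketch that exercises the command on a throwaway repository and prints the surviving commit subjects. It assumes `git` and `git-filter-repo` are installed; the function name, file names, and commit messages are made up. Commits that touched `pulsar-client-cpp` should be kept (with that directory becoming the repository root), while a commit that only touched other paths becomes empty and is pruned by git-filter-repo's default behavior.

```shell
#!/usr/bin/env bash
# Sketch: check that `git filter-repo --subdirectory-filter` keeps the history
# of the extracted directory. Everything happens in a throwaway repository
# under mktemp; requires git and git-filter-repo on PATH.

demo_subdirectory_filter() (
  set -e
  work="$(mktemp -d)"
  git init -q "$work/pulsar"
  cd "$work/pulsar"
  git config user.email "demo@example.com"
  git config user.name "Demo"

  # Two commits touch the C++ client directory, one touches only Java code.
  mkdir -p pulsar-client-cpp/lib pulsar-broker
  echo "v1" > pulsar-client-cpp/lib/ClientImpl.cc
  git add pulsar-client-cpp
  git commit -q -m "cpp: initial client"
  echo "java" > pulsar-broker/Broker.java
  git add pulsar-broker
  git commit -q -m "java: broker change"
  echo "v2" > pulsar-client-cpp/lib/ClientImpl.cc
  git commit -q -a -m "cpp: fix client"

  # Make pulsar-client-cpp/ the new root (same command as the proposal).
  # --force is required because this local repo is not a fresh clone.
  git filter-repo --force --subdirectory-filter pulsar-client-cpp >/dev/null

  # Print commit subjects, newest first: only the C++ commits should remain.
  git log --format=%s
)

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  demo_subdirectory_filter
fi
```

The `--force` flag is only needed for this local demo; per the git-filter-repo documentation it is not required when running on a fresh clone, which is the recommended way to do the real extraction.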


On Tue, Sep 20, 2022 at 16:27, Yunze Xu <yz...@streamnative.io.invalid> wrote:

> LGTM. I also listed the related files outside the pulsar-client-cpp
> directory recently:
>
> - pulsar-common/src/main/proto/PulsarApi.proto: the Pulsar binary
>   proto file
> - src/gen-pulsar-version-macro.py: generate the internal version info
> - pulsar-client/src/test/proto/*.proto: test the protobuf native
>   schema feature
>
> Handling those should not be a complicated job. Just wondering if there is
> a way to retain the git history in the pulsar-client-cpp directory?
>
> Thanks,
> Yunze
>

Re: [DISCUSS] PIP-209: Separate C++/Python clients to own repositories

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
LGTM. I recently listed the related files outside the pulsar-client-cpp
directory:

- pulsar-common/src/main/proto/PulsarApi.proto: the Pulsar binary
  proto file
- src/gen-pulsar-version-macro.py: generate the internal version info
- pulsar-client/src/test/proto/*.proto: test the protobuf native
  schema feature

Handling these files should not be complicated. Just wondering if there is
a way to retain the git history in the pulsar-client-cpp directory?
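One possible way to keep both the pulsar-client-cpp history and the history of the files listed above, sketched under assumptions: the git-filter-repo manual describes `--subdirectory-filter <dir>` as equivalent to `--path <dir>/ --path-rename <dir>/:`, so extra files can be kept by adding more `--path`/`--path-glob` options to that expanded form. The helper names `filter_repo_args` and `split_cpp_client` are hypothetical. The script assumes bash 4+ and git-filter-repo on PATH, and the extra files simply keep their original paths in the new repository.

```shell
#!/usr/bin/env bash
# Sketch only: split the C++ client out of a fresh clone of apache/pulsar,
# also keeping the history of the extra files it depends on.
# Assumes bash 4+ and git-filter-repo on PATH; helper names are hypothetical.
set -euo pipefail

CPP_DIR="pulsar-client-cpp"

# Files outside $CPP_DIR that the C++ client build and tests use.
EXTRA_PATHS=(
  "pulsar-common/src/main/proto/PulsarApi.proto"
  "src/gen-pulsar-version-macro.py"
)
# Quoted so bash does not expand it; git-filter-repo does the matching.
EXTRA_GLOBS=("pulsar-client/src/test/proto/*.proto")

# Print the git-filter-repo arguments, one per line.
# "--path $CPP_DIR/" plus "--path-rename $CPP_DIR/:" alone is what
# --subdirectory-filter does; the extra paths keep their original location.
filter_repo_args() {
  local p
  printf '%s\n' --path "$CPP_DIR/"
  for p in "${EXTRA_PATHS[@]}"; do
    printf '%s\n' --path "$p"
  done
  for p in "${EXTRA_GLOBS[@]}"; do
    printf '%s\n' --path-glob "$p"
  done
  printf '%s\n' --path-rename "$CPP_DIR/:"
}

# Rewrite the history of the fresh clone in directory $1 in place.
split_cpp_client() {
  local clone_dir="$1"
  local -a args
  mapfile -t args < <(filter_repo_args)
  (cd "$clone_dir" && git filter-repo "${args[@]}")
}
```

For example, run `git clone https://github.com/apache/pulsar.git cpp-split` and then `split_cpp_client cpp-split`. git-filter-repo expects to run in a fresh clone, so the working checkout of the main repository is not touched.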

Thanks,
Yunze




> On Sep 20, 2022, at 07:25, Matteo Merli <mm...@apache.org> wrote: