Posted to dev@dolphinscheduler.apache.org by Jiajie Zhong <zh...@gmail.com> on 2022/11/02 09:30:01 UTC

[PROPOSAL] Maintenance Python API code in independent repository

I propose moving the Python API code into a separate repository, where
it would be maintained and released through its own process, with its
own version number.

Currently the Python API code is a module in the apache/dolphinscheduler
codebase. Each time contributors change Python API code, they have to
run all the required CI checks for both dolphinscheduler and the Python
API. If a change touches only Python code, it should be mergeable once
the Python API CI passes, without depending on the other CI checks.

Besides, we release the Python API under the same version number as
dolphinscheduler. That makes it easy for users to find the matching
Python API version. But when dolphinscheduler ships a bugfix release
and the Python API code has not changed, the Python API still has to
publish a new version to match. This happened when we released Python
API 2.0.6 and 2.0.7: both were bugfix versions in which the Python API
code did not change, so the PyPI package is the same.

Separating the Python API also makes the code layout more sensible: the
code in dolphinscheduler and in the new Python API repository will be
more clearly delineated, and the Python API will have its own issue
tracker and changelog to keep users informed.

-- 
Best Wish
— Jay Chung

Re: [PROPOSAL] Maintenance Python API code in independent repository

Posted by Jiajie Zhong <zh...@hotmail.com>.
Thanks, Eric, for the support. I will try to do the separation this week.

As for the Python API version: after we separate the Python API from the main repo,
it will have a different version number and a different release cadence from apache/dolphinscheduler.

We will have a document that tells users which Python API versions match which dolphinscheduler
versions, and a version-match mechanism that runs when users submit requests from the Python API
to dolphinscheduler. Submitting a request with the wrong API version will raise a warning/error
in the console.
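
The version-match mechanism described above could look roughly like the following. This is only a minimal sketch: `COMPATIBILITY`, `check_version`, `VersionMismatchError`, and the table entries are hypothetical names and placeholder values made up for illustration, not the project's actual API or compatibility matrix.

```python
import warnings

# Placeholder table (an assumption, not the real matrix): maps a Python API
# major version to the dolphinscheduler version prefixes it is meant to support.
COMPATIBILITY = {
    "4": ("3.1.",),
}


class VersionMismatchError(RuntimeError):
    """Raised in strict mode when the two versions are not known to match."""


def check_version(api_version: str, server_version: str, strict: bool = False) -> bool:
    """Return True if api_version is listed as compatible with server_version."""
    major = api_version.split(".")[0]
    supported = COMPATIBILITY.get(major, ())
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches.
    if server_version.startswith(supported):
        return True
    message = (
        f"Python API {api_version} is not listed as compatible with "
        f"dolphinscheduler {server_version}"
    )
    if strict:
        raise VersionMismatchError(message)
    # Non-strict mode only warns, so the request can still be attempted.
    warnings.warn(message, stacklevel=2)
    return False
```

With `strict=False` a mismatch only produces a warning in the console; passing `strict=True` turns it into an error, which covers both halves of the "warning/error" wording.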


> On Nov 3, 2022, at 10:46, Chufeng Gao <ch...@gmail.com> wrote:
> 
> Hi Jay,
> 
> I'm definitely +1 to this. Putting py code in another repo will reduce the
> CI running time of both sides. Just curious, as we discussed before in a
> community conference, another purpose for separating py code is that we
> expect a faster iteration of pyds. If we `release Python API as the same
> version of dolphinscheduler`, how could we achieve that?
> 
> Thanks.
> 
> *Best Regards,*
> 
> *Chufeng (Eric) Gao*
> 
> 
> 


Re: [PROPOSAL] Maintenance Python API code in independent repository

Posted by Jay Chung <zh...@hotmail.com>.
Hi guys,

I am here with an update on the progress of this proposal.

Two days later, we have created a new, independent repository for the Python API.
You can find it in [1].

Because the Python API builds the docs for every version from tags, I also migrated all
released versions of the Python API into the new repo, which you can see in [2].

After that, we migrated the code to the new repo. The last thing we have to
do is remove the existing code and other material related to the Python API, which will be
done in [3], and finally change our website build behavior in [4]. The whole proposal will be
closed once PRs [3] and [4] are merged.


[1]: https://github.com/apache/dolphinscheduler-sdk-python
[2]: https://github.com/apache/dolphinscheduler-sdk-python/pulls?q=is%3Apr+is%3Aclosed+migrate+in%3Atitle
[3]: https://github.com/apache/dolphinscheduler/pull/12779
[4]: https://github.com/apache/dolphinscheduler-website/pull/839

Cheers,
— Jay Chung

> On Nov 8, 2022, at 21:39, david zollo <da...@gmail.com> wrote:
> 
> +1
> 
> 
> 
> Best Regards
> 
> ---------------
> Apache DolphinScheduler PMC Chair & Apache SeaTunnel PPMC
> David
> Linkedin: https://www.linkedin.com/in/davidzollo
> Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
> ---------------
> 
> 


Re: [PROPOSAL] Maintenance Python API code in independent repository

Posted by david zollo <da...@gmail.com>.
+1



Best Regards

---------------
Apache DolphinScheduler PMC Chair & Apache SeaTunnel PPMC
David
Linkedin: https://www.linkedin.com/in/davidzollo
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------


On Mon, Nov 7, 2022 at 4:34 PM Jiajie Zhong <zh...@hotmail.com>
wrote:

> BTW, when we separate the Python API into another repository, we have to
> handle one more issue: should we restart the Python API package's version
> numbering, or continue from the current version?
>
> We have two choices here.
>
> 1. Change the PyPI package name (e.g. from `apache-dolphinscheduler` to
>    `pydolphinscheduler`) and start with version 0.0.1. Pros: we get cleaner
>    version numbers in the future, and users know it is a different package
>    that they should take care with when they use the Python API. Cons: users
>    may be confused if they use both packages, `apache-dolphinscheduler` and
>    `pydolphinscheduler` (although we will add some docs to describe it).
> 2. Keep the package name, but use 4.0.0 as the first version after the
>    separation. Pros: we keep the package name, and users can upgrade
>    directly with pip or another package manager. Cons: users may be confused
>    when dolphinscheduler itself releases version 4.x.x (we will add some
>    documentation to describe this too).
>
> I personally prefer the second option. Does anyone have other ideas
> about it?
>
>

Re: [PROPOSAL] Maintenance Python API code in independent repository

Posted by Jiajie Zhong <zh...@hotmail.com>.
BTW, when we separate the Python API into another repository, we have to handle one
more issue: should we restart the Python API package's version numbering, or continue
from the current version?

We have two choices here.

1. Change the PyPI package name (e.g. from `apache-dolphinscheduler` to `pydolphinscheduler`)
   and start with version 0.0.1. Pros: we get cleaner version numbers in the future, and users
   know it is a different package that they should take care with when they use the Python API.
   Cons: users may be confused if they use both packages, `apache-dolphinscheduler` and
   `pydolphinscheduler` (although we will add some docs to describe it).
2. Keep the package name, but use 4.0.0 as the first version after the separation. Pros: we
   keep the package name, and users can upgrade directly with pip or another package manager.
   Cons: users may be confused when dolphinscheduler itself releases version 4.x.x (we will
   add some documentation to describe this too).

I personally prefer the second option. Does anyone have other ideas about it?
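
To illustrate the trade-off between the two options, here is a small sketch of how an installer that picks the highest available version would treat each candidate first release. The helper `parse_version` is a simplified stand-in written for this example (real tools apply the full PEP 440 ordering rules), and `existing_releases` lists only the two versions mentioned in this thread, not the full release history.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


# Only the releases named in this thread; the real package has more.
existing_releases = ["2.0.6", "2.0.7"]
latest_existing = max(existing_releases, key=parse_version)


def is_upgrade(candidate: str) -> bool:
    """Would `candidate` sort above everything already published?"""
    return parse_version(candidate) > parse_version(latest_existing)


print(is_upgrade("0.0.1"))  # False: sorts below the existing releases
print(is_upgrade("4.0.0"))  # True: picked up by a normal upgrade
```

Under the existing package name, 0.0.1 would sort below the releases already published, which is why option 1 pairs the version reset with a new package name, while 4.0.0 sorts above them and would be installed by a plain upgrade.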


> On Nov 7, 2022, at 16:15, Jiajie Zhong <zh...@hotmail.com> wrote:
> 
> Thanks, Eric, for the support. I will try to do the separation this week.
> 
> As for the Python API version: after we separate the Python API from the main repo,
> it will have a different version number and a different release cadence from apache/dolphinscheduler.
> 
> We will have a document that tells users which Python API versions match which dolphinscheduler
> versions, and a version-match mechanism that runs when users submit requests from the Python API
> to dolphinscheduler. Submitting a request with the wrong API version will raise a warning/error
> in the console.
> 
> 


Re: [PROPOSAL] Maintenance Python API code in independent repository

Posted by Chufeng Gao <ch...@gmail.com>.
Hi Jay,

I'm definitely +1 to this. Putting py code in another repo will reduce the
CI running time of both sides. Just curious, as we discussed before in a
community conference, another purpose for separating py code is that we
expect a faster iteration of pyds. If we `release Python API as the same
version of dolphinscheduler`, how could we achieve that?

Thanks.

*Best Regards,*

*Chufeng (Eric) Gao*


