Posted to dev@seatunnel.apache.org by Benedict Jin <as...@apache.org> on 2022/02/09 04:10:00 UTC

SeaTunnel automatically generates LICENSE design

Hi,

Here is the design for automatically generating SeaTunnel's LICENSE file:
Background

As we all know, SeaTunnel is a high-performance data integration
platform that supports efficient data transformation and transfer between
heterogeneous data sources, so it inevitably introduces many third-party
dependencies. As the number of third-party components grows, maintaining
the LICENSE file by hand becomes very labor-intensive. At the same time,
newcomers have to learn the LICENSE mechanism, which raises the barrier to
entry. This document therefore proposes a way to generate the LICENSE file
automatically.
Requirements

Preliminary research produced the following basic requirements:

   1. It should be implemented in a scripting language, so that it is easy
   to understand and maintain;
   2. It should integrate with the existing Maven build process;
   3. It should support being triggered automatically by GitHub Actions.

Plan

Version v1.0

The simplest approach is to use Maven's license-maven-plugin to generate a
temporary THIRD-PARTY.txt file, and then have a Python script parse it and
create the LICENSE file automatically.
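As a rough illustration of the v1.0 parsing step, the sketch below reads THIRD-PARTY.txt and groups artifact coordinates by declared license. It is not the script from PR #1210. It assumes the plugin's default line format, `(License A) (License B) Name (groupId:artifactId:version - url)`, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical parser, assuming the default license-maven-plugin line format:
#   (License A) (License B) Display Name (groupId:artifactId:version - url)
DEP_LINE = re.compile(
    r"^\s*(?P<licenses>(?:\([^()]+\)\s*)+)"     # one or more "(License)" groups
    r"(?P<name>.+?)\s+"                          # human-readable name
    r"\((?P<gav>[^()\s:]+:[^()\s:]+:[^()\s]+)"   # groupId:artifactId:version
    r"(?: - [^()]*)?\)\s*$"                      # optional " - url", closing paren
)
LICENSE = re.compile(r"\(([^()]+)\)")


def parse_third_party(text):
    """Map each declared license name to the sorted coordinates that use it."""
    by_license = defaultdict(set)
    for line in text.splitlines():
        match = DEP_LINE.match(line)
        if not match:
            continue  # e.g. the "Lists of N third-party dependencies." header
        for license_name in LICENSE.findall(match.group("licenses")):
            by_license[license_name.strip()].add(match.group("gav"))
    return {name: sorted(gavs) for name, gavs in sorted(by_license.items())}


def render_license_section(by_license):
    """Render a plain-text block listing dependencies grouped by license."""
    lines = []
    for license_name, gavs in by_license.items():
        lines.append(f"{license_name}:")
        lines.extend(f"    {gav}" for gav in gavs)
        lines.append("")
    return "\n".join(lines)
```

Keeping an artifact under every license it declares leaves dual-licensed dependencies visible for review instead of silently picking one.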


Version v2.0

Building on v1.0, the process is wired into Maven through the
exec-maven-plugin, so it can be triggered with a single Maven command.
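For the v2.0 step, one way to make the script callable from exec-maven-plugin is to give it a small command-line interface. The sketch below is hypothetical: the `--input`/`--output` flags, the header text, and writing a sorted dependency list are assumptions, not what PR #1210 does. Sorting the lines keeps the generated file stable, so a diff only shows real dependency changes.

```python
import argparse
import sys
from pathlib import Path


def generate(third_party_text):
    """Keep dependency lines (those starting with '(') and sort them for stable diffs."""
    deps = sorted(
        line.strip()
        for line in third_party_text.splitlines()
        if line.strip().startswith("(")
    )
    # Hypothetical header; the real LICENSE wording is up to the project.
    header = "This project bundles the following third-party dependencies:\n\n"
    return header + "\n".join(deps) + "\n"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Generate LICENSE content from THIRD-PARTY.txt")
    parser.add_argument("--input", required=True, type=Path,
                        help="THIRD-PARTY.txt written by license-maven-plugin")
    parser.add_argument("--output", required=True, type=Path, help="file to (re)write")
    args = parser.parse_args(argv)
    if not args.input.is_file():
        print(f"input not found: {args.input}", file=sys.stderr)
        return 1
    text = args.input.read_text(encoding="utf-8")
    args.output.write_text(generate(text), encoding="utf-8")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

exec-maven-plugin would then run something like `python3 <script> --input <THIRD-PARTY.txt> --output <file>`; the script path and file locations depend on the project layout.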


Version v3.0

Going a step further, we can integrate with GitHub Actions. When a new PR
is created and its dependencies have changed, a GitHub Action automatically
updates the LICENSE file and pushes a commit to the PR. Contributors adding
new plugins then do not need to understand the LICENSE mechanism at all,
since everything is updated automatically, which greatly lowers the barrier
for newcomers.
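The first thing a v3.0 workflow would need is a way to tell whether a PR changed dependencies at all. A minimal sketch, assuming the workflow can produce THIRD-PARTY.txt for both the base branch and the PR head (how it does that is not covered here); the helper names are hypothetical.

```python
import re

# Hypothetical helper. Matches the trailing "(groupId:artifactId:version - url)"
# group of a license-maven-plugin THIRD-PARTY.txt dependency line.
GAV_AT_END = re.compile(r"\(([^()\s:]+:[^()\s:]+):([^()\s]+)(?: - [^()]*)?\)\s*$")


def coordinates(third_party_text):
    """Map groupId:artifactId to version for every dependency line."""
    deps = {}
    for line in third_party_text.splitlines():
        match = GAV_AT_END.search(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def dependency_changes(base_text, head_text):
    """Report added, removed and version-changed artifacts between two listings."""
    base, head = coordinates(base_text), coordinates(head_text)
    return {
        "added": sorted(head.keys() - base.keys()),
        "removed": sorted(base.keys() - head.keys()),
        "version_changed": sorted(
            key for key in base.keys() & head.keys() if base[key] != head[key]
        ),
    }


def license_update_needed(base_text, head_text):
    """True when any dependency was added, removed or re-versioned."""
    return any(dependency_changes(base_text, head_text).values())
```

Version bumps are reported separately because they usually do not change the license, though a maintainer may still want to re-check them.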
Conclusion

I have opened PR #1210
<https://github.com/apache/incubator-seatunnel/pull/1210>, which implements
versions v1.0 and v2.0. We still need to discuss whether v3.0 should be
implemented at this stage; if it is needed, it can be done in a follow-up
PR.

Any comments are welcome, thank you very much.
Regards,
Benedict Jin

Re: SeaTunnel automatically generates LICENSE design

Posted by Jiajie Zhong <zh...@gmail.com>.
Sounds good, and thanks, Benedict.

I will take a quick look at this. I do not know much about licenses,
so I can only give some syntax advice in the PR.

Cheers,
— Jiajie Zhong


Re: SeaTunnel automatically generates LICENSE design

Posted by Benedict Jin <as...@apache.org>.
Hi Jiajie Zhong,

Yes, I agree with Calvin's idea too. For now, v2.0 is enough for us. Using a command to trigger the process sounds good to me. I have implemented this before, so it is feasible. I will create another PR to complete the process.

By the way, I have rewritten the Python script: it now uses only Python's native APIs, does not rely on any third-party libraries, and can run in a plain Python environment. Besides, the latest version of the script already handles unknown licenses, and every dependency previously reported with an unknown license has been matched to its actual license and correctly marked. FYI, https://github.com/apache/incubator-seatunnel/pull/1210/files
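One way to record manual findings like these is a small override table keyed by `groupId:artifactId`. This is a hypothetical sketch, not the logic in PR #1210: the table contents are placeholders, and the `Unknown license` marker is assumed to be what the plugin writes for artifacts with no declared license. Like the rewritten script, it uses only the standard library.

```python
# Hypothetical override table, filled in by a maintainer after checking each
# project's source; the entry below is a placeholder, not a real finding.
MANUAL_OVERRIDES = {
    "org.example:gamma": "MIT License",
}

UNKNOWN_MARKER = "unknown license"  # assumed plugin text for a missing license


def resolve_licenses(gav, declared, overrides=MANUAL_OVERRIDES):
    """Return the usable licenses for one artifact, consulting overrides if needed."""
    # Key on groupId:artifactId so an entry survives version bumps; a project can
    # still change its license between releases, so upgrades deserve a re-check.
    key = ":".join(gav.split(":")[:2])
    known = [name for name in declared if name.strip().lower() != UNKNOWN_MARKER]
    if known:
        return known
    if key in overrides:
        return [overrides[key]]
    # Fail loudly so the artifact gets reviewed instead of being silently skipped.
    raise LookupError(f"{gav}: license unknown and no manual override recorded")
```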

Regards,
Benedict Jin

On 2022/02/10 03:40:56 Jiajie Zhong wrote:
> [...]

Re: SeaTunnel automatically generates LICENSE design

Posted by Jiajie Zhong <zh...@gmail.com>.
Hey Benedict,

I agree with Calvin's idea. We had better run the check in CI and tell users how to add the license.
But if we want to achieve this, I think we should use a bot, with commands that decide
whether files are modified automatically or not.

Here is a simple example: when the bot reports that a license or header is missing, and we think
it only needs a header added, we could trigger the bot with a simple GitHub comment command such as
`/auto-license`, and the bot would automatically add the license header to the files missing it.
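To make the `/auto-license` idea concrete, a bot would first need to recognize the command in a PR comment. The sketch below is hypothetical: it accepts the command only on a line of its own, so a sentence that merely mentions it does not trigger anything. A real bot would also need to check that the commenter is allowed to run it; that part is not shown.

```python
# Hypothetical set of commands the bot reacts to.
ALLOWED_COMMANDS = {"/auto-license"}


def find_command(comment_body):
    """Return the first allowed command written on a line of its own, else None."""
    for line in comment_body.splitlines():
        token = line.strip()
        if token in ALLOWED_COMMANDS:
            return token
    return None
```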

**BUT I think maybe this is too COMPLEX for now.**
If we think we can implement it in a short time, though, it would be great.

Otherwise, Benedict's idea is OK with me: users who wish to use our dev script to fill in the license
or header automatically need to install a Python development environment locally. Alternatively, users
can fix license issues manually.

Cheers,
— Jiajie Zhong

Re: SeaTunnel automatically generates LICENSE design

Posted by Benedict Jin <as...@apache.org>.
Hi CalvinKirs,

Thank you for your comments. Yes, you are right that GitHub Actions is more suitable for checking. As for the Python environment, I believe Python can be installed with one click: we only need to download the package from https://www.python.org/downloads/ and install it. Python is also already installed on macOS, CentOS, Ubuntu, etc. Of course, if we want to take care of users on non-Unix-like OSes, another option is to convert the Python script into an exe file. We can also provide a requirements.txt file to help install third-party Python dependencies quickly. Or, if we are worried about them, the script can avoid third-party Python dependencies entirely and use only the native APIs. What do you think?

Regards,
Benedict Jin

On 2022/02/09 12:38:34 CalvinKirs wrote:
> [...]

Re: SeaTunnel automatically generates LICENSE design

Posted by CalvinKirs <ac...@163.com>.
These should all be done by CI, but CI should only be responsible for checking and outputting the results, as I said above.
Since our contributors mainly use Java, needing a Python environment to run the check is still a challenge for them, or maybe just for me :)
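A CI job that only checks and reports, as suggested here, could compare the committed file with a freshly generated one and print the difference. A minimal sketch, assuming the regenerated text is produced elsewhere (for example by the v2.0 tooling); the function name and message are hypothetical.

```python
import difflib
import sys


def check_license(committed, regenerated, path="LICENSE"):
    """Return 0 when the committed text matches the regenerated text.

    Otherwise print a unified diff to stdout, a hint to stderr, and return 1,
    so the CI step fails and shows contributors exactly what to change.
    """
    if committed == regenerated:
        return 0
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (regenerated)",
    )
    sys.stdout.writelines(diff)
    print(f"{path} is out of date: regenerate it locally and review the changes above.",
          file=sys.stderr)
    return 1
```

A CI entry point would call `sys.exit(check_license(...))`, so the job fails without modifying anything in the PR.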


Best wishes!
Calvin Kirs


On 02/9/2022 13:58,Benedict Jin<as...@apache.org> wrote:
> [...]

Re: SeaTunnel automatically generates LICENSE design

Posted by Benedict Jin <as...@apache.org>.
Hi CalvinKirs,

Thanks for your comment. Agreed: then v2.0 is enough, and we don't need to implement v3.0 or modify the LICENSE automatically. It will just be a local tool that helps us generate the LICENSE file.

After this PR is merged, the overall process becomes: the GitHub Action with skywalking-eyes still performs the checks automatically. If the check fails, the LICENSE can be generated locally in one step through the exec-maven-plugin, which lowers the barrier for users. Unknown licenses and source-level dependencies are still handled manually.

Regards,
Benedict Jin

On 2022/02/09 05:27:27 CalvinKirs wrote:
> [...]

Re: SeaTunnel automatically generates LICENSE design

Posted by CalvinKirs <ac...@163.com>.
Hi Benedict,

This sounds like a very good feature, but as Zhenxu and I said on the PR[1], some detected licenses will be reported as Unknown even though they may in fact be acceptable; checking them needs human intervention. Another issue is source-level dependencies.

Then there are dual licenses, and different declarations of the same license. For example, for AL2 some check results say Apache 2.0 while others say The Apache Software License, Version 2.0; and some check results say Go License when it is in fact a BSD License.
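A sketch of how the naming problem could be handled: map the different declared spellings of the same license to one canonical name, and leave everything else for a maintainer to review. The alias table below is a hypothetical starting point covering only Apache 2.0 spellings; a case such as the "Go License" one should only be added after a human has checked it. The canonical value used here is the SPDX identifier `Apache-2.0`.

```python
# Hypothetical alias table: normalized declared name -> canonical SPDX identifier.
# Only add an entry after a maintainer has confirmed the mapping.
LICENSE_ALIASES = {
    "apache 2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "the apache software license, version 2.0": "Apache-2.0",
}


def normalize(name):
    """Lower-case and collapse whitespace so trivial spelling differences match."""
    return " ".join(name.lower().split())


def classify(declared_licenses):
    """Split one artifact's declared licenses into canonical names and names to review."""
    known, review = set(), []
    for name in declared_licenses:
        canonical = LICENSE_ALIASES.get(normalize(name))
        if canonical:
            known.add(canonical)
        else:
            review.append(name)  # e.g. "Unknown license" or an unmapped spelling
    return sorted(known), review
```

A dual-licensed artifact simply comes back with several canonical names; deciding which one applies stays a maintainer's call.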

I prefer a CI check that outputs the relevant results to remind contributors, who then add the corresponding licenses themselves, for example which licenses need to be added and which need to be checked by a maintainer, rather than having them added automatically.
License compliance is particularly important for an Apache project.

In addition, we are currently using skywalking-eyes[2] for some checks. If these functions could be contributed to SkyWalking-eyes, that might be better, since more projects could benefit from them. Of course, it's up to you.

[1]https://github.com/apache/incubator-seatunnel/pull/1210
[2]https://github.com/apache/skywalking-eyes


Best wishes!
Calvin Kirs


On 02/9/2022 12:15,Benedict Jin<as...@apache.org> wrote:
> [...]

Re: SeaTunnel automatically generates LICENSE design

Posted by Benedict Jin <as...@apache.org>.
It seems the images cannot be displayed properly; please see the
description in this issue instead. FYI,
https://github.com/apache/incubator-seatunnel/issues/1209

On Wed, 9 Feb 2022 at 12:10, Benedict Jin <as...@apache.org> wrote:

> [...]