Posted to dev@drill.apache.org by James Turton <dz...@apache.org> on 2024/01/26 07:21:57 UTC

License headers inside Javadoc comments

Good morning!

I'd like to ask about a feature to prevent RAT from allowing license
headers to appear inside Javadoc comments (/**) while still requiring
them in Java comments (/*) in .java files. Currently the Drill project
makes use of com.mycila.license-maven-plugin to reject licenses in
Javadoc comments because the developers at the time didn't want license
headers cluttering the Javadoc website that is generated from the
source. Are you aware of a general view on Apache license headers
appearing in Javadoc pages? If preventing them from doing so is a good
idea, could this become a (configurable) feature in RAT?

Thanks
James Turton

Fwd: License headers inside Javadoc comments

Posted by James Turton <dz...@apache.org>.
Just a forward to complete the records.

-------- Forwarded Message --------
Subject: 	Re: License headers inside Javadoc comments
Date: 	Fri, 26 Jan 2024 15:38:29 +0200
From: 	James Turton <dz...@apache.org>
To: 	P. Ottlinger <po...@apache.org>, dev@creadur.apache.org
CC: 	dev <de...@drill.apache.org>



Thanks Phil.

Here's some background [1] which comes from before I was involved with 
Drill. What they wanted was for the license header checker to accept, in 
.java files,

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
   etc.

but reject

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
   etc.

Notice the two asterisks that open the comment block in the second
form, which make it a Javadoc comment that will appear in the generated
Javadoc. There are no longer any examples of the latter in Drill, and
this has been enforced since the license-maven-plugin was added.
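
To make the accept/reject distinction concrete, here is a minimal Java
sketch. It is not how license-maven-plugin or RAT works internally; the
class name HeaderStyleCheck and the method startsWithJavadoc are
hypothetical, and the sketch assumes that the license header is the first
thing in the file.

```java
import java.util.regex.Pattern;

public class HeaderStyleCheck {
    // Matches content whose first non-whitespace token is "/**" that is not
    // immediately followed by "/" (so the empty comment "/**/" is not flagged).
    private static final Pattern JAVADOC_OPEN = Pattern.compile("\\A\\s*/\\*\\*(?!/)");

    // Hypothetical helper: true if the source starts with a Javadoc comment.
    public static boolean startsWithJavadoc(String source) {
        return JAVADOC_OPEN.matcher(source).find();
    }

    public static void main(String[] args) {
        String plain = "/*\n * Licensed to the Apache Software Foundation (ASF) under one\n */\n";
        String javadoc = "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n */\n";
        System.out.println(startsWithJavadoc(plain));   // prints false (first form, accepted)
        System.out.println(startsWithJavadoc(javadoc)); // prints true  (second form, rejected)
    }
}
```

A real check would also need to confirm that the comment actually contains
the license text; this sketch only looks at how the first comment opens.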

I got here because I want to remove that plugin, which essentially
duplicates RAT, in favour of another (with exactly the same name :()
that can generate license and notice information for our third party
code. This last task is what I'm really doing; the Javadoc license
header rejection matter is yak shaving that came up along the way.

So my yak shaving question is: if I make RAT Drill's only license header
checker, could I then make it reject license headers of the second form?
Even if I can't, I'm inclined to make it the only header checker since I
think that it's in any case mandatory and authoritative. But in an
effort to retain the work of the previous Drill developers I'm trying to
preserve what they implemented.

1. https://issues.apache.org/jira/browse/DRILL-6320

On 2024/01/26 14:06, P. Ottlinger wrote:
> Hi James,
>
> thanks for reaching out!
>
> Am 26.01.24 um 08:21 schrieb James Turton:
>> I'd like to ask about a feature to prevent RAT from allowing license 
>> headers to appear inside Javadoc comments  (/**) while still 
>> requiring them in Java comments (/*) in .java files. Currently the 
>> Drill project makes use of com.mycila.license-maven-plugin to reject 
>> licenses in Javadoc comments because the developers at the time 
>> didn't want license headers cluttering the Javadoc website that is 
>> generated from the source. Are you aware of  a general view on Apache 
>> license headers appearing in Javadoc pages? If preventing them from 
>> doing so is a good idea, could this become a (configurable) feature 
>> in RAT?
>
> could you be so kind as to provide an example of what you want to achieve
> and what your use case looks like?
>
> I'm afraid I do not really understand what you mean by
> javadoc-specific licenses.
>
> At the moment we don't have file-specific parsing to exclude
> comments - is that what you want to achieve?
>
> On the other hand if a license header is needed per file, it has to be 
> somewhere in the sources ;)
>
> Thanks,
> Phil

Re: License headers inside Javadoc comments

Posted by James Turton <dz...@apache.org>.
Yes, well spotted.

On 2024/01/29 22:35, Paul Rogers wrote:
> James,
>
> If the extra check is costly, you might also observe that all (most?)
> existing files have the proper header format. It is only new or changed
> files that must be checked. So, you can use Git to determine the change set
> on each PR and do the extra format check only on those files.
>
> - Paul
>
> On Mon, Jan 29, 2024 at 7:37 AM James Turton <dz...@apache.org> wrote:
>
>> Thank you for these explanations Claude.
>>
>> Looking at your second paragraph about the proposal to enhance the code
>> that inserts headers, a comment start definition for Java files of
>> '/*\n' (newline after the '/*') should work to accept the Apache license
>> header in a Java comment but reject it if it's in a Javadoc comment.
>> That seems promising, and I'll take a look at RAT-330, but I'm also able
>> to move forward in Drill using alternatives in the interim.
>>
>> Regards
>> James
>>
>>
>> On 2024/01/29 09:33, Claude Warren wrote:
>>> James,
>>>
>>> In general, the processing for matching licenses strips out all
>>> non-essential text (e.g. '/' and '*'), so the current implementation cannot
>>> determine whether the license text is within a Javadoc block.  Some
>>> matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
>>> they are generally much slower.  In fact, the original SPDX and Copyright
>>> implementations caused a significant (two orders of magnitude or more)
>>> increase in processing time.  It would be possible to create a custom
>>> matcher to do what you want, but there is currently no mechanism available
>>> in the code base to call a matcher only on specific file types.
>>>
>>> There is a section of code that understands file types, but this is the
>>> code that inserts headers into files that don't have them.  It may be
>>> possible to build on that to create a custom matcher to ensure that license
>>> comments are not within Javadoc comments.  There is a ticket open to modify
>>> how this code works so that new file types, with comment start/stop
>>> definitions, restrictions on first lines and such, can be defined outside
>>> of the codebase, making it possible to insert headers in as yet
>>> unrecognized file formats.[1]  This might be extended to provide input to
>>> the process you are requesting.
>>>
>>> There is also a section of code that removes the non-essential text.  The
>>> 'prune' method could be modified to remove blocks of code between the
>>> opening Javadoc '/**' and the closing '*/', but this may lead to problems
>>> with non-Java files.  Speaking of non-Java files, have you thought about
>>> ensuring that the license does not appear in other Javadoc-like systems?
>>> [2]  Once this can of worms is opened we will need a way to manage all the
>>> requests that will follow for other file types.
>>>
>>> If you have any ideas for implementing the change I would be interested to
>>> hear them.
>>>
>>> Claude
>>>
>>> [1] https://issues.apache.org/jira/browse/RAT-330?
>>> [2] https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation
>>>
>>> On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:
>>>
>>>> Thanks Phil.
>>>>
>>>> Here's some background [1] which comes from before I was involved with
>>>> Drill. What they wanted was for the license header checker to accept, in
>>>> .java files,
>>>>
>>>> /*
>>>>     * Licensed to the Apache Software Foundation (ASF) under one
>>>>     * or more contributor license agreements.  See the NOTICE file
>>>>     * distributed with this work for additional information
>>>>       etc.
>>>>
>>>> but reject
>>>>
>>>> /**
>>>>     * Licensed to the Apache Software Foundation (ASF) under one
>>>>     * or more contributor license agreements.  See the NOTICE file
>>>>     * distributed with this work for additional information
>>>>       etc.
>>>>
>>>> Notice the two asterisks that open the Java comment block in the second
>>>> form thereby making it a Javadoc comment that will appear in generated
>>>> Javadoc. There are no longer any examples of the latter in Drill but
>>>> this has been enforced by the addition of the license-maven-plugin.
>>>>
>>>> I got here because I want to remove that plugin, which essentially
>>>> duplicates RAT, in favour of another (with exactly the same name :()
>>>> that can generate license and notice information for our third party
>>>> code. This last task is what I'm really doing, the Javadoc license
>>>> header rejection matter is yak shaving that came up on the road.
>>>>
>>>> So my yak shaving question is: if I make RAT Drill's only license header
>>>> checker then could I make it reject license headers of the second form?
>>>> Even if I can't I'm inclined to make it the only header checker since I
>>>> think that it's in any case mandatory and authoritative. But in an
>>>> effort to retain the work of the previous Drill developers I'm trying to
>>>> preserve what they implemented.
>>>>
>>>> 1. https://issues.apache.org/jira/browse/DRILL-6320
>>>>
>>>> On 2024/01/26 14:06, P. Ottlinger wrote:
>>>>> Hi James,
>>>>>
>>>>> thanks for reaching out!
>>>>>
>>>>> Am 26.01.24 um 08:21 schrieb James Turton:
>>>>>> I'd like to ask about a feature to prevent RAT from allowing license
>>>>>> headers to appear inside Javadoc comments  (/**) while still requiring
>>>>>> them in Java comments (/*) in .java files. Currently the Drill project
>>>>>> makes use of com.mycila.license-maven-plugin to reject licenses in
>>>>>> Javadoc comments because the developers at the time didn't want
>>>>>> license headers cluttering the Javadoc website that is generated from
>>>>>> the source. Are you aware of  a general view on Apache license headers
>>>>>> appearing in Javadoc pages? If preventing them from doing so is a good
>>>>>> idea, could this become a (configurable) feature in RAT?
>>>>> could you be so kind to provide an example of what you want to achieve
>>>>> and how your use case looks like?
>>>>>
>>>>> I'm afraid I do not really understand what you mean with
>>>>> javadoc-specific licenses?
>>>>>
>>>>> At the moment we don't have a file specific parsing to exclude comments
>>>>> - is that what you want to achieve?
>>>>>
>>>>> On the other hand if a license header is needed per file, it has to be
>>>>> somewhere in the sources ;)
>>>>>
>>>>> Thanks,
>>>>> Phil
>>


Re: License headers inside Javadoc comments

Posted by Paul Rogers <pa...@gmail.com>.
James,

If the extra check is costly, you might also observe that all (most?)
existing files have the proper header format. It is only new or changed
files that must be checked. So, you can use Git to determine the change set
on each PR and do the extra format check only on those files.
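
A minimal Java sketch of this idea follows. The class name
ChangedJavaFiles, its methods, and the default base ref "origin/master"
are assumptions made for illustration; a CI job would pass the real base
of the PR. It relies only on "git diff --name-only", which lists the paths
that differ between two revisions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ChangedJavaFiles {
    // Keeps only the .java paths from a list of changed file paths.
    public static List<String> javaFilesOnly(List<String> changedPaths) {
        return changedPaths.stream()
                .filter(p -> p.endsWith(".java"))
                .collect(Collectors.toList());
    }

    // Lists files added, copied, modified or renamed since baseRef.
    public static List<String> changedFiles(String baseRef)
            throws IOException, InterruptedException {
        Process git = new ProcessBuilder(
                "git", "diff", "--name-only", "--diff-filter=ACMR", baseRef + "...HEAD")
                .redirectErrorStream(true)
                .start();
        List<String> paths = new ArrayList<>();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(git.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = out.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    paths.add(line.trim());
                }
            }
        }
        if (git.waitFor() != 0) {
            throw new IOException("git diff failed: " + paths);
        }
        return paths;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default base branch; override with the first argument.
        String baseRef = args.length > 0 ? args[0] : "origin/master";
        javaFilesOnly(changedFiles(baseRef)).forEach(System.out::println);
    }
}
```

The printed paths could then be handed to whatever header format check is
in use, so that the slower check runs only on the files touched by the PR.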

- Paul

On Mon, Jan 29, 2024 at 7:37 AM James Turton <dz...@apache.org> wrote:

> Thank you for these explanations Claude.
>
> Looking at your second paragraph about the proposal to enhance the code
> that inserts headers, a comment start definition for Java files of
> '/*\n' (newline after the '/*') should work to accept the Apache license
> header in a Java comment but reject it if it's in a Javadoc comment.
> That seems promising, and I'll take a look at RAT-330, but I'm also able
> to move forward in Drill using alternatives in the interim.
>
> Regards
> James
>
>
> On 2024/01/29 09:33, Claude Warren wrote:
> > James,
> >
> > In general, the processing for matching licenses strips out all
> > non-essential text (e.g. '/' and '*'), so the current implementation cannot
> > determine whether the license text is within a Javadoc block.  Some
> > matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
> > they are generally much slower.  In fact, the original SPDX and Copyright
> > implementations caused a significant (two orders of magnitude or more)
> > increase in processing time.  It would be possible to create a custom
> > matcher to do what you want, but there is currently no mechanism available
> > in the code base to call a matcher only on specific file types.
> >
> > There is a section of code that understands file types, but this is the
> > code that inserts headers into files that don't have them.  It may be
> > possible to build on that to create a custom matcher to ensure that license
> > comments are not within Javadoc comments.  There is a ticket open to modify
> > how this code works so that new file types, with comment start/stop
> > definitions, restrictions on first lines and such, can be defined outside
> > of the codebase, making it possible to insert headers in as yet
> > unrecognized file formats.[1]  This might be extended to provide input to
> > the process you are requesting.
> >
> > There is also a section of code that removes the non-essential text.  The
> > 'prune' method could be modified to remove blocks of code between the
> > opening Javadoc '/**' and the closing '*/', but this may lead to problems
> > with non-Java files.  Speaking of non-Java files, have you thought about
> > ensuring that the license does not appear in other Javadoc-like systems?
> > [2]  Once this can of worms is opened we will need a way to manage all the
> > requests that will follow for other file types.
> >
> > If you have any ideas for implementing the change I would be interested to
> > hear them.
> >
> > Claude
> >
> > [1] https://issues.apache.org/jira/browse/RAT-330?
> > [2] https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation
> >
> > On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:
> >
> >> Thanks Phil.
> >>
> >> Here's some background [1] which comes from before I was involved with
> >> Drill. What they wanted was for the license header checker to accept, in
> >> .java files,
> >>
> >> /*
> >>    * Licensed to the Apache Software Foundation (ASF) under one
> >>    * or more contributor license agreements.  See the NOTICE file
> >>    * distributed with this work for additional information
> >>      etc.
> >>
> >> but reject
> >>
> >> /**
> >>    * Licensed to the Apache Software Foundation (ASF) under one
> >>    * or more contributor license agreements.  See the NOTICE file
> >>    * distributed with this work for additional information
> >>      etc.
> >>
> >> Notice the two asterisks that open the Java comment block in the second
> >> form thereby making it a Javadoc comment that will appear in generated
> >> Javadoc. There are no longer any examples of the latter in Drill but
> >> this has been enforced by the addition of the license-maven-plugin.
> >>
> >> I got here because I want to remove that plugin, which essentially
> >> duplicates RAT, in favour of another (with exactly the same name :()
> >> that can generate license and notice information for our third party
> >> code. This last task is what I'm really doing, the Javadoc license
> >> header rejection matter is yak shaving that came up on the road.
> >>
> >> So my yak shaving question is: if I make RAT Drill's only license header
> >> checker then could I make it reject license headers of the second form?
> >> Even if I can't I'm inclined to make it the only header checker since I
> >> think that it's in any case mandatory and authoritative. But in an
> >> effort to retain the work of the previous Drill developers I'm trying to
> >> preserve what they implemented.
> >>
> >> 1. https://issues.apache.org/jira/browse/DRILL-6320
> >>
> >> On 2024/01/26 14:06, P. Ottlinger wrote:
> >>> Hi James,
> >>>
> >>> thanks for reaching out!
> >>>
> >>> Am 26.01.24 um 08:21 schrieb James Turton:
> >>>> I'd like to ask about a feature to prevent RAT from allowing license
> >>>> headers to appear inside Javadoc comments  (/**) while still requiring
> >>>> them in Java comments (/*) in .java files. Currently the Drill project
> >>>> makes use of com.mycila.license-maven-plugin to reject licenses in
> >>>> Javadoc comments because the developers at the time didn't want
> >>>> license headers cluttering the Javadoc website that is generated from
> >>>> the source. Are you aware of  a general view on Apache license headers
> >>>> appearing in Javadoc pages? If preventing them from doing so is a good
> >>>> idea, could this become a (configurable) feature in RAT?
> >>> could you be so kind to provide an example of what you want to achieve
> >>> and how your use case looks like?
> >>>
> >>> I'm afraid I do not really understand what you mean with
> >>> javadoc-specific licenses?
> >>>
> >>> At the moment we don't have a file specific parsing to exclude comments
> >>> - is that what you want to achieve?
> >>>
> >>> On the other hand if a license header is needed per file, it has to be
> >>> somewhere in the sources ;)
> >>>
> >>> Thanks,
> >>> Phil
> >
>
>

Re: License headers inside Javadoc comments

Posted by James Turton <dz...@apache.org>.
Thank you for these explanations Claude.

Looking at your second paragraph about the proposal to enhance the code 
that inserts headers, a comment start definition for Java files of 
'/*\n' (newline after the '/*') should work to accept the Apache license 
header in a Java comment but reject it if it's in a Javadoc comment. 
That seems promising, and I'll take a look at RAT-330, but I'm also able 
to move forward in Drill using alternatives in the interim.
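
As a rough illustration of the comment start idea, the following Java
sketch treats '/*\n' as the only accepted opening for a header comment.
The class CommentStartCheck and its method are hypothetical; nothing here
reflects an existing RAT configuration option or the design in RAT-330.

```java
public class CommentStartCheck {
    // Assumed comment start definition for Java files: "/*" then a newline.
    static final String JAVA_COMMENT_START = "/*\n";

    // True if the source opens with the accepted comment start.
    public static boolean hasAcceptedStart(String source) {
        // Normalise Windows line endings so "/*\r\n" is treated like "/*\n".
        String normalised = source.replace("\r\n", "\n");
        return normalised.startsWith(JAVA_COMMENT_START);
    }

    public static void main(String[] args) {
        System.out.println(hasAcceptedStart("/*\n * Licensed to the ASF\n */"));  // prints true
        System.out.println(hasAcceptedStart("/**\n * Licensed to the ASF\n */")); // prints false
    }
}
```

One consequence of such a strict prefix is that a header written as
"/* Licensed ..." on a single opening line would be rejected as well.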

Regards
James


On 2024/01/29 09:33, Claude Warren wrote:
> James,
>
> In general, the processing for matching licenses strips out all
> non-essential text (e.g. '/' and '*'), so the current implementation cannot
> determine whether the license text is within a Javadoc block.  Some
> matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
> they are generally much slower.  In fact, the original SPDX and Copyright
> implementations caused a significant (two orders of magnitude or more)
> increase in processing time.  It would be possible to create a custom
> matcher to do what you want, but there is currently no mechanism available
> in the code base to call a matcher only on specific file types.
>
> There is a section of code that understands file types, but this is the
> code that inserts headers into files that don't have them.  It may be
> possible to build on that to create a custom matcher to ensure that license
> comments are not within Javadoc comments.  There is a ticket open to modify
> how this code works so that new file types, with comment start/stop
> definitions, restrictions on first lines and such, can be defined outside
> of the codebase, making it possible to insert headers in as yet
> unrecognized file formats.[1]  This might be extended to provide input to
> the process you are requesting.
>
> There is also a section of code that removes the non-essential text.  The
> 'prune' method could be modified to remove blocks of code between the
> opening Javadoc '/**' and the closing '*/', but this may lead to problems
> with non-Java files.  Speaking of non-Java files, have you thought about
> ensuring that the license does not appear in other Javadoc-like systems?
> [2]  Once this can of worms is opened we will need a way to manage all the
> requests that will follow for other file types.
>
> If you have any ideas for implementing the change I would be interested to
> hear them.
>
> Claude
>
> [1] https://issues.apache.org/jira/browse/RAT-330?
> [2] https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation
>
> On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:
>
>> Thanks Phil.
>>
>> Here's some background [1] which comes from before I was involved with
>> Drill. What they wanted was for the license header checker to accept, in
>> .java files,
>>
>> /*
>>    * Licensed to the Apache Software Foundation (ASF) under one
>>    * or more contributor license agreements.  See the NOTICE file
>>    * distributed with this work for additional information
>>      etc.
>>
>> but reject
>>
>> /**
>>    * Licensed to the Apache Software Foundation (ASF) under one
>>    * or more contributor license agreements.  See the NOTICE file
>>    * distributed with this work for additional information
>>      etc.
>>
>> Notice the two asterisks that open the Java comment block in the second
>> form thereby making it a Javadoc comment that will appear in generated
>> Javadoc. There are no longer any examples of the latter in Drill but
>> this has been enforced by the addition of the license-maven-plugin.
>>
>> I got here because I want to remove that plugin, which essentially
>> duplicates RAT, in favour of another (with exactly the same name :()
>> that can generate license and notice information for our third party
>> code. This last task is what I'm really doing, the Javadoc license
>> header rejection matter is yak shaving that came up on the road.
>>
>> So my yak shaving question is: if I make RAT Drill's only license header
>> checker then could I make it reject license headers of the second form?
>> Even if I can't I'm inclined to make it the only header checker since I
>> think that it's in any case mandatory and authoritative. But in an
>> effort to retain the work of the previous Drill developers I'm trying to
>> preserve what they implemented.
>>
>> 1. https://issues.apache.org/jira/browse/DRILL-6320
>>
>> On 2024/01/26 14:06, P. Ottlinger wrote:
>>> Hi James,
>>>
>>> thanks for reaching out!
>>>
>>> Am 26.01.24 um 08:21 schrieb James Turton:
>>>> I'd like to ask about a feature to prevent RAT from allowing license
>>>> headers to appear inside Javadoc comments  (/**) while still requiring
>>>> them in Java comments (/*) in .java files. Currently the Drill project
>>>> makes use of com.mycila.license-maven-plugin to reject licenses in
>>>> Javadoc comments because the developers at the time didn't want
>>>> license headers cluttering the Javadoc website that is generated from
>>>> the source. Are you aware of  a general view on Apache license headers
>>>> appearing in Javadoc pages? If preventing them from doing so is a good
>>>> idea, could this become a (configurable) feature in RAT?
>>> could you be so kind to provide an example of what you want to achieve
>>> and how your use case looks like?
>>>
>>> I'm afraid I do not really understand what you mean with
>>> javadoc-specific licenses?
>>>
>>> At the moment we don't have a file specific parsing to exclude comments
>>> - is that what you want to achieve?
>>>
>>> On the other hand if a license header is needed per file, it has to be
>>> somewhere in the sources ;)
>>>
>>> Thanks,
>>> Phil
>


Re: License headers inside Javadoc comments

Posted by Claude Warren <cl...@xenei.com>.
James,

In general, the processing for matching licenses strips out all
non-essential text (e.g. '/' and '*'), so the current implementation cannot
determine whether the license text is within a Javadoc block.  Some
matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
they are generally much slower.  In fact, the original SPDX and Copyright
implementations caused a significant (two orders of magnitude or more)
increase in processing time.  It would be possible to create a custom
matcher to do what you want, but there is currently no mechanism available
in the code base to call a matcher only on specific file types.
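
The effect of stripping can be illustrated with a small Java sketch. The
prune method below is a simplified stand-in written for this example (it
keeps only letters and digits, lower-cased); it is an assumption, not the
actual RAT implementation.

```java
public class PruneDemo {
    // Simplified stand-in for pruning: keep letters and digits, lower-cased.
    public static String prune(String text) {
        StringBuilder kept = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                kept.append(Character.toLowerCase(c));
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String plain = "/*\n * Licensed to the Apache Software Foundation (ASF) under one\n */";
        String javadoc = "/**\n * Licensed to the Apache Software Foundation (ASF) under one\n */";
        // Once '/' and '*' are gone, the two headers are indistinguishable.
        System.out.println(prune(plain).equals(prune(javadoc))); // prints true
    }
}
```

Any matcher that only sees the pruned text therefore has no way to tell
the two comment styles apart.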

There is a section of code that understands file types, but this is the
code that inserts headers into files that don't have them.  It may be
possible to build on that to create a custom matcher to ensure that license
comments are not within Javadoc comments.  There is a ticket open to modify
how this code works so that new file types, with comment start/stop
definitions, restrictions on first lines and such, can be defined outside
of the codebase, making it possible to insert headers in as yet
unrecognized file formats.[1]  This might be extended to provide input to
the process you are requesting.

There is also a section of code that removes the non-essential text.  The
'prune' method could be modified to remove blocks of code between the
opening Javadoc '/**' and the closing '*/', but this may lead to problems
with non-Java files.  Speaking of non-Java files, have you thought about
ensuring that the license does not appear in other Javadoc-like systems?
[2]  Once this can of worms is opened we will need a way to manage all the
requests that will follow for other file types.
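
A sketch of what such a modification might look like, in Java. The class
JavadocStripper and its method are hypothetical and are not taken from the
RAT code base; the regular expression is deliberately naive and would also
match '/**' sequences inside string literals.

```java
import java.util.regex.Pattern;

public class JavadocStripper {
    // "/**" not followed by "/", then the shortest run of text up to the
    // closing delimiter; DOTALL lets '.' match across line breaks.
    private static final Pattern JAVADOC_BLOCK =
            Pattern.compile("/\\*\\*(?!/).*?\\*/", Pattern.DOTALL);

    // Removes every Javadoc block so later matching cannot see its text.
    public static String stripJavadoc(String source) {
        return JAVADOC_BLOCK.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String plain = "/*\n * Licensed to the ASF\n */\nclass A {}";
        String javadoc = "/**\n * Licensed to the ASF\n */\nclass A {}";
        System.out.println(stripJavadoc(plain).contains("Licensed"));   // prints true
        System.out.println(stripJavadoc(javadoc).contains("Licensed")); // prints false
    }
}
```

With the Javadoc blocks removed first, a header of the second form would
no longer be found by the license matchers, so the file would be reported
as unlicensed. Applying the same stripping to non-Java files is where the
problems mentioned above would start.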

If you have any ideas for implementing the change I would be interested to
hear them.

Claude

[1] https://issues.apache.org/jira/browse/RAT-330?
[2] https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation

On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:

> Thanks Phil.
>
> Here's some background [1] which comes from before I was involved with
> Drill. What they wanted was for the license header checker to accept, in
> .java files,
>
> /*
>   * Licensed to the Apache Software Foundation (ASF) under one
>   * or more contributor license agreements.  See the NOTICE file
>   * distributed with this work for additional information
>     etc.
>
> but reject
>
> /**
>   * Licensed to the Apache Software Foundation (ASF) under one
>   * or more contributor license agreements.  See the NOTICE file
>   * distributed with this work for additional information
>     etc.
>
> Notice the two asterisks that open the Java comment block in the second
> form thereby making it a Javadoc comment that will appear in generated
> Javadoc. There are no longer any examples of the latter in Drill but
> this has been enforced by the addition of the license-maven-plugin.
>
> I got here because I want to remove that plugin, which essentially
> duplicates RAT, in favour of another (with exactly the same name :()
> that can generate license and notice information for our third party
> code. This last task is what I'm really doing, the Javadoc license
> header rejection matter is yak shaving that came up on the road.
>
> So my yak shaving question is: if I make RAT Drill's only license header
> checker then could I make it reject license headers of the second form?
> Even if I can't I'm inclined to make it the only header checker since I
> think that it's in any case mandatory and authoritative. But in an
> effort to retain the work of the previous Drill developers I'm trying to
> preserve what they implemented.
>
> 1. https://issues.apache.org/jira/browse/DRILL-6320
>
> On 2024/01/26 14:06, P. Ottlinger wrote:
> > Hi James,
> >
> > thanks for reaching out!
> >
> > Am 26.01.24 um 08:21 schrieb James Turton:
> >> I'd like to ask about a feature to prevent RAT from allowing license
> >> headers to appear inside Javadoc comments  (/**) while still requiring
> >> them in Java comments (/*) in .java files. Currently the Drill project
> >> makes use of com.mycila.license-maven-plugin to reject licenses in
> >> Javadoc comments because the developers at the time didn't want
> >> license headers cluttering the Javadoc website that is generated from
> >> the source. Are you aware of  a general view on Apache license headers
> >> appearing in Javadoc pages? If preventing them from doing so is a good
> >> idea, could this become a (configurable) feature in RAT?
> >
> > could you be so kind to provide an example of what you want to achieve
> > and how your use case looks like?
> >
> > I'm afraid I do not really understand what you mean with
> > javadoc-specific licenses?
> >
> > At the moment we don't have a file specific parsing to exclude comments
> > - is that what you want to achieve?
> >
> > On the other hand if a license header is needed per file, it has to be
> > somewhere in the sources ;)
> >
> > Thanks,
> > Phil
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: License headers inside Javadoc comments

Posted by Claude Warren <cl...@xenei.com>.
James,

In general, the processing for matching licenses strips out all
non-essential text (e.g. '/' and '*'), so the current implementation cannot
determine whether the license text is within a Javadoc block.  Some
matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
they are generally much slower.  In fact, the original SPDX and Copyright
implementations caused a significant (two orders of magnitude or more)
increase in processing time.  It would be possible to create a custom
matcher to do what you want, but there is currently no mechanism in the
code base to call a matcher only on specific file types.

There is a section of code that understands file types, but this is the
code that inserts headers into files that don't have them.  It may be
possible to build on that to create a custom matcher to ensure that license
comments are not within Javadoc comments.  There is a ticket open to modify
how this code works so that new file types, with comment start/stop
definitions, restrictions on first lines and such, can be defined outside
of the codebase, making it possible to insert headers in as-yet-unrecognized
file formats.[1]  This might be extended to provide input to the process you
are requesting.

There is also a section of code that removes the non-essential text.  The
'prune' method could be modified to remove blocks of code between the
opening Javadoc '/**' and the closing '*/'.  But this may lead to problems
with non-Java files.  Speaking of non-Java files, have you thought about
ensuring that the license does not appear in other Javadoc-like systems?
[2]  Once this can of worms is opened we will need a way to manage all the
requests that will follow for other file types.
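
The idea of removing Javadoc blocks before pruning can be sketched as
follows.  This is only an illustration in Python, not RAT's actual code:
the function names and the regular expressions are assumptions, and the
sketch ignores edge cases such as '/**' appearing inside a string literal.

```python
import re

# Hypothetical helpers for illustration; they do not mirror RAT's real 'prune' code.
JAVADOC_BLOCK = re.compile(r"/\*\*(?!/).*?\*/", re.DOTALL)  # '/** ... */', but not '/**/'
NON_ESSENTIAL = re.compile(r"[^a-z0-9]+")


def strip_javadoc_blocks(source: str) -> str:
    """Remove every '/** ... */' block, leaving plain '/* ... */' comments in place."""
    return JAVADOC_BLOCK.sub(" ", source)


def prune(text: str) -> str:
    """Lower-case the text and drop everything except letters and digits."""
    return NON_ESSENTIAL.sub("", text.lower())


def license_outside_javadoc(source: str, license_text: str) -> bool:
    """True if the license text is still found once Javadoc blocks are removed."""
    return prune(license_text) in prune(strip_javadoc_blocks(source))
```

With this ordering a header inside '/** ... */' is simply never seen by the
matcher, so the file would be reported as unlicensed rather than flagged
with a Javadoc-specific message.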

If you have any ideas for implementing the change I would be interested to
hear them.

Claude

[1] https://issues.apache.org/jira/browse/RAT-330?
[2]
https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation

On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:

> Thanks Phil.
>
> Here's some background [1] which comes from before I was involved with
> Drill. What they wanted was for the license header checker to accept, in
> .java files,
>
> /*
>   * Licensed to the Apache Software Foundation (ASF) under one
>   * or more contributor license agreements.  See the NOTICE file
>   * distributed with this work for additional information
>     etc.
>
> but reject
>
> /**
>   * Licensed to the Apache Software Foundation (ASF) under one
>   * or more contributor license agreements.  See the NOTICE file
>   * distributed with this work for additional information
>     etc.
>
> Notice the two asterisks that open the Java comment block in the second
> form thereby making it a Javadoc comment that will appear in generated
> Javadoc. There are no longer any examples of the latter in Drill but
> this has been enforced by the addition of the license-maven-plugin.
>
> I got here because I want to remove that plugin, which essentially
> duplicates RAT, in favour of another (with exactly the same name :()
> that can generate license and notice information for our third party
> code. This last task is what I'm really doing, the Javadoc license
> header rejection matter is yak shaving that came up on the road.
>
> So my yak shaving question is: if I make RAT Drill's only license header
> checker then could I make it reject license headers of the second form?
> Even if I can't I'm inclined to make it the only header checker since I
> think that it's in any case mandatory and authoritative. But in an
> effort to retain the work of the previous Drill developers I'm trying to
> preserve what they implemented.
>
> 1. https://issues.apache.org/jira/browse/DRILL-6320
>
> On 2024/01/26 14:06, P. Ottlinger wrote:
> > Hi James,
> >
> > thanks for reaching out!
> >
> > Am 26.01.24 um 08:21 schrieb James Turton:
> >> I'd like to ask about a feature to prevent RAT from allowing license
> >> headers to appear inside Javadoc comments  (/**) while still requiring
> >> them in Java comments (/*) in .java files. Currently the Drill project
> >> makes use of com.mycila.license-maven-plugin to reject licenses in
> >> Javadoc comments because the developers at the time didn't want
> >> license headers cluttering the Javadoc website that is generated from
> >> the source. Are you aware of  a general view on Apache license headers
> >> appearing in Javadoc pages? If preventing them from doing so is a good
> >> idea, could this become a (configurable) feature in RAT?
> >
> > could you be so kind to provide an example of what you want to achieve
> > and how your use case looks like?
> >
> > I'm afraid I do not really understand what you mean with
> > javadoc-specific licenses?
> >
> > At the moment we don't have a file specific parsing to exclude comments
> > - is that what you want to achieve?
> >
> > On the other hand if a license header is needed per file, it has to be
> > somewhere in the sources ;)
> >
> > Thanks,
> > Phil
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: License headers inside Javadoc comments

Posted by James Turton <dz...@apache.org>.
Thanks Phil.

Here's some background [1] which comes from before I was involved with 
Drill. What they wanted was for the license header checker to accept, in 
.java files,

/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
    etc.

but reject

/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
    etc.

Notice the two asterisks that open the Java comment block in the second 
form thereby making it a Javadoc comment that will appear in generated 
Javadoc. There are no longer any examples of the latter in Drill but 
this has been enforced by the addition of the license-maven-plugin.

I got here because I want to remove that plugin, which essentially
duplicates RAT, in favour of another (with exactly the same name :()
that can generate license and notice information for our third-party
code. That last task is what I'm really doing; the Javadoc license
header rejection matter is yak shaving that came up along the way.

So my yak-shaving question is: if I make RAT Drill's only license header
checker, could I make it reject license headers of the second form?
Even if I can't, I'm inclined to make it the only header checker, since I
think it is mandatory and authoritative in any case. But to retain the
work of the previous Drill developers, I'm trying to preserve what they
implemented.

1. https://issues.apache.org/jira/browse/DRILL-6320

On 2024/01/26 14:06, P. Ottlinger wrote:
> Hi James,
> 
> thanks for reaching out!
> 
> Am 26.01.24 um 08:21 schrieb James Turton:
>> I'd like to ask about a feature to prevent RAT from allowing license 
>> headers to appear inside Javadoc comments  (/**) while still requiring 
>> them in Java comments (/*) in .java files. Currently the Drill project 
>> makes use of com.mycila.license-maven-plugin to reject licenses in 
>> Javadoc comments because the developers at the time didn't want 
>> license headers cluttering the Javadoc website that is generated from 
>> the source. Are you aware of  a general view on Apache license headers 
>> appearing in Javadoc pages? If preventing them from doing so is a good 
>> idea, could this become a (configurable) feature in RAT?
> 
> could you be so kind to provide an example of what you want to achieve 
> and how your use case looks like?
> 
> I'm afraid I do not really understand what you mean with 
> javadoc-specific licenses?
> 
> At the moment we don't have a file specific parsing to exclude comments 
> - is that what you want to achieve?
> 
> On the other hand if a license header is needed per file, it has to be 
> somewhere in the sources ;)
> 
> Thanks,
> Phil

Re: License headers inside Javadoc comments

Posted by "P. Ottlinger" <po...@apache.org>.
Hi James,

thanks for reaching out!

Am 26.01.24 um 08:21 schrieb James Turton:
> I'd like to ask about a feature to prevent RAT from allowing license 
> headers to appear inside Javadoc comments  (/**) while still requiring 
> them in Java comments (/*) in .java files. Currently the Drill project 
> makes use of com.mycila.license-maven-plugin to reject licenses in 
> Javadoc comments because the developers at the time didn't want license 
> headers cluttering the Javadoc website that is generated from the 
> source. Are you aware of  a general view on Apache license headers 
> appearing in Javadoc pages? If preventing them from doing so is a good 
> idea, could this become a (configurable) feature in RAT?

Could you be so kind as to provide an example of what you want to achieve
and what your use case looks like?

I'm afraid I do not really understand what you mean by
javadoc-specific licenses.

At the moment we don't have file-specific parsing to exclude comments
- is that what you want to achieve?

On the other hand, if a license header is needed per file, it has to be
somewhere in the sources ;)

Thanks,
Phil

Re: License headers inside Javadoc comments

Posted by James Turton <ja...@somecomputer.xyz.INVALID>.
Thanks Paul
> I don't know how to configure the license plugin. But, I do suspect a
> Python file (or shell script) could make a one-time pass over the files to
> standardize headers into whatever format the team chooses. Only the first
> line of each file would change.
I put more information in my reply to the RAT devs but exactly this.
Adding a grep or a sed command to the build or to the release
instructions would surely be sufficient. I will likely replace the
license-maven-plugin that was brought in with something that does other
needed things, like report our third party licenses.
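
A grep-style check of this kind might look like the sketch below, written
in Python rather than grep or sed.  The marker string, the function names
and the assumption that the license header is the first comment in each
.java file are all illustrative, not taken from Drill's build.

```python
from pathlib import Path

# Assumed marker text; adjust to whatever the project's header actually contains.
LICENSE_MARKER = "Licensed to the Apache Software Foundation"


def javadoc_license_header(source: str) -> bool:
    """True if the file opens with a '/**' comment that contains the license marker."""
    stripped = source.lstrip()
    if not stripped.startswith("/**") or stripped.startswith("/**/"):
        return False
    end = stripped.find("*/", 3)
    block = stripped if end == -1 else stripped[:end]
    return LICENSE_MARKER in block


def offending_files(root: str) -> list[Path]:
    """Return the .java files under root whose license header is a Javadoc comment."""
    return sorted(
        path
        for path in Path(root).rglob("*.java")
        if javadoc_license_header(path.read_text(encoding="utf-8", errors="replace"))
    )


def main(root: str = ".") -> int:
    """Print offending files and return a non-zero exit code if any were found."""
    bad = offending_files(root)
    for path in bad:
        print(f"license header inside Javadoc comment: {path}")
    return 1 if bad else 0
```

A build step could call sys.exit(main("path/to/sources")) so that the build
fails whenever such a header is present.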

On 2024/01/26 09:59, Paul Rogers wrote:
> Hi James,
>
> For some reason, Drill started with the license headers in Javadoc
> comments. The (weak) explanation I got was that we never generate Javadoc,
> so it didn't really matter. Later, we started converting the headers to
> regular comments when convenient.
>
> If we were to generate Javadoc, having the license at the top of each page
> as the summary for each class would probably not be something that anyone
> finds useful.
>
> I don't know how to configure the license plugin. But, I do suspect a
> Python file (or shell script) could make a one-time pass over the files to
> standardize headers into whatever format the team chooses. Only the first
> line of each file would change.
>
> - Paul
>
> On Thu, Jan 25, 2024 at 11:22 PM James Turton <dz...@apache.org> wrote:
>
>> Good morning!
>>
>> I'd like to ask about a feature to prevent RAT from allowing license
>> headers to appear inside Javadoc comments  (/**) while still requiring
>> them in Java comments (/*) in .java files. Currently the Drill project
>> makes use of com.mycila.license-maven-plugin to reject licenses in
>> Javadoc comments because the developers at the time didn't want license
>> headers cluttering the Javadoc website that is generated from the
>> source. Are you aware of  a general view on Apache license headers
>> appearing in Javadoc pages? If preventing them from doing so is a good
>> idea, could this become a (configurable) feature in RAT?
>>
>> Thanks
>> James Turton
>>


Re: License headers inside Javadoc comments

Posted by Ted Dunning <te...@gmail.com>.
The right way to get a copyright on every page is to tweak the javadoc
command to use a different template (I would think).



On Fri, Jan 26, 2024 at 12:00 AM Paul Rogers <pa...@gmail.com> wrote:

> Hi James,
>
> For some reason, Drill started with the license headers in Javadoc
> comments. The (weak) explanation I got was that we never generate Javadoc,
> so it didn't really matter. Later, we started converting the headers to
> regular comments when convenient.
>
> If we were to generate Javadoc, having the license at the top of each page
> as the summary for each class would probably not be something that anyone
> finds useful.
>
> I don't know how to configure the license plugin. But, I do suspect a
> Python file (or shell script) could make a one-time pass over the files to
> standardize headers into whatever format the team chooses. Only the first
> line of each file would change.
>
> - Paul
>
> On Thu, Jan 25, 2024 at 11:22 PM James Turton <dz...@apache.org> wrote:
>
> > Good morning!
> >
> > I'd like to ask about a feature to prevent RAT from allowing license
> > headers to appear inside Javadoc comments  (/**) while still requiring
> > them in Java comments (/*) in .java files. Currently the Drill project
> > makes use of com.mycila.license-maven-plugin to reject licenses in
> > Javadoc comments because the developers at the time didn't want license
> > headers cluttering the Javadoc website that is generated from the
> > source. Are you aware of  a general view on Apache license headers
> > appearing in Javadoc pages? If preventing them from doing so is a good
> > idea, could this become a (configurable) feature in RAT?
> >
> > Thanks
> > James Turton
> >
>

Re: License headers inside Javadoc comments

Posted by Paul Rogers <pa...@gmail.com>.
Hi James,

For some reason, Drill started with the license headers in Javadoc
comments. The (weak) explanation I got was that we never generate Javadoc,
so it didn't really matter. Later, we started converting the headers to
regular comments when convenient.

If we were to generate Javadoc, having the license at the top of each page
as the summary for each class would probably not be something that anyone
finds useful.

I don't know how to configure the license plugin. But, I do suspect a
Python file (or shell script) could make a one-time pass over the files to
standardize headers into whatever format the team chooses. Only the first
line of each file would change.
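
A minimal sketch of such a one-time pass, assuming the header is the first
thing in each .java file and its opening line is exactly '/**'.  The marker
string and the function names are hypothetical; a real run would want a
dry-run mode and a review of the resulting diff.

```python
from pathlib import Path

# Assumed marker text used to tell a license header from a genuine Javadoc comment.
LICENSE_MARKER = "Licensed to the Apache Software Foundation"


def demote_javadoc_header(source: str) -> str:
    """Rewrite a leading '/**' license comment as '/*'; leave anything else alone."""
    lines = source.splitlines(keepends=True)
    if not lines or lines[0].strip() != "/**":
        return source
    end = source.find("*/")
    header = source if end == -1 else source[:end]
    if LICENSE_MARKER not in header:
        return source  # a genuine Javadoc comment, not a license header
    lines[0] = lines[0].replace("/**", "/*", 1)
    return "".join(lines)


def convert_tree(root: str) -> int:
    """Apply the rewrite to every .java file under root; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.java"):
        # newline="" keeps the file's original line endings untouched.
        with open(path, encoding="utf-8", newline="") as handle:
            original = handle.read()
        updated = demote_javadoc_header(original)
        if updated != original:
            with open(path, "w", encoding="utf-8", newline="") as handle:
                handle.write(updated)
            changed += 1
    return changed
```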

- Paul

On Thu, Jan 25, 2024 at 11:22 PM James Turton <dz...@apache.org> wrote:

> Good morning!
>
> I'd like to ask about a feature to prevent RAT from allowing license
> headers to appear inside Javadoc comments  (/**) while still requiring
> them in Java comments (/*) in .java files. Currently the Drill project
> makes use of com.mycila.license-maven-plugin to reject licenses in
> Javadoc comments because the developers at the time didn't want license
> headers cluttering the Javadoc website that is generated from the
> source. Are you aware of  a general view on Apache license headers
> appearing in Javadoc pages? If preventing them from doing so is a good
> idea, could this become a (configurable) feature in RAT?
>
> Thanks
> James Turton
>
