Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2020/09/17 19:02:53 UTC

[GitHub] [incubator-nuttx] v01d opened a new pull request #1834: License/authorship handling scrips (wip)

v01d opened a new pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834


   ## Summary
   
   This is a work in progress to create a series of scripts that obtain the history of a file in the repository and extract all known sources of authorship, so that we can determine, case by case, whether its header can be changed to Apache.
   A first script (`log2json.sh`) receives a filename (path within the repo) and uses `git log` to obtain various metadata from each commit in the file's history, as well as the blob hash in each case, in order to later retrieve the file contents at each commit.
   The script uses the `jq` command as part of this processing (available in Ubuntu).
   
   The second is a Python script (`check.py`) which receives the JSON file as input (or reads it from STDIN) and tries to extract any attributions in the commit messages as well as author information from license headers. Currently this is just a proof of concept; the regular expressions would need tuning to improve detection of this information. For now the script simply prints the detected information to the console.
   
   The idea is that this script could, given a list of authors known to have ICLAs, check whether a given file's header can safely be changed to Apache. The script could also be used to make this change automatically and create a commit listing all the reasons why it is safe to do.
   
   ## Impact
   
   None, just scripts for generating reports.
   
   ## Testing
   
   You can run the following example, from within `tools/licensing`:
   
   <pre>
   ./log2json.sh ../configure.c > out.json
   ./check.py out.json
   </pre>
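
As an illustration of the "file or STDIN" input described above, here is a minimal Python sketch. It is not the actual `check.py`; the function name `load_history` and the assumption that the JSON top level is a list with one entry per commit are hypothetical. The `-` convention for STDIN mirrors how the script is invoked elsewhere in this thread.

```python
import json
import sys


def load_history(source):
    """Load the commit history JSON from a path, or from STDIN when source is "-"."""
    if source == "-":
        return json.load(sys.stdin)
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(history):
    """Return a one-line summary (assumes a list with one entry per commit)."""
    return f"file has {len(history)} commits"
```

With this shape, `summarize(load_history("out.json"))` would report the number of commits found for the file.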
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org






[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694480221


   Greg, I was looking for example commits where you have attributed a change to some other author (i.e., a commit with you as author but with a message indicating it was authored by someone else). I would need such an example to know how to write the regex to detect it and also to test the regex. For now I'm just looking for "author:" as I have seen one example of this.
   
   Also, how can we get an authoritative list of committers who have ICLAs? Ideally this should be retrievable from some Apache URL so that it can be made to pull the list when needed.





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694508492


   One example of a file in arch is the following: `arch/arm/src/samd5e5/sam_tc.h`.
   If you run `git log --no-abbrev --raw --follow -- ../../arch/arm/src/samd5e5/sam_tc.h` you can find, for example, commit 1767b21d3cdfd1e2988a1226d42ff40f6a79122e
   (the second hash at the bottom of the diff should be the blob hash for that file),
   in which part of the diff is indeed an update to the submodule commit, but of course there's nothing about this file there.
   If you then request the blob with `git cat-file -p 4a18471087626499f2f57a07238a32cfe803e6d0`,
   you get `fatal: Not a valid object name 4a18471087626499f2f57a07238a32cfe803e6d0`.
   So I'm not sure the contents of the file at these points in the history will be available.
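
To make the check above repeatable, here is a rough Python sketch that pulls (commit, blob) pairs out of `git log --no-abbrev --raw` output in its default format and tests whether a blob is present. The function names are hypothetical and this is not how `log2json.sh` does it; `git cat-file -e` is used because it only reports existence through its exit status.

```python
import re
import subprocess

# One --raw line: ":<src mode> <dst mode> <src blob> <dst blob> <status>\t<path(s)>"
RAW_LINE = re.compile(
    r"^:\d{6} \d{6} [0-9a-f]{40,64} (?P<dst_blob>[0-9a-f]{40,64}) "
    r"[A-Z]\d*\t.+$"
)


def parse_raw_log(text):
    """Return (commit, blob) pairs from `git log --no-abbrev --raw` output."""
    pairs = []
    commit = None
    for line in text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
            continue
        match = RAW_LINE.match(line)
        if match and commit is not None:
            pairs.append((commit, match.group("dst_blob")))
    return pairs


def blob_exists(blob, repo="."):
    """True if the object is present in the repository at `repo`."""
    result = subprocess.run(["git", "cat-file", "-e", blob], cwd=repo,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Blobs for which `blob_exists` returns `False` would be the submodule-era entries that have to be treated as non-accessible.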
   





[GitHub] [incubator-nuttx] justinmclean commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
justinmclean commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694550504


   Hi,
   
   If you come up with a list of people I can check for you.
   
   Justin
   





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694510627


   Hmmm... This is all project management stuff.  Perhaps you have changed your mind?  Perhaps you should be on the PPMC?  Are you interested?
   





[GitHub] [incubator-nuttx] patacongo commented on a change in pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on a change in pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#discussion_r490527214



##########
File path: tools/licensing/check.py
##########
@@ -0,0 +1,43 @@
+#!/usr/bin/env python3
+

Review comment:
       Needs Apache 2.0 header







[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-695039290


   Does anyone want to give this a try? For now it will not work for files under arch/ and boards/ (since these were submodules). I'll try to fix that later.
   
   I also added a script to replace a license header with the Apache one, for convenience.
   
   In my own tests I found most files have at least one author without an ICLA, so I think the first step would be to gather a list of authors whose ICLAs we could ask for. I even managed to pick up an author only mentioned in the commit message, so that also helps (it errs on the side of false negatives, so it requires human analysis to interpret the situation of a file). I also found one file that could be "apachized".
   
   For the easy files that can be converted I'm thinking that we could follow this approach:
     1. Run the check tool and continue only if all (real) authors have ICLAs and there are no other copyrights pointing to companies
     2. Run the tool again with -vv verbosity and go over the analysis for each commit, at least to make sure we're not missing something obvious
     3. Run apachize.py to change the header
     4. Run the tool as in 2. but with the -n parameter (no color output) and send this to a file
     5. Open a PR for the file, attach the report for traceability





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694541151


   @justinmclean sorry, where is that foundation SVN? 





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694509810


   > > Also, how can we get an authoritative list of committers who have ICLAs? Ideally this should be retrievable from some Apache URL so that it can be made to pull the list when needed.
   > 
   > You will also need the list of CCLAs and SGAs. I know that Xiaomi, Espressif, and I have all provided SGAs. Are there others?
   
   The point is that this list grows, so I need an automated process to retrieve it (a wget from somewhere would be great).
   @justinmclean is there something like that? A URL with an up-to-date listing of ICLAs/SGAs/etc. for a given Apache project?





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694965201


   Once I get this completed with the missing features I mentioned, I think we should test it over different files and try to validate the whole detection process to gain confidence in the results. Some regular expressions may need to be tuned.
   Most notably, the "attributions" part (author mentioned in commit message) will be the most flaky.





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694500948


   Ok, that gives me some info already on how to look for those. It doesn't have to be exact, I suspect it will be a semi-manual process with human verification anyway. 
   
   Sorry for the name dropping, was just an example.





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694548548


   > > @justinmclean sorry, where is that foundation SVN?
   > 
   > Unfortunately this is something that we do not have access to:
   > https://svn.apache.org/repos/private/foundation/officers
   
   It appears you can log in with your Apache ID.





[GitHub] [incubator-nuttx] patacongo edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694497704


   > Greg, I was looking for example commits for where you have attributed a change to some other author (ie: a commit with you as author but with message in the commit indicating it was authored by someone else). I would need this example to know how to write the regex to detect it and also test the regex. For now I'm just looking for "author:" as I have seen one example of this.
   
   Try
   
       $ git log | grep -i "contributed by"
   
   For example:
   
       Incoporate new ARMv7-M exception handling logic contributed by Mike Smith
   
   I used (and still use) "noted by" or "reported by" if someone suggests a change but I implemented it.  So those names should be ignored.
   
   But there are other cases like:
   
       Big refactoring of toolchain definitions by Mike Smith
       ez80Acclaim fixes from Kevin Franzen
   
   That are more difficult to pick out.
   
   In most cases, any significant changes will also show the author in the BSD header
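
The phrasings listed above could be matched with something like the following Python sketch. It is only a starting point under stated assumptions, not the regex used in `check.py`: a name is assumed to be two or more capitalized words, and "noted by" / "reported by" are skipped as described. It will still produce false positives (for example a company name after "from"), so the output needs human review.

```python
import re

# Assumption: a person's name is two or more capitalized words.
NAME = r"(?P<name>[A-Z][\w.'-]*(?:\s+[A-Z][\w.'-]*)+)"
ATTRIBUTION = re.compile(r"\b(?i:contributed\s+by|by|from|author:)\s+" + NAME)


def find_attributions(message):
    """Return names that a commit message appears to credit as authors."""
    names = []
    for match in ATTRIBUTION.finditer(message):
        # "noted by" / "reported by" only credit a suggestion, not authorship.
        prefix = message[:match.start()].rstrip().lower()
        if prefix.endswith(("noted", "reported")):
            continue
        names.append(match.group("name").rstrip(".,"))
    return names
```

Running `find_attributions` over each commit description and body gives a candidate list per commit that can then be checked by hand.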
   
   > Also, how can we get an authoritative list of committers who have ICLAs? Ideally this should be retrievable from some Apache URL so that it can be made to pull the list when needed.
   
   Well, of course, you can get the list of all committers from the PPMC page.  But there are additional people who have provided ICLAs as well, I think (although they may have become committers too).  I don't know how to find them other than checking private@nuttx.apache.org.  Wouldn't all ICLAs be reported there?  Might be best to contact the ASF secretary to see if there is some authoritative list of ICLAs.
   
   
   





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694500299


   Please also remember, when it comes down to talking about specific people by name, we probably need to do that on the private@nuttx.apache.org list.  This is what it is for; to avoid discussion of specific people in an open forum.
   
   We may need an additional, non-public email list for that purpose.  The number of people on the private list might still be too large.
   
   I have background information on many old contributors.  For example, some names are fake names used to hide the contributor's identity.  I know some of those.
   
   





[GitHub] [incubator-nuttx] patacongo commented on a change in pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on a change in pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#discussion_r490527373



##########
File path: tools/licensing/log2json.sh
##########
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+

Review comment:
       Apache 2.0 header







[GitHub] [incubator-nuttx] v01d edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694538544


   I improved header parsing among other things. The following is an interesting case. You can run it with:
   ```bash
   tools/licensing/log2json.sh drivers/wireless/bluetooth/bt_uart.c | tools/licensing/check.py - | less
   ```
   
   Note the handling of copyright holders, author from header and git log. Attributions would be from commit body (that is largely untested)
   
   ```
   file has 16 commits
   -
   commit: 6a3c2aded683e8284e793eb3ee8793d2960ae000
   blob: d7b4125d22020fdd7defdd0cd6c6f5d534bcdf86
   date: Thu Jan 2 10:49:34 2020 -0600
   author: Xiang Xiao <xi...@xiaomi.com>
   attributions:
   header authors:
   ['Gregory Nutt <gn...@nuttx.org>']
   header copyrights:
   ['Gregory Nutt', 'Intel Corporation']
   commit description:
   Fix wait loop and void cast (#24)
   commit msg body:
   * Simplify EINTR/ECANCEL error handling    1. Add semaphore uninterruptible wait function  2 .Replace semaphore wait loop with a single uninterruptible wait  3. Replace all sem_xxx to nxsem_xxx    * Unify the void cast usage    1. Remove void cast for function because many place ignore the returned value witout cast  2. Replace void cast for variable with UNUSED macro  
   headers:
   /****************************************************************************
    * drivers/wireless/bluetooth/bt_uart.c
    * UART based Bluetooth driver
    *
    *   Copyright (C) 2018 Gregory Nutt. All rights reserved.
    *   Author: Gregory Nutt <gn...@nuttx.org>
    *
    * Ported from the Intel/Zephyr arduino101_firmware_source-v1.tar package
    * where the code was released with a compatible 3-clause BSD license:
    *
    *   Copyright (c) 2016, Intel Corporation
    *   All rights reserved.
   ...
   ```
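
For reference, the "header authors" and "header copyrights" fields in the report above could be extracted roughly like this. This is a hedged sketch rather than the actual parsing in `check.py`; the regular expressions and the `parse_header` name are assumptions and only cover the `Copyright (C) <years> <holder>` and `Author: <name>` forms seen in this header.

```python
import re

# "Copyright (C) 2018 Gregory Nutt." or "Copyright (c) 2016, Intel Corporation"
COPYRIGHT = re.compile(
    r"Copyright\s+\([Cc]\)\s+(?P<years>[\d,\s-]+?)[\s,]+"
    r"(?P<holder>[^\d\s.][^.\n]*)"
)
AUTHOR = re.compile(r"Author:\s*(?P<author>.+)")


def parse_header(text):
    """Return the authors and copyright holders named in a license header."""
    authors = [m.group("author").strip() for m in AUTHOR.finditer(text)]
    holders = [m.group("holder").strip() for m in COPYRIGHT.finditer(text)]
    # Drop duplicates while keeping first-seen order.
    return {
        "authors": list(dict.fromkeys(authors)),
        "copyrights": list(dict.fromkeys(holders)),
    }
```

Applied to the header shown above, this yields the same two lists as the report; other header layouts would need additional patterns.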





[GitHub] [incubator-nuttx] btashton commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
btashton commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694549218


   
   > > Unfortunately this is something that we do not have access to:
   > > https://svn.apache.org/repos/private/foundation/officers
   > 
   > It appears you can login with your apache ID
   
   But you have to be in the pmc-chairs LDAP group, which none of us are (except some mentors).





[GitHub] [incubator-nuttx] v01d edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694964116


   I managed to add quite a few features today. I'm now checking detected authors against ICLAs using the Apache JSON files. I first try to match the name against these JSONs and, if no match is found, I look up the author email in a database of email -> real name (as stated in the ICLA files) for known project members. This is needed because the author name (either from git or from the header) is often not the person's real name. At the same time, one person may have multiple emails (some automatically generated by GitHub, or just because the user has several addresses).
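
The two-step lookup described above can be sketched as follows. This is an illustration only: the layout of the Apache JSON files is not shown in this thread, so the sketch assumes the ICLA names have already been extracted into a plain collection, and `resolve_real_name` and its parameters are hypothetical names.

```python
def resolve_real_name(author_name, author_email, icla_names, email_to_name):
    """Return the ICLA real name for an author, or None if nothing matches."""
    by_folded_name = {name.casefold(): name for name in icla_names}

    # 1. Try the name exactly as it appears in git or in the file header.
    hit = by_folded_name.get(author_name.strip().casefold())
    if hit is not None:
        return hit

    # 2. Fall back to the email -> real name table for known project members.
    mapped = email_to_name.get(author_email.strip().lower())
    if mapped is not None:
        return by_folded_name.get(mapped.casefold())
    return None
```

A `None` result means the author could not be tied to an ICLA and needs manual follow-up.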
   
   At the moment I'm generating a final report to indicate if all authors have CLAs. I would now also need to check the copyright holders against CLAs (easy) and against SGAs from companies (do we have these also in JSON somewhere @justinmclean?). The final goal is that, when all these conditions hold, a suggestion of "can be switched to apache header" becomes "true".
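
Expressed as code, the final condition might look like the sketch below. It assumes all names have already been resolved to the form used in the CLA and SGA lists; the function name and the shape of the returned report are hypothetical, not what `check.py` prints.

```python
def can_switch_to_apache(authors, copyright_holders, cla_names, sga_names):
    """Report whether every author and copyright holder is covered."""
    cla = set(cla_names)
    covered_holders = cla | set(sga_names)

    # Authors must have a CLA; holders may be covered by a CLA or an SGA.
    missing_authors = sorted(set(authors) - cla)
    missing_holders = sorted(set(copyright_holders) - covered_holders)

    return {
        "can_be_switched": not missing_authors and not missing_holders,
        "authors_without_cla": missing_authors,
        "holders_without_cla_or_sga": missing_holders,
    }
```

Keeping the two "missing" lists in the result makes it clear why a file was rejected, which helps when gathering the list of people to ask for ICLAs.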
   
   I still don't know what we will do with files which existed in submodules. @gregory-nutt do you remember if at the very beginning (where "beginning" is a commit which exists in this repo) all files were part of the repo? It appears that at some point submodules started to be used and were later removed. So there's a part of the history where I can't access the file contents. But given that authors are never **removed** from headers, this should not be a problem (I just have to handle this case of non-accessible files).
   
   Anyway, here's an example output of parsing the `configure.c`:
   ![image](https://user-images.githubusercontent.com/161706/93622072-dddd8a80-f9b2-11ea-830f-0fa3c2df5f32.png)
   
   It has pretty colors and everything =)
   
   The script supports different levels of verbosity so that one can introspect into the analysis process and pick up any errors








[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694501611


   > Sorry for the name dropping, was just an example.
   
   We do need to exercise some care with names if we wander into areas that are inappropriate for a public forum.
   





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-695002596


   That's strange, it should output the contents of the file. I'm not sure what that output is. But at least it does not give an error.
   Sure, if you can push them somewhere I'll try to make use of them.





[GitHub] [incubator-nuttx] hartmannathan commented on pull request #1834: License/authorship handling scripts

Posted by GitBox <gi...@apache.org>.
hartmannathan commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-768348267


   > Why not merge the patch? then we can refine the script in IP clean process.
   
   Agreed. I will merge the patch now and we will go from there.





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694966840


   PS: We will have to slowly build the email -> realname mapping. Maybe everyone can look for themselves in git logs and add all possible emails. The mapping should go to the real name as stated in the ICLA JSON files (these should be manually downloaded, I added the URLs in the README.md) 
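
One possible way to bootstrap that mapping is to collect every name/email pair from the git log and then fill in the real names by hand. The sketch below assumes its input is the text produced by `git log --format='%an <%ae>'`; the helper names are hypothetical and the real-name assignment stays a manual step.

```python
import re
from collections import defaultdict

IDENTITY = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>]+)>$")


def collect_identities(log_text):
    """Group the emails seen in `git log --format='%an <%ae>'` by author name."""
    emails_by_name = defaultdict(set)
    for line in log_text.splitlines():
        match = IDENTITY.match(line.strip())
        if match:
            emails_by_name[match.group("name")].add(match.group("email").lower())
    return {name: sorted(emails) for name, emails in emails_by_name.items()}


def invert(emails_by_real_name):
    """Turn {real name: [emails]} into the email -> real name lookup table."""
    return {
        email.lower(): real_name
        for real_name, emails in emails_by_real_name.items()
        for email in emails
    }
```

Each contributor could review the entries from `collect_identities` that belong to them, and the reviewed result passed through `invert` gives the lookup table.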





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694538544


   I improved header parsing among other things. The following is an interesting case. You can run it with:
   <pre>
   tools/licensing/log2json.sh drivers/wireless/bluetooth/bt_uart.c | tools/licensing/check.py - | less
   </pre>
   
   Note the handling of copyright holders, author from header and git log. Attributions would be from commit body (that is largely untested)
   
   <pre>
   file has 16 commits
   -
   commit: 6a3c2aded683e8284e793eb3ee8793d2960ae000
   blob: d7b4125d22020fdd7defdd0cd6c6f5d534bcdf86
   date: Thu Jan 2 10:49:34 2020 -0600
   author: Xiang Xiao <xi...@xiaomi.com>
   attributions:
   header authors:
   ['Gregory Nutt <gn...@nuttx.org>']
   header copyrights:
   ['Gregory Nutt', 'Intel Corporation']
   commit description:
   Fix wait loop and void cast (#24)
   commit msg body:
   * Simplify EINTR/ECANCEL error handling    1. Add semaphore uninterruptible wait function  2 .Replace semaphore wait loop with a single uninterruptible wait  3. Replace all sem_xxx to nxsem_xxx    * Unify the void cast usage    1. Remove void cast for function because many place ignore the returned value witout cast  2. Replace void cast for variable with UNUSED macro  
   headers:
   /****************************************************************************
    * drivers/wireless/bluetooth/bt_uart.c
    * UART based Bluetooth driver
    *
    *   Copyright (C) 2018 Gregory Nutt. All rights reserved.
    *   Author: Gregory Nutt <gn...@nuttx.org>
    *
    * Ported from the Intel/Zephyr arduino101_firmware_source-v1.tar package
    * where the code was released with a compatible 3-clause BSD license:
    *
    *   Copyright (c) 2016, Intel Corporation
    *   All rights reserved.
   ...
   </pre>
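
A rough sketch of how the "header authors" and "header copyrights" lists in the output above could be extracted. This is not the code in `check.py`; the regular expressions and the `parse_header` name are assumptions for illustration and would need tuning against real headers.

```python
import re

# Assumed patterns: an "Author:" line and a "Copyright [(C)] <years> <holder>" line,
# each optionally preceded by the " * " of a C block comment.
AUTHOR_RE = re.compile(r"^\s*\*?\s*Authors?:\s*(.+?)\s*$", re.IGNORECASE)
COPYRIGHT_RE = re.compile(
    r"^\s*\*?\s*Copyright\s*(?:\(c\))?\s*[\d,\s-]*([^\d\s,-].*?)\s*$",
    re.IGNORECASE,
)
RIGHTS_RE = re.compile(r"\.?\s*All rights reserved\.?\s*$", re.IGNORECASE)


def parse_header(text):
    """Return (authors, copyright_holders) found in a license header."""
    authors, holders = [], []
    for line in text.splitlines():
        match = AUTHOR_RE.match(line)
        if match:
            authors.append(match.group(1))
            continue
        match = COPYRIGHT_RE.match(line)
        if match:
            # Drop a trailing "All rights reserved." from the holder name.
            holder = RIGHTS_RE.sub("", match.group(1)).strip()
            if holder and holder not in holders:
                holders.append(holder)
    return authors, holders
```

Lines that only mention the word "copyright" in running text are ignored because the pattern is anchored at the start of the comment line.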





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694529679


   > Hmmm... This is all project management stuff. Perhaps you have changed your mind? Perhaps you should be on the PPMC? Are you interested?
   
   I would say this is mostly technical at the moment. We can think about that later on.





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694501841


   I was adding comments to the issue thinking I was commenting here, I'll copy them:
   
   > Another question is, what is the impact of copyright vs authorship indication? For example:
   > 
   > ../../arch/arm/src/samd5e5/sam_tc.h- * Copyright 2020 Falker Automacao Agricola LTDA.
   > ../../arch/arm/src/samd5e5/sam_tc.h: * Author: Leomar Mateus Radke leomar@falker.com.br
   > ../../arch/arm/src/samd5e5/sam_tc.h: * Author: Ricardo Wartchow wartchow@gmail.com
   > 
   > The copyright sometimes (but not always) indicates a company and the authors may be employees (in this case, maybe Leomar, but maybe not Ricardo). This means that we need an SGA from the company besides ICLAs from the authors, right? If there's no mention of a company (only an author), would an ICLA be enough?
   > 
   > I will parse the copyright also so that we can initially filter these more complex cases.
   > 
   
   And:
   
   > I think we will have a hard time with many files that once existed in submodules. Git is smart enough to know about the file coming from a submodule, but it is not possible to obtain the contents of the file at that revision as it didn't exist in the parent repo.
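
For the copyright-versus-author question quoted above, one possible way to initially filter the more complex cases is sketched here. It is only a triage heuristic under my own assumptions (the function name and the comparison rule are hypothetical); it says nothing about what is legally required.

```python
def needs_closer_look(header_authors, header_copyrights):
    """Flag a file when some copyright holder is not also listed as an author.

    Authors may carry an email in angle brackets ("Name <email>"); only the
    name part is compared, ignoring case.
    """
    author_names = {a.split("<")[0].strip().casefold() for a in header_authors}
    return any(
        holder.strip().casefold() not in author_names
        for holder in header_copyrights
    )
```

Files for which this returns `True` (for example, a company as copyright holder with individuals as authors) would be set aside for manual review.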





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694992288


   > I still don't know what we will do with files which existed in submodules. @gregory-nutt do you remember if at the very beginning (where "beginning" is a commit which exists in this repo) all files were part of the repo? It appears that at some point submodules started to be used and then submodules were removed. So there's a part of the history where I can't access the file contents. But given that authors are never **removed** from headers, this should not be a problem (I just have to consider this case of non-accessible files).
   
   I am not sure what you are asking for.  I do have snapshots of the old submodule repositories.  Those snapshots are dated Apr 19, 2016:
   
       drwxr-xr-x+ 1 spuda spuda 0 Apr 19  2016 arch-April4
       drwxr-xr-x+ 1 spuda spuda 0 Apr 19  2016 boards-April3
       drwxr-xr-x+ 1 spuda spuda 0 Apr 19  2016 Documentation-April4
       drwxr-xr-x+ 1 spuda spuda 0 Apr 19  2016 nuttx-April4
   
       $ find . -name .git
       ./arch-April4/.git
       ./boards-April3/.git
       ./Documentation-April4/.git
       ./nuttx-April4/.git
       ./nuttx-April4/arch/.git
       ./nuttx-April4/configs/.git
       ./nuttx-April4/documentation/.git
   
   These are the last commits in the repository snapshots.  Because these were merged, the hashes are probably no longer the same.
   
       arch-April4:  6b2955bacf56b42470d59ef677004e8beed0794d, April 4
       boards-April3:  2fee7d85361d34343fa82807d45a62f37378ad5f, April 3
       Documentation-April4:  a1eb57539fc9f63eabe5bcff7d68d98ad41447b4, March 30
       nuttx-April4:  aea5b9fd38589f8ede7bc5677df6c0f2f06bcca7, April 4
   
   Would those be useful to you?  From the history, it looks like the submodules were removed in early April:
   
       commit 7337e748de8e6952c4815d279c2eb82aa72dc382
       Merge: 48106e605a 6e24c287f6
       Author: Gregory Nutt <gn...@nuttx.org>
       Date:   Sun Apr 10 07:57:59 2016 -0600
       
           Merge in configs/ submodule
       
       commit 48106e605a71f7aabb1188a9aa9a6ecb7f0bc934
       Merge: 835ad1bd4d 8f15af280a
       Author: Gregory Nutt <gn...@nuttx.org>
       Date:   Sun Apr 10 07:49:41 2016 -0600
       
           Merge in arch/ submodule
       
       commit 835ad1bd4d91820f612d53b9ce46eee131c208af
       Merge: a031fc1a88 2693be512b
       Author: Gregory Nutt <gn...@nuttx.org>
       Date:   Sun Apr 10 07:38:26 2016 -0600
       
           Merge in the Documentation submodule
       
       commit a031fc1a8899a9d0647b968ecf96324e73a13df1
       Author: Gregory Nutt <gn...@nuttx.org>
       Date:   Sat Apr 9 12:36:05 2016 -0600
       
           Remove submodules
   
   





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694505323


   > > Another question is, what is the impact of copyright vs authorship indication? For example:
   > > ../../arch/arm/src/samd5e5/sam_tc.h- * Copyright 2020 [company]
   > > ../../arch/arm/src/samd5e5/sam_tc.h: * Author: [name A] [email]
   > > ../../arch/arm/src/samd5e5/sam_tc.h: * Author: [name B] [email]
   > > The copyright sometimes (but not always) indicates a company and the authors may be employees (in this case, maybe [A], but maybe not [B]). This means that we need an SGA from the company besides ICLAs from the authors, right? If there's no mention of a company (only an author), would an ICLA be enough?
   
   I am not an attorney and cannot even venture to guess.
   
   You will also find many cases where someone left me as the copyright holder, but listed themselves as the author.  They were intending that I manage the legal aspects of the files, but I don't know if that is really valid either.
   
   > > I think we will have a hard time with many files that once existed in submodules. Git is smart enough to know about the file coming from a submodule, but it is not possible to obtain the contents of the file at that revision as it didn't exist in the parent repo.
   
   I don't know what the history will look like exactly in these cases.  The arch directory and the predecessor of the boards directory were once submodules, but their entire history should be intact:
   
   - arch/ and boards/ were extracted with their complete history intact, then later
   - they were merged back into nuttx, again with their histories intact
   
   This means that (1) the full history is there but (2) there may be duplicate commits.  Well, I know that some of the commits are duplicated at least, but I do not know the extent to which GIT has merged and fixed things.  I don't really know what you will find.  But you won't find missing commits or any missing history.
   
   
   





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694509003


   > Another case that will be confusing is that under arch/ and boards/ many people copied files that I wrote verbatim, changing only some context stuff (like paths or file names). They correctly left me as the Author and Copyright holder as the BSD license requires. But GIT will not show me as the author of the files.
   
   That's no problem; I'm picking up all possible authorship information (header, commit message and git author).





[GitHub] [incubator-nuttx] v01d edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-695002596


   That's strange, it should output the contents of the file. I'm not sure what that output is. But at least it does not give an error.
   Sure, if you can push them somewhere I'll try to make use of them. Read access is enough.





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-695001673


   > 
   > 
   > Are these git repos or just snapshots of the contents? If these are repositories, could you for example try to do: `git cat-file -p 029a72b36baa4f7c58f0553692a3af193980ace5` inside the `arch` repo? If this succeeds I could try to get objects from these repos when an object is not found on our current repo.
   
       $ git cat-file -p 029a72b36baa4f7c58f0553692a3af193980ace5
       tree ad1944f150e898f7bb9b2c5840b3f8a8d279952c
       parent b8751e69b0c05f7ca6d67fc280f2e07dc9d250bf
       author David Sidrane <da...@usa.net> 1437571613 -0600
       committer Gregory Nutt <gn...@nuttx.org> 1437571613 -0600
       
       Add support for the STM32446.  From David Sidrane
   
   I could re-create these repositories at github.com/nuttx if you like. I could also give you full access to that directory as well.  Lots of people have access; it is totally uncontrolled.
   
   





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scripts

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-767634122


   Last time I tried, they worked as expected, so you could merge and iron out any kinks later on. I guess you could use them for CI, but if one wants to be entirely sure about any mentions of an author in commit messages/headers, it will still require manual checking of the verbose output.





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-695020031


   > Sure, if you can push them somewhere I'll try to make use of them. Read access is enough.
   
   You can find all four repositories here:  https://github.com/nuttx
   





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694508772


   > Also, how can we get an authoritative list of committers who have ICLAs? Ideally this should be retrievable from some Apache URL so that it can be made to pull the list when needed.
   
   You will also need the list of CCLAs and SGAs.  I know that Xiaomi, Espressif, and I have all provided SGAs.  Are there others?
   








[GitHub] [incubator-nuttx] v01d commented on a change in pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on a change in pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#discussion_r490533669



##########
File path: tools/licensing/check.py
##########
@@ -0,0 +1,43 @@
+#!/usr/bin/env python3
+

Review comment:
       will do

##########
File path: tools/licensing/log2json.sh
##########
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+

Review comment:
       will do







[GitHub] [incubator-nuttx] hartmannathan merged pull request #1834: License/authorship handling scripts

Posted by GitBox <gi...@apache.org>.
hartmannathan merged pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834


   





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694999888


   Are these git repos or just snapshots of the contents? If these are repositories, could you for example try to do: `git cat-file -p 029a72b36baa4f7c58f0553692a3af193980ace5` inside the `arch` repo? If this succeeds I could try to get objects from these repos when an object is not found on our current repo.
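
The fallback idea above (try the current repo first, then the old submodule repos) might be sketched like this. `git cat-file -p` is the command quoted in the comment; the `read_object` helper, its `repos` argument and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess


def read_object(obj_hash, repos, run=subprocess.run):
    """Return the object's content from the first repo that has it, else None.

    `repos` is a list of paths to git repositories, tried in order, for
    example the current repo followed by clones of the old submodules.
    """
    for repo in repos:
        result = run(
            ["git", "-C", repo, "cat-file", "-p", obj_hash],
            capture_output=True,
        )
        if result.returncode == 0:
            return result.stdout
    return None
```

Passing `run` as a parameter only exists so that the lookup order can be tested without real repositories.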





[GitHub] [incubator-nuttx] justinmclean commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
justinmclean commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694522962


   Hi,
   
   There’s a list of all CLAs in the foundation’s SVN under:
   ./officers/cclas.txt
   ./officers/iclas.txt
   
   The ICLA file contains lines like:
   jmclean:Justin Mclean:Justin Mclean:justin@classsoftware.com:Signed CLA;justin-mclean
   
   The CCLA file contains lines like:
   notinavail:Espressif Systems (Shanghai) Co. Ltd. - Ivan Grokhotkov:ivan@espressif.com:Signed Corp CLA for Alan Carvalho de Assis, Dong Heng, Chen Wen
   
   Whimsy has some generated data files which you can find here:
   https://whimsy.apache.org/public/
   
   The two most useful are probably:
   https://whimsy.apache.org/public/icla-info.json
   https://whimsy.apache.org/public/icla-info_noid.json
   
   Also note it's possible for two different people to have the same name.
   
   Justin
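
A small sketch of splitting an ICLA line like the example above into fields. The field names (`apache_id`, `legal_name`, `public_name`, `email`, `form`) are my guess from the sample line and are not confirmed by the message; the CCLA sample has fewer fields, so this parser deliberately rejects it.

```python
from typing import NamedTuple, Optional


class IclaEntry(NamedTuple):
    # Field names are assumptions inferred from the sample line.
    apache_id: str
    legal_name: str
    public_name: str
    email: str
    form: str


def parse_icla_line(line: str) -> Optional[IclaEntry]:
    """Split a colon-separated ICLA line; return None if it lacks five fields."""
    parts = line.rstrip("\n").split(":", 4)
    if len(parts) != 5:
        return None
    return IclaEntry(*(part.strip() for part in parts))
```

Limiting the split to four separators keeps any further colons inside the last field instead of breaking the record.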





[GitHub] [incubator-nuttx] v01d edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694501841


   I was adding comments to the issue thinking I was commenting here, I'll copy them:
   
   > Another question is, what is the impact of copyright vs authorship indication? For example:
   > 
   > ../../arch/arm/src/samd5e5/sam_tc.h- * Copyright 2020 [company]
   > ../../arch/arm/src/samd5e5/sam_tc.h: * Author: [name A] [email]
   > ../../arch/arm/src/samd5e5/sam_tc.h: * Author: [name B] [email]
   > 
   > The copyright sometimes (but not always) indicates a company and the authors may be employees (in this case, maybe [A], but maybe not [B]). This means that we need an SGA from the company besides ICLAs from the authors, right? If there's no mention of a company (only an author), would an ICLA be enough?
   > 
   > I will parse the copyright also so that we can initially filter these more complex cases.
   > 
   
   And:
   
   > I think we will have a hard time with many files that once existed in submodules. Git is smart enough to know about the file coming from a submodule, but it is not possible to obtain the contents of the file at that revision as it didn't exist in the parent repo.





[GitHub] [incubator-nuttx] btashton commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
btashton commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694546960


   > @justinmclean sorry, where is that foundation SVN?
   
   Unfortunately this is something that we do not have access to:
   https://svn.apache.org/repos/private/foundation/officers
   





[GitHub] [incubator-nuttx] patacongo edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694500299


   Please also remember, when it comes down to talking about specific people by name, we probably need to do that on the private@nuttx.apache.org list.  This is what it is for: to avoid discussion of specific people in an open forum.
   
   We may need a different, non-public email list for that purpose.  The number of people on the private list might still be too large.
   
   I have background information on many old contributors.  For example, some names are fake names used to hide the contributor's identity.  I know some of those.
   
   





[GitHub] [incubator-nuttx] patacongo edited a comment on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo edited a comment on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694506935


   The old ChangeLog files that we removed also have the author and date of each significant change, but these are not referenced by commit number.  Perhaps they could be helpful too.
   
   I think that Adam has the correct strategy:  Build a data set of some kind for everything, clear the easy ones, then look more carefully at the files with issues.









[GitHub] [incubator-nuttx] justinmclean commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
justinmclean commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694548095


   Hi,
   > @justinmclean <https://github.com/justinmclean> Thanks, that's quite useful (especially since it is JSON). I'm wondering how we can safely identify authors since I see there are only names and apache UIDs whereas we have author names and emails.
   > 
   Not all people who have signed an ICLA have Apache accounts; that’s why there are two files. Matching on the name would be a good start.
   
   This may also help as you can search on email, but it’s not going to know all committer email addresses.
   https://whimsy.apache.org/roster/committer/
   
   > Also, I understand that this is the list of all authors from all Apache users with ICLAs. It doesn't matter if they are from other projects right? 
   > 
   
   It generally wouldn’t matter, as you don’t need to sign an ICLA for each project you work on.
   
   Thanks,
   Justin
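
A sketch of the "match on the name" suggestion above. It works on a plain list of names that is assumed to have been extracted already from the ICLA JSON files (their exact layout is not shown here), and it returns every candidate rather than a single hit, since two different people can share a name. The helper names are hypothetical.

```python
import unicodedata


def normalize(name):
    """Fold case, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def match_icla_names(author, icla_names):
    """Return all ICLA names that equal the author's name after normalization."""
    key = normalize(author)
    return [name for name in icla_names if normalize(name) == key]
```

An empty result means no ICLA was found under that name; more than one result means the match is ambiguous and needs a manual check (for example via a known email).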









[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694531814


   @justinmclean Thanks, that's quite useful (especially since it is JSON). I'm wondering how we can safely identify authors since I see there are only names and apache UIDs whereas we have author names and emails.
   
   Also, I understand that this is the list of all Apache users with ICLAs. It doesn't matter if they are from other projects, right? Anyway, the chances of someone having an ICLA but not being a NuttX committer will be quite small.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] hartmannathan commented on pull request #1834: License/authorship handling scripts

Posted by GitBox <gi...@apache.org>.
hartmannathan commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-767611692


   > Does anyone want to give this a try? It will not work for now for files under arch/ boards/ (since these were submodules). I'll try to fix that later.
   
   We are having a discussion about this (again) at:
   
   https://lists.apache.org/thread.html/rfc438d6b716c07c2f60c946822f9b8b41ac44b6613e2f4c7ed7d3ebd%40%3Cdev.nuttx.apache.org%3E
   
   Maybe you could join us.
   
   Since last time, we have collected many ICLAs and SGAs, so I think a larger chunk of files could be cleared for a first step.
   
   I have not tried the scripts yet, but I think that if they work (even if not perfectly), we should merge this PR to bring it into the NuttX repo; then the scripts can always be improved by new PRs.
   
   WDYT?





[GitHub] [incubator-nuttx] v01d commented on pull request #1834: License/authorship handling scripts

Posted by GitBox <gi...@apache.org>.
v01d commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-708596460


   I've updated the scripts and added documentation on their usage on the README.md. 
   I tried to see if I could use the old repos to access the file contents at points in time when they were inside a submodule, but it does not appear to work. Anyway, I found that adding `--simplify-merges` to `git log` seems to take care of the extra blob hash that sometimes appeared for these files. Also, I now simply skip analyzing the file if a blob hash cannot be accessed, so the script can safely be invoked for any file in the repo.
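
The two changes described above might look roughly like this. Only `--simplify-merges` comes from the comment; the `--format=%H` choice, the function names and the `read_blob` callback are assumptions for illustration and not the actual contents of `log2json.sh` or `check.py`.

```python
def git_log_command(path):
    """Build a `git log` invocation for one file, with merges simplified."""
    # --format=%H (commit hashes only) is an assumed, minimal output format.
    return ["git", "log", "--simplify-merges", "--format=%H", "--", path]


def collect_blobs(blob_hashes, read_blob):
    """Return the content of every blob, or None if any blob is inaccessible.

    `read_blob` is a callable returning the blob content, or None when the
    object cannot be found (for example, a file that lived in a submodule).
    """
    contents = []
    for blob in blob_hashes:
        data = read_blob(blob)
        if data is None:
            return None  # skip analyzing this file altogether
        contents.append(data)
    return contents
```

Returning `None` for the whole file, rather than analyzing a partial history, keeps the report from silently missing authors.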





[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #1834: License/authorship handling scripts

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-768020070


   Why not merge the patch? Then we can refine the scripts during the IP clean process.





[GitHub] [incubator-nuttx] patacongo commented on pull request #1834: License/authorship handling scrips (wip)

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1834:
URL: https://github.com/apache/incubator-nuttx/pull/1834#issuecomment-694506158


   Another case that will be confusing is that under arch/ and boards/ many people copied files that I wrote verbatim, changing only some context stuff (like paths or file names).  They correctly left me as the Author and Copyright holder as the BSD license requires.  But GIT will not show me as the author of the files.

