Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/08/01 10:02:14 UTC

[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #103: Add ohrphaned jira usernames

mocobeta opened a new pull request, #103:
URL: https://github.com/apache/lucene-jira-archive/pull/103

   #96 
   
   `mappings-data/ohphan_jira_ids.txt` lists the "orphaned" Jira usernames: obsolete usernames (i.e. unknown Jira users) that appear as mentions (`[~username]`) in issue descriptions or comments.
   I also added several mappings to the "verified" account mapping file. I didn't find any "new" accounts, but these mappings will work as aliases.
   
   These orphaned usernames were detected by the script below. (I didn't commit this scratchy code.)
   ```python
   from operator import itemgetter
   from pathlib import Path
   import json
   import re
   import itertools
   from collections import defaultdict
   
   from common import JIRA_DUMP_DIRNAME, MAPPINGS_DATA_DIRNAME, JIRA_USERS_FILENAME, read_jira_users_map
   from jira_util import REGEX_MENION_TILDE, extract_description, extract_comments
   
   dump_dir = Path(__file__).resolve().parent.parent.joinpath(JIRA_DUMP_DIRNAME)
   mappings_dir = Path(__file__).resolve().parent.parent.joinpath(MAPPINGS_DATA_DIRNAME)
   jira_users_file = mappings_dir.joinpath(JIRA_USERS_FILENAME)
   jira_users = read_jira_users_map(jira_users_file) if jira_users_file.exists() else {}
   
   
   def extract_tilde_mentions(text):
       # findall returns one tuple of groups per match; flatten and drop empty groups
       mentions = re.findall(REGEX_MENION_TILDE, text)
       mentions = set(filter(lambda x: x != '', itertools.chain.from_iterable(mentions)))
       # strip the leading "[~" and the trailing "]" to keep only the username
       mentions = [x[2:-1] for x in mentions]
       return mentions
   
   
   # username -> number of issues that mention it
   orphan_ids = defaultdict(int)
   for dump_file in dump_dir.glob("LUCENE-*.json"):
       mentions = set()
       with open(dump_file) as fp:
           o = json.load(fp)
       description = extract_description(o)
       mentions.update(extract_tilde_mentions(description))
       comments = extract_comments(o)
       for (_, _, comment, _, _, _) in comments:
           mentions.update(extract_tilde_mentions(comment))
       # a username is "orphaned" if it is not in the known Jira user list
       for m in mentions:
           if m not in jira_users:
               orphan_ids[m] += 1
   
   # print the orphaned usernames, most frequently mentioned first
   orphan_ids = sorted(orphan_ids.items(), key=itemgetter(1), reverse=True)
   for user_id, count in orphan_ids:
       print(f'{user_id}\t{count}')
   ```
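
The script above depends on the repository's `common` and `jira_util` modules, so it cannot be run on its own. As a rough, self-contained sketch of the same idea, the snippet below counts how many issues mention each unknown username. The regex `TILDE_MENTION`, the function `count_orphans`, and the sample data are assumptions made for illustration; the actual `REGEX_MENION_TILDE` pattern is not shown in this thread and may match differently.

```python
import re
from collections import Counter

# Hypothetical pattern; the real REGEX_MENION_TILDE in jira_util is not shown here.
TILDE_MENTION = re.compile(r"\[~([^\]\s]+)\]")


def extract_tilde_mentions(text):
    """Return the set of usernames mentioned as [~username] in text."""
    return set(TILDE_MENTION.findall(text))


def count_orphans(issues, known_users):
    """Count, per unknown username, the number of issues that mention it.

    issues: iterable of lists of text fragments (description plus comments).
    known_users: set of usernames present in the Jira user list.
    """
    counts = Counter()
    for fragments in issues:
        mentioned = set()
        for text in fragments:
            mentioned |= extract_tilde_mentions(text)
        # Each issue contributes at most one count per username.
        counts.update(m for m in mentioned if m not in known_users)
    return counts.most_common()


# Made-up sample data: three issues, each a list of text fragments.
issues = [
    ["Thanks [~alice], see also [~ghost].", "Ping [~ghost] again."],
    ["[~ghost] and [~bob] reviewed this."],
    ["cc [~old_account]"],
]
known = {"alice", "bob"}
orphans = count_orphans(issues, known)
for name, count in orphans:
    print(f"{name}\t{count}")
```

As in the original script, a username is counted once per issue no matter how many times it is mentioned there, so the count reflects how many issues are affected.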
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-jira-archive] mikemccand commented on pull request #103: Add ohrphaned jira usernames

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #103:
URL: https://github.com/apache/lucene-jira-archive/pull/103#issuecomment-1200996491

   > I didn't commit this scratchy code
   
   Oh no!  You should commit scratchy code!  Progress, not perfection.  It's an awesome start, and future people struggling with the Jira -> GitHub migration might want to handle such orphan'd cases too.




[GitHub] [lucene-jira-archive] mikemccand commented on pull request #103: Add ohrphaned jira usernames

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #103:
URL: https://github.com/apache/lucene-jira-archive/pull/103#issuecomment-1201019643

   > > I didn't commit this scratchy code
   > 
   > Oh no! You should commit scratchy code! Progress, not perfection. It's an awesome start, and future people struggling with the Jira -> GitHub migration might want to handle such orphan'd cases too.
   
   OK, I'm trying to smooth out a bit of its scratchiness and then I'll commit it!  That makes it easier for me to iterate on these orphan'd usernames.




[GitHub] [lucene-jira-archive] mikemccand merged pull request #103: Add ohrphaned jira usernames

Posted by GitBox <gi...@apache.org>.
mikemccand merged PR #103:
URL: https://github.com/apache/lucene-jira-archive/pull/103

