Posted to commits@lucene.apache.org by to...@apache.org on 2022/07/06 07:16:29 UTC
[lucene-jira-archive] branch main updated: Split up updating script (#17)
This is an automated email from the ASF dual-hosted git repository.
tomoko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/lucene-jira-archive.git
The following commit(s) were added to refs/heads/main by this push:
new d831c4bd Split up updating script (#17)
d831c4bd is described below
commit d831c4bd3867cebdaa91714c11277d61cc0d746d
Author: Tomoko Uchida <to...@gmail.com>
AuthorDate: Wed Jul 6 16:16:23 2022 +0900
Split up updating script (#17)
* split up updating script into sub-scripts.
* update README
---
migration/README.md | 89 ++++++++---------
migration/src/common.py | 9 ++
migration/src/download_jira.py | 2 +-
migration/src/jira2github_import.py | 2 +-
...e_issue_links.py => remap_cross_issue_links.py} | 47 +++++----
migration/src/update_issue_links.py | 2 +
migration/src/update_issues.py | 111 +++++++++++++++++++++
7 files changed, 196 insertions(+), 66 deletions(-)
diff --git a/migration/README.md b/migration/README.md
index 19429abe..593a8879 100644
--- a/migration/README.md
+++ b/migration/README.md
@@ -30,23 +30,27 @@ source .env
## Usage
+All logs are saved in `migration/log`.
+
### 1. Download Jira issues
`src/download_jira.py` downloads Jira issues and dumps them as JSON files in `migration/jira-dump`. This also downloads attached files in each issue.
```
-(.venv) migration $ python src/download_jira.py --min 10500 --max 10600
-[2022-06-26 01:57:02,408] INFO:download_jira: Downloading Jira issues in /mnt/hdd/repo/sandbox-lucene-10557/migration/jira-dump
-[2022-06-26 01:57:17,843] INFO:download_jira: Done.
-
-(.venv) migration $ cat log/jira2github_import_2022-06-26T01\:34\:22.log
-[2022-06-26 01:34:22,300] INFO:jira2github_import: Converting Jira issues to GitHub issues in /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data
-[2022-06-26 01:34:23,355] DEBUG:jira2github_import: GitHub issue data created: /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-10500.json
-[2022-06-26 01:34:23,519] DEBUG:jira2github_import: GitHub issue data created: /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-10501.json
-[2022-06-26 01:34:24,894] DEBUG:jira2github_import: GitHub issue data created: /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-10502.json
+(.venv) migration $ python src/download_jira.py --min 10500 --max 10510
+[2022-07-06 15:43:00,864] INFO:download_jira: Downloading Jira issues in /mnt/hdd/repo/lucene-jira-archive/migration/jira-dump. Attachments are saved in ..
+[2022-07-06 15:43:16,247] INFO:download_jira: Done.
+
+(.venv) migration $ ls jira-dump/
+LUCENE-10500.json
+LUCENE-10501.json
+LUCENE-10502.json
...
```
+Downloaded attachments should be committed to a dedicated repo/branch for them.
+
+
### 2. Convert Jira issues to GitHub issues
`src/jira2github_import.py` converts Jira dumps into GitHub data that are importable to [issue import API](https://gist.github.com/jonmagic/5282384165e0f86ef105). Converted JSON data is saved in `migration/github-import-data`.
@@ -54,14 +58,14 @@ source .env
This also maps Jira user IDs to GitHub accounts if the account mapping is given in `mappings-data/account-map.csv`.
```
-(.venv) migration $ python src/jira2github_import.py --min 10500 --max 10600
-[2022-06-26 01:34:22,300] INFO:jira2github_import: Converting Jira issues to GitHub issues in /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data
-[2022-06-26 01:36:27,739] INFO:jira2github_import: Done.
-
-(.venv) migration $ cat log/jira2github_import_2022-06-26T01\:34\:22.log
-[2022-06-26 01:34:22,300] INFO:jira2github_import: Converting Jira issues to GitHub issues in /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data
-[2022-06-26 01:34:23,355] DEBUG:jira2github_import: GitHub issue data created: /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-10500.json
-[2022-06-26 01:34:23,519] DEBUG:jira2github_import: GitHub issue data created: /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-10501.json
+(.venv) migration $ python src/jira2github_import.py --min 10500 --max 10510
+[2022-07-06 15:46:38,837] INFO:jira2github_import: Converting Jira issues to GitHub issues in /mnt/hdd/repo/lucene-jira-archive/migration/github-import-data
+[2022-07-06 15:46:48,761] INFO:jira2github_import: Done.
+
+(.venv) migration $ ls github-import-data/
+GH-LUCENE-10500.json
+GH-LUCENE-10501.json
+GH-LUCENE-10502.json
...
```
@@ -72,47 +76,40 @@ First pass: `src/import_github_issues.py` imports GitHub issues and comments via
We confirmed this script does not trigger any notifications.
```
-(.venv) migration $ python src/import_github_issues.py --min 10500 --max 10600
-[2022-06-26 01:36:46,749] INFO:import_github_issues: Importing GitHub issues
-[2022-06-26 01:47:35,979] INFO:import_github_issues: Done.
-
-(.venv) migration $ cat log/import_github_issues_2022-06-26T01\:36\:46.log
-[2022-06-26 01:36:46,749] INFO:import_github_issues: Importing GitHub issues
-[2022-06-26 01:36:52,299] DEBUG:import_github_issues: Import GitHub issue https://github.com/mocobeta/migration-test-2/issues/1 was successfully completed.
-[2022-06-26 01:36:57,883] DEBUG:import_github_issues: Import GitHub issue https://github.com/mocobeta/migration-test-2/issues/2 was successfully completed.
-[2022-06-26 01:37:03,405] DEBUG:import_github_issues: Import GitHub issue https://github.com/mocobeta/migration-test-2/issues/3 was successfully completed.
+(.venv) migration $ python src/import_github_issues.py --min 10500 --max 10510
+[2022-07-06 15:47:48,230] INFO:import_github_issues: Importing GitHub issues
+[2022-07-06 15:52:06,314] INFO:import_github_issues: Done.
...
(.venv) migration $ cat mappings-data/issue-map.csv
JiraKey,GitHubUrl,GitHubNumber
-LUCENE-10500,https://github.com/mocobeta/migration-test-2/issues/1,1
-LUCENE-10501,https://github.com/mocobeta/migration-test-2/issues/2,2
-LUCENE-10502,https://github.com/mocobeta/migration-test-2/issues/3,3
+LUCENE-10500,https://github.com/mocobeta/migration-test-3/issues/42,42
+LUCENE-10501,https://github.com/mocobeta/migration-test-3/issues/43,43
+LUCENE-10502,https://github.com/mocobeta/migration-test-3/issues/44,44
...
```
-### 4. Update GitHub issues and comments
+### 4. Re-map cross-issue links on GitHub
-Second pass: `src/update_issue_links.py` 1) iterates all imported GitHub issue descriptions and comments; 2) embed correct GitHub issue number next to the corresponding Jira issue key with previously created issue number mapping; 3) updates them if the texts are changed.
+`src/remap_cross_issue_links.py` exports issues and comments from GitHub and saves the updated issue/comment bodies to `migration/github-remapped-data`.
-e.g.: if `LUCENE-10500` is mapped to GitHub issue `#100`, then all text fragments `LUCENE-10500` in issue descriptions and comments will be updated to `LUCENE-10500 (#100)`.
+```
+(.venv) migration $ python src/remap_cross_issue_links.py --issues 40 41
+[2022-07-06 15:32:39,895] INFO:remap_cross_issue_links: Remapping cross-issue links
+[2022-07-06 15:32:47,729] INFO:remap_cross_issue_links: Done.
-We confirmed this script does not trigger any notifications.
+(.venv) migration $ ls github-remapped-data/
+COMMENT-1175792003.json COMMENT-1175792076.json COMMENT-1175797378.json COMMENT-1175797444.json COMMENT-1175797570.json ISSUE-40.json ISSUE-41.json
+```
+
+### 5. Update GitHub issues and comments
+
+Second pass: `src/update_issues.py` updates issues and comments on GitHub with the re-mapped bodies saved in `migration/github-remapped-data`.
```
-(.venv) migration $ python src/update_issue_links.py
-[2022-06-26 01:59:43,324] INFO:update_issue_links: Updating GitHub issues
-[2022-06-26 02:17:38,332] INFO:update_issue_links: Done.
-
-(.venv) migration $ cat log/update_issue_links_2022-06-26T01\:59\:43.log
-[2022-06-26 01:59:43,324] INFO:update_issue_links: Updating GitHub issues
-[2022-06-26 01:59:45,586] DEBUG:update_issue_links: Issue 1 does not contain any cross-issue links; nothing to do.
-[2022-06-26 01:59:50,062] DEBUG:update_issue_links: # comments in issue 1 = 3
-[2022-06-26 01:59:52,601] DEBUG:update_issue_links: Comment 1166321470 was successfully updated.
-[2022-06-26 01:59:55,164] DEBUG:update_issue_links: Comment 1166321472 was successfully updated.
-[2022-06-26 01:59:55,165] DEBUG:update_issue_links: Comment 1166321473 does not contain any cross-issue links; nothing to do.
-[2022-06-26 01:59:57,426] DEBUG:update_issue_links: Issue 2 does not contain any cross-issue links; nothing to do.
-...
+(.venv) migration $ python src/update_issues.py --issues 40 41 --comments 1175797570 1175797444
+[2022-07-06 15:34:59,537] INFO:update_issues: Updating issues/comments
+[2022-07-06 15:35:06,532] INFO:update_issues: Done.
```
## Already implemented things
diff --git a/migration/src/common.py b/migration/src/common.py
index 93a6a02f..70324894 100644
--- a/migration/src/common.py
+++ b/migration/src/common.py
@@ -11,6 +11,7 @@ LOG_DIRNAME = "log"
JIRA_DUMP_DIRNAME = "jira-dump"
JIRA_ATTACHMENTS_DIRPATH = os.getenv("ATTACHMENTS_DL_DIR", str(Path(tempfile.gettempdir()).joinpath("attachments")))
GITHUB_IMPORT_DATA_DIRNAME = "github-import-data"
+GITHUB_REMAPPED_DATA_DIRNAME = "github-remapped-data"
MAPPINGS_DATA_DIRNAME = "mappings-data"
ISSUE_MAPPING_FILENAME = "issue-map.csv"
@@ -60,6 +61,14 @@ def github_data_file(data_dir: Path, issue_number: int) -> Path:
return data_dir.joinpath(f"GH-{issue_id}.json")
+def github_remapped_issue_data_file(data_dir: Path, issue_number: int) -> Path:
+ return data_dir.joinpath(f"ISSUE-{issue_number}.json")
+
+
+def github_remapped_comment_data_file(data_dir: Path, comment_id: int) -> Path:
+ return data_dir.joinpath(f"COMMENT-{comment_id}.json")
+
+
def make_github_title(summary: str, jira_id: str) -> str:
return f"{summary} [{jira_id}]"
diff --git a/migration/src/download_jira.py b/migration/src/download_jira.py
index 92bebc32..3db3e803 100644
--- a/migration/src/download_jira.py
+++ b/migration/src/download_jira.py
@@ -1,7 +1,7 @@
#
# Create local dump of Jira issues
# Usage:
-# python src/download_jira.py --issues <issue number list>
+# python src/download_jira.py --issues <jira issue number list>
# python src/download_jira.py --min <min issue number> --max <max issue number>
#
diff --git a/migration/src/jira2github_import.py b/migration/src/jira2github_import.py
index 654e0de3..4399f601 100644
--- a/migration/src/jira2github_import.py
+++ b/migration/src/jira2github_import.py
@@ -1,7 +1,7 @@
#
# Convert Jira issues to GitHub issues for Import Issues API (https://gist.github.com/jonmagic/5282384165e0f86ef105)
# Usage:
-# python src/jira2github_import.py --issues <issue number list>
+# python src/jira2github_import.py --issues <jira issue number list>
# python src/jira2github_import.py --min <min issue number> --max <max issue number>
#
diff --git a/migration/src/update_issue_links.py b/migration/src/remap_cross_issue_links.py
similarity index 53%
copy from migration/src/update_issue_links.py
copy to migration/src/remap_cross_issue_links.py
index 46254807..11d3bc5f 100644
--- a/migration/src/update_issue_links.py
+++ b/migration/src/remap_cross_issue_links.py
@@ -1,39 +1,42 @@
#
-# Update GitHub issues/comments to map Jira key to GitHub issue number
+# Remap Jira key to GitHub issue number
# Usage:
-# python src/update_issue_links.py --issues <issue number list>
-# python src/update_issue_links.py
+# python src/remap_cross_issue_links.py --issues <github issue number list>
+# python src/remap_cross_issue_links.py
#
import argparse
from pathlib import Path
import sys
import os
+import json
-from common import LOG_DIRNAME, MAPPINGS_DATA_DIRNAME, ISSUE_MAPPING_FILENAME, MaxRetryLimitExceedException, logging_setup, read_issue_id_map, retry_upto
+from common import LOG_DIRNAME, MAPPINGS_DATA_DIRNAME, ISSUE_MAPPING_FILENAME, GITHUB_REMAPPED_DATA_DIRNAME, MaxRetryLimitExceedException, logging_setup, read_issue_id_map, retry_upto, github_remapped_issue_data_file, github_remapped_comment_data_file
from github_issues_util import *
from jira_util import embed_gh_issue_link
log_dir = Path(__file__).resolve().parent.parent.joinpath(LOG_DIRNAME)
-logger = logging_setup(log_dir, "update_issue_links")
+logger = logging_setup(log_dir, "remap_cross_issue_links")
@retry_upto(3, 1.0, logger)
-def update_issue_link_in_issue_body(issue_number: int, issue_id_map: dict[str, str], token: str, repo: str):
+def remap_issue_link_in_issue_body(issue_number: int, issue_id_map: dict[str, str], data_dir: Path, token: str, repo: str):
body = get_issue_body(token, repo, issue_number, logger)
if body:
updated_body = embed_gh_issue_link(body, issue_id_map)
if updated_body == body:
logger.debug(f"Issue {issue_number} does not contain any cross-issue links; nothing to do.")
return
- if update_issue_body(token, repo, issue_number, updated_body, logger):
- logger.debug(f"Issue {issue_number} was successfully updated.")
-
+ data = {"issue_number": issue_number, "body": updated_body}
+ data_file = github_remapped_issue_data_file(data_dir, issue_number)
+ with open(data_file, "w") as fp:
+ json.dump(data, fp=fp, indent=2)
+ logger.debug(f"Updated issue body for issue_number={issue_number} was saved to {data_file}.")
@retry_upto(3, 1.0, logger)
-def update_issue_link_in_comments(issue_number: int, issue_id_map: dict[str, str], token: str, repo: str):
+def remap_issue_link_in_comments(issue_number: int, issue_id_map: dict[str, str], data_dir: Path, token: str, repo: str):
comments = get_issue_comments(token, repo, issue_number, logger)
if not comments:
return
@@ -45,8 +48,11 @@ def update_issue_link_in_comments(issue_number: int, issue_id_map: dict[str, str
if updated_body == body:
logger.debug(f"Comment {id} does not contain any cross-issue links; nothing to do.")
continue
- if update_comment_body(token, repo, id, updated_body, logger):
- logger.debug(f"Comment {id} was successfully updated.")
+ data = {"comment_id": id, "body": updated_body}
+ data_file = github_remapped_comment_data_file(data_dir, id)
+ with open(data_file, "w") as fp:
+ json.dump(data, fp=fp, indent=2)
+ logger.debug(f"Updated comment body for comment_id={id} was saved to {data_file}.")
if __name__ == "__main__":
@@ -62,7 +68,7 @@ if __name__ == "__main__":
check_authentication(github_token)
parser = argparse.ArgumentParser()
- parser.add_argument('--issues', type=int, required=False, nargs='*', help='Jira issue number list to be downloaded')
+ parser.add_argument('--issues', type=int, required=False, nargs='*', help='GitHub issue number list to be downloaded')
args = parser.parse_args()
mapping_data_dir = Path(__file__).resolve().parent.parent.joinpath(MAPPINGS_DATA_DIRNAME)
@@ -71,6 +77,11 @@ if __name__ == "__main__":
logger.error(f"Jira-GitHub issue id mapping file not found. {issue_mapping_file}")
sys.exit(1)
issue_id_map = read_issue_id_map(issue_mapping_file)
+
+ remapped_data_dir = Path(__file__).resolve().parent.parent.joinpath(GITHUB_REMAPPED_DATA_DIRNAME)
+ if not remapped_data_dir.exists():
+ remapped_data_dir.mkdir()
+ assert remapped_data_dir.exists()
issues = []
if args.issues:
@@ -78,17 +89,17 @@ if __name__ == "__main__":
else:
issues = list(issue_id_map.values())
- logger.info(f"Updating GitHub issues")
+ logger.info(f"Remapping cross-issue links")
for num in issues:
try:
- update_issue_link_in_issue_body(num, issue_id_map, github_token, github_repo)
+ remap_issue_link_in_issue_body(num, issue_id_map, remapped_data_dir, github_token, github_repo)
except MaxRetryLimitExceedException:
- logger.error(f"Failed to update issue body. Skipped issue {num}")
+ logger.error(f"Failed to export/convert issue body. Skipped issue {num}")
continue
try:
- update_issue_link_in_comments(num, issue_id_map, github_token, github_repo)
+ remap_issue_link_in_comments(num, issue_id_map, remapped_data_dir, github_token, github_repo)
except MaxRetryLimitExceedException:
- logger.error(f"Failed to update issue comments. Skipped issue {num}")
+ logger.error(f"Failed to export/convert issue comments. Skipped issue {num}")
continue
logger.info("Done.")
\ No newline at end of file
diff --git a/migration/src/update_issue_links.py b/migration/src/update_issue_links.py
index 46254807..3309f7dc 100644
--- a/migration/src/update_issue_links.py
+++ b/migration/src/update_issue_links.py
@@ -1,4 +1,6 @@
#
+# Deprecated.
+#
# Update GitHub issues/comments to map Jira key to GitHub issue number
# Usage:
# python src/update_issue_links.py --issues <issue number list>
diff --git a/migration/src/update_issues.py b/migration/src/update_issues.py
new file mode 100644
index 00000000..fc1b298d
--- /dev/null
+++ b/migration/src/update_issues.py
@@ -0,0 +1,111 @@
+#
+# Update GitHub issues/comments with re-mapped issue links.
+# Usage:
+# python src/update_issues.py --issues <github issue number list>
+# python src/update_issues.py --comments <github comment list>
+# python src/update_issues.py
+#
+
+import argparse
+from pathlib import Path
+import sys
+import os
+import json
+
+from common import LOG_DIRNAME, GITHUB_REMAPPED_DATA_DIRNAME, MaxRetryLimitExceedException, logging_setup, retry_upto, github_remapped_issue_data_file, github_remapped_comment_data_file
+from github_issues_util import *
+
+
+log_dir = Path(__file__).resolve().parent.parent.joinpath(LOG_DIRNAME)
+logger = logging_setup(log_dir, "update_issues")
+
+
+def update_issue_by_number(issue_number: int, data_dir: Path, token: str, repo: str):
+    data_file = github_remapped_issue_data_file(data_dir, issue_number)
+    update_issue(data_file, token, repo)
+
+
+@retry_upto(3, 1.0, logger)
+def update_issue(data_file: Path, token: str, repo: str):
+    with open(data_file) as fp:
+        o = json.load(fp)
+    issue_number = o["issue_number"]
+    body = o["body"]
+    if update_issue_body(token, repo, issue_number, body, logger):
+        logger.debug(f"Issue {issue_number} was successfully updated.")
+
+
+def update_comment_by_id(comment_id: int, data_dir: Path, token: str, repo: str):
+    data_file = github_remapped_comment_data_file(data_dir, comment_id)
+    update_comment(data_file, token, repo)
+
+
+@retry_upto(3, 1.0, logger)
+def update_comment(data_file: Path, token: str, repo: str):
+    with open(data_file) as fp:
+        o = json.load(fp)
+    comment_id = o["comment_id"]
+    body = o["body"]
+    if update_comment_body(token, repo, comment_id, body, logger):
+        logger.debug(f"Comment {comment_id} was successfully updated.")
+
+
+if __name__ == "__main__":
+    github_token = os.getenv("GITHUB_PAT")
+    if not github_token:
+        print("Please set your GitHub token to GITHUB_PAT environment variable.")
+        sys.exit(1)
+    github_repo = os.getenv("GITHUB_REPO")
+    if not github_repo:
+        print("Please set GitHub repo location to GITHUB_REPO environment variable.")
+        sys.exit(1)
+
+    check_authentication(github_token)
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--issues', type=int, required=False, nargs='*', help='GitHub issue number list to be updated')
+    parser.add_argument('--comments', type=int, required=False, nargs='*', help='GitHub comment id list to be updated')
+    args = parser.parse_args()
+
+    remapped_data_dir = Path(__file__).resolve().parent.parent.joinpath(GITHUB_REMAPPED_DATA_DIRNAME)
+    if not remapped_data_dir.exists():
+        remapped_data_dir.mkdir()
+    assert remapped_data_dir.exists()
+
+    issues = []
+    if args.issues:
+        issues = args.issues
+    comments = []
+    if args.comments:
+        comments = args.comments
+
+    logger.info(f"Updating issues/comments")
+
+    if not issues and not comments:
+        for data_file in remapped_data_dir.glob("ISSUE-*.json"):
+            try:
+                update_issue(data_file, github_token, github_repo)
+            except MaxRetryLimitExceedException:
+                logger.error(f"Failed to update issue body. Skipped {data_file}")
+                continue
+        for data_file in remapped_data_dir.glob("COMMENT-*.json"):
+            try:
+                update_comment(data_file, github_token, github_repo)
+            except MaxRetryLimitExceedException:
+                logger.error(f"Failed to update issue comments. Skipped {data_file}")
+                continue
+    else:
+        for num in issues:
+            try:
+                update_issue_by_number(num, remapped_data_dir, github_token, github_repo)
+            except MaxRetryLimitExceedException:
+                logger.error(f"Failed to update issue body. Skipped issue {num}")
+                continue
+        for id in comments:
+            try:
+                update_comment_by_id(id, remapped_data_dir, github_token, github_repo)
+            except MaxRetryLimitExceedException:
+                logger.error(f"Failed to update issue comments. Skipped comment {id}")
+                continue
+
+    logger.info("Done.")
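
The split introduced by this commit can be illustrated with a small offline sketch: one phase writes re-mapped bodies to `ISSUE-<number>.json` files, and a second phase reads those files back and pushes them. This is only a rough illustration under stated assumptions, not the repository's code. `embed_links` is a hypothetical stand-in for `jira_util.embed_gh_issue_link` (whose implementation is not shown in this diff), the `send` callback is a hypothetical replacement for the real GitHub API call, and `remap_phase` / `update_phase` are illustrative names.

```python
import json
import re
import tempfile
from pathlib import Path


def remapped_issue_file(data_dir: Path, issue_number: int) -> Path:
    # Same file naming scheme as github_remapped_issue_data_file in common.py.
    return data_dir.joinpath(f"ISSUE-{issue_number}.json")


def embed_links(body: str, issue_id_map: dict[str, int]) -> str:
    # Hypothetical stand-in for jira_util.embed_gh_issue_link:
    # appends " (#N)" after each Jira key that has a known GitHub number.
    def repl(m: re.Match) -> str:
        key = m.group(0)
        num = issue_id_map.get(key)
        return f"{key} (#{num})" if num is not None else key

    return re.sub(r"LUCENE-\d+", repl, body)


def remap_phase(issue_number: int, body: str, issue_id_map: dict[str, int], data_dir: Path):
    # Phase 1: compute the new body and save it locally; nothing is sent to GitHub.
    updated = embed_links(body, issue_id_map)
    if updated == body:
        return None  # no cross-issue links; no file is written
    data_file = remapped_issue_file(data_dir, issue_number)
    with open(data_file, "w") as fp:
        json.dump({"issue_number": issue_number, "body": updated}, fp, indent=2)
    return data_file


def update_phase(data_dir: Path, send) -> None:
    # Phase 2: read every saved file and hand it to `send`
    # (a hypothetical callback standing in for the GitHub update call).
    for data_file in sorted(data_dir.glob("ISSUE-*.json")):
        with open(data_file) as fp:
            o = json.load(fp)
        send(o["issue_number"], o["body"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        remap_phase(43, "Follow-up to LUCENE-10500.", {"LUCENE-10500": 42}, data_dir)
        update_phase(data_dir, lambda num, body: print(num, body))
```

Keeping the intermediate JSON files on disk means the re-mapped bodies can be inspected, or selectively re-sent per issue, before anything is changed on GitHub.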