Posted to commits@lucene.apache.org by to...@apache.org on 2022/07/02 01:19:48 UTC

[lucene-jira-archive] branch main updated (8f0bb1fe -> 47457928)

This is an automated email from the ASF dual-hosted git repository.

tomoko pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/lucene-jira-archive.git


    from 8f0bb1fe fix download link for attachments
     new 4d04c861 allow to change attachments download dirpath
     new 47457928 update README

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 README.md                      | 10 +++++++++-
 migration/.env.example         |  3 ++-
 migration/.gitignore           |  1 -
 migration/README.md            | 14 ++++++++++----
 migration/src/common.py        |  5 +++--
 migration/src/download_jira.py | 10 +++++-----
 6 files changed, 29 insertions(+), 14 deletions(-)


[lucene-jira-archive] 01/02: allow to change attachments download dirpath

Posted by to...@apache.org.

tomoko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/lucene-jira-archive.git

commit 4d04c861fb08ee635aa985036fa3adba909265b6
Author: Tomoko Uchida <to...@gmail.com>
AuthorDate: Sat Jul 2 09:39:49 2022 +0900

    allow to change attachments download dirpath
---
 migration/.env.example         |  3 ++-
 migration/.gitignore           |  1 -
 migration/src/common.py        |  5 +++--
 migration/src/download_jira.py | 10 +++++-----
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/migration/.env.example b/migration/.env.example
index 842824a3..4b087674 100644
--- a/migration/.env.example
+++ b/migration/.env.example
@@ -1,4 +1,5 @@
 export GITHUB_PAT=
 export GITHUB_REPO=
 export GITHUB_ATT_REPO="apache/lucene-jira-archive"
-export GITHUB_ATT_BRANCH="main"
\ No newline at end of file
+export GITHUB_ATT_BRANCH="attachments"
+export ATTACHMENTS_DL_DIR=
\ No newline at end of file
diff --git a/migration/.gitignore b/migration/.gitignore
index a9d17ea0..0c0283b2 100644
--- a/migration/.gitignore
+++ b/migration/.gitignore
@@ -13,4 +13,3 @@ venv/
 .env
 
 log/
-attachments/
\ No newline at end of file
diff --git a/migration/src/common.py b/migration/src/common.py
index a6c373b1..93a6a02f 100644
--- a/migration/src/common.py
+++ b/migration/src/common.py
@@ -3,12 +3,13 @@ import logging
 from datetime import datetime
 import functools
 import time
-
+import os
+import tempfile
 
 LOG_DIRNAME = "log"
 
 JIRA_DUMP_DIRNAME = "jira-dump"
-JIRA_ATTACHMENTS_DIRNAME = "attachments"
+JIRA_ATTACHMENTS_DIRPATH = os.getenv("ATTACHMENTS_DL_DIR", str(Path(tempfile.gettempdir()).joinpath("attachments")))
 GITHUB_IMPORT_DATA_DIRNAME = "github-import-data"
 MAPPINGS_DATA_DIRNAME = "mappings-data"
 
diff --git a/migration/src/download_jira.py b/migration/src/download_jira.py
index 7c3a5e7f..92bebc32 100644
--- a/migration/src/download_jira.py
+++ b/migration/src/download_jira.py
@@ -13,7 +13,7 @@ from dataclasses import dataclass
 
 import requests
 
-from common import LOG_DIRNAME, JIRA_DUMP_DIRNAME, JIRA_ATTACHMENTS_DIRNAME, logging_setup, jira_dump_file, jira_attachments_dir, jira_issue_id
+from common import LOG_DIRNAME, JIRA_DUMP_DIRNAME, JIRA_ATTACHMENTS_DIRPATH, logging_setup, jira_dump_file, jira_attachments_dir, jira_issue_id
 
 log_dir = Path(__file__).resolve().parent.parent.joinpath(LOG_DIRNAME)
 logger = logging_setup(log_dir, "download_jira")
@@ -94,7 +94,7 @@ if __name__ == "__main__":
         dump_dir.mkdir()
     assert dump_dir.exists()
 
-    att_data_dir = Path(__file__).resolve().parent.parent.parent.joinpath(JIRA_ATTACHMENTS_DIRNAME)
+    att_data_dir = Path(JIRA_ATTACHMENTS_DIRPATH)
     if not att_data_dir.exists():
         att_data_dir.mkdir()
     assert att_data_dir.exists()
@@ -108,10 +108,10 @@ if __name__ == "__main__":
         else:
             issues.append(args.min)
     
-    logger.info(f"Downloading Jira issues in {dump_dir}")
+    logger.info(f"Downloading Jira issues in {dump_dir}. Attachments are saved in {att_data_dir}.")
     for num in issues:
-        download_issue(num, dump_dir)
-        download_attachments(num, dump_dir, att_data_dir)
+        if download_issue(num, dump_dir):
+            download_attachments(num, dump_dir, att_data_dir)
         time.sleep(DOWNLOAD_INTERVAL_SEC)
     
     logger.info("Done.")
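
Note on the change above: the attachments directory is now read from `ATTACHMENTS_DL_DIR`, falling back to an `attachments` directory under the system temp dir. The following is a minimal sketch of that lookup, not the code in `common.py`. The helper names `resolve_attachments_dir` and `ensure_dir` are hypothetical. It also assumes two behaviors the commit does not implement: an empty value (which is what `export ATTACHMENTS_DL_DIR=` in `.env.example` produces) is treated as unset, and missing parent directories are created.

```python
import os
import tempfile
from pathlib import Path


def resolve_attachments_dir(environ=None):
    """Return the attachments download directory (hypothetical helper).

    Reads ATTACHMENTS_DL_DIR; an unset or empty value falls back to
    <system temp dir>/attachments.
    """
    environ = os.environ if environ is None else environ
    configured = environ.get("ATTACHMENTS_DL_DIR", "").strip()
    if configured:
        return Path(configured).expanduser()
    return Path(tempfile.gettempdir()) / "attachments"


def ensure_dir(path):
    """Create the directory, including missing parents (assumed behavior)."""
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    att_data_dir = ensure_dir(resolve_attachments_dir())
    print(f"Attachments are saved in {att_data_dir}")
```

Treating an empty value as unset matters because `os.getenv` returns the empty string, not the default, when the variable is exported with no value, and `Path("")` resolves to the current directory.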


[lucene-jira-archive] 02/02: update README

Posted by to...@apache.org.

tomoko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/lucene-jira-archive.git

commit 474579289893c639e2769ec1273ae2737b847b6a
Author: Tomoko Uchida <to...@gmail.com>
AuthorDate: Sat Jul 2 10:19:40 2022 +0900

    update README
---
 README.md           | 10 +++++++++-
 migration/README.md | 14 ++++++++++----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index fb6a4aa6..d832a5de 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,15 @@ This repository serves for:
 
 https://issues.apache.org/jira/browse/LUCENE-10557
 
-- [Archive Jira attachments](./attachments)
+- [Archive Jira attachments](https://github.com/apache/lucene-jira-archive/tree/attachments)
 - Drafting Label management
 - Drafting Issue templates
 - [Migration script](./migration/)
+
+## Recommended: Use --single-branch when cloning
+
+All attachments (800MiB+) are kept in the [attachments](https://github.com/apache/lucene-jira-archive/tree/attachments) branch. To avoid fetching that history on the initial clone, use the `--single-branch --branch main` options, which clone only the history of the `main` branch.
+
+```
+git clone --single-branch --branch main git@github.com:apache/lucene-jira-archive.git
+```
diff --git a/migration/README.md b/migration/README.md
index 7629a793..c3227c96 100644
--- a/migration/README.md
+++ b/migration/README.md
@@ -4,6 +4,7 @@
 
 You need Python 3.9+. The scripts were tested on Linux; maybe works also on Mac and Windows (not tested).
 
+On Linux/MacOS:
 ```
 python -V
 Python 3.9.13
@@ -14,18 +15,24 @@ source .venv/bin/activate
 (.venv) pip install -r requirements.txt
 ```
 
-You need a GitHub repository and personal access token for testing. Set `GITHUB_PAT` and `GITHUB_REPO` environment variables.
+You need a GitHub repository and personal access token for testing. Set `GITHUB_PAT` and `GITHUB_REPO` environment variables. See `.env.example` for other variables.
 
+On Linux/MacOS:
 ```
+cp .env.example .env
+
+vi .env
 export GITHUB_PAT=<your token>
 export GITHUB_REPO=<your repository location> # e.g. "mocobeta/sandbox-lucene-10557"
+
+source .env
 ```
 
 ## Usage
 
 ### 1. Download Jira issues
 
-`src/download_jira.py` downloads Jira issues and dumps them as JSON files in `migration/jira-dump`.
+`src/download_jira.py` downloads Jira issues and dumps them as JSON files in `migration/jira-dump`. This also downloads the files attached to each issue.
 
 ```
 (.venv) migration $ python src/download_jira.py --min 10500 --max 10600
@@ -110,6 +117,7 @@ You can:
 
 * migrate all texts in issue descriptions and comments to GitHub; browsing/searching old issues should work fine.
 * extract every issue metadata from Jira and port it to labels or issue descriptions (as plain text).
+* Create links to attachments.
 * map Jira cross-issue link "LUCENE-xxx" to GitHub issue mention "#yyy".
 * map Jira user ids to GitHub accounts if the mapping is given.
 * convert Jira markups to Markdown with parser library.
@@ -122,5 +130,3 @@ You can:
 You cannot:
 
 * simulate original issue reporters or comment authors; they have to be preserved in free-text forms.
-* migrate attached files (patches, images, etc.) to GitHub; these have to remain in Jira.
-   * it's not allowed to programmatically upload files and attach them to issues.
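
Note on the README list above: one of the listed capabilities is mapping a Jira cross-issue link "LUCENE-xxx" to a GitHub issue mention "#yyy". The following is a minimal sketch of how such a rewrite could work, not the migration script's implementation. The function name `link_issues` and the shape of `issue_map` (Jira issue number to GitHub issue number) are assumptions made for illustration. Keys without a known mapping are left unchanged.

```python
import re

# Matches Jira keys such as "LUCENE-10557" and captures the number.
JIRA_KEY = re.compile(r"\bLUCENE-(\d+)\b")


def link_issues(text, issue_map):
    """Replace "LUCENE-xxx" with "#yyy" when a mapping is known.

    issue_map is a hypothetical dict: Jira issue number -> GitHub issue number.
    """
    def replace(match):
        github_number = issue_map.get(int(match.group(1)))
        if github_number is None:
            return match.group(0)  # no mapping given: keep the Jira key
        return f"#{github_number}"

    return JIRA_KEY.sub(replace, text)


if __name__ == "__main__":
    print(link_issues("See LUCENE-10557 and LUCENE-1.", {10557: 42}))
```

Leaving unmapped keys intact keeps the original reference searchable even when the target issue was not migrated.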