Posted to github@beam.apache.org by "AnandInguva (via GitHub)" <gi...@apache.org> on 2024/02/02 21:21:26 UTC

[PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

AnandInguva opened a new pull request, #30202:
URL: https://github.com/apache/beam/pull/30202

   fixes: https://github.com/apache/beam/issues/30191
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1924882115

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @jrmccluskey for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on code in PR #30202:
URL: https://github.com/apache/beam/pull/30202#discussion_r1483537668


##########
sdks/python/apache_beam/ml/transforms/utils.py:
##########
@@ -18,18 +18,47 @@
 __all__ = ['ArtifactsFetcher']
 
 import os
+import tempfile
 import typing
 
 import tensorflow_transform as tft
 from apache_beam.ml.transforms import base
 
+from google.cloud.storage import Client
+from google.cloud.storage import transfer_manager
 
-class ArtifactsFetcher():
+
+def download_artifacts_from_gcs(bucket_name, prefix, local_path):
+  """Downloads artifacts from GCS to the local file system.
+    Args:
+        bucket_name: The name of the GCS bucket to download from.
+        folder_name: The name of the folder to download.

Review Comment:
   ```suggestion
           prefix: Prefix of GCS objects to download.
   ```



##########
sdks/python/apache_beam/ml/transforms/utils.py:
##########
@@ -18,18 +18,47 @@
 __all__ = ['ArtifactsFetcher']
 
 import os
+import tempfile
 import typing
 
 import tensorflow_transform as tft
 from apache_beam.ml.transforms import base
 
+from google.cloud.storage import Client
+from google.cloud.storage import transfer_manager
 
-class ArtifactsFetcher():
+
+def download_artifacts_from_gcs(bucket_name, prefix, local_path):
+  """Downloads artifacts from GCS to the local file system.
+    Args:
+        bucket_name: The name of the GCS bucket to download from.
+        folder_name: The name of the folder to download.
+        local_path: The local path to download the folder to.
+    """

Review Comment:
   ```suggestion
     """
   ```



##########
sdks/python/apache_beam/ml/transforms/utils.py:
##########
@@ -18,18 +18,47 @@
 __all__ = ['ArtifactsFetcher']
 
 import os
+import tempfile
 import typing
 
 import tensorflow_transform as tft
 from apache_beam.ml.transforms import base
 
+from google.cloud.storage import Client
+from google.cloud.storage import transfer_manager
 
-class ArtifactsFetcher():
+
+def download_artifacts_from_gcs(bucket_name, prefix, local_path):
+  """Downloads artifacts from GCS to the local file system.
+    Args:
+        bucket_name: The name of the GCS bucket to download from.
+        folder_name: The name of the folder to download.
+        local_path: The local path to download the folder to.
+    """
+  client = Client()
+  bucket = client.get_bucket(bucket_name)
+  blobs = [blob.name for blob in bucket.list_blobs(prefix=prefix)]
+  _ = transfer_manager.download_many_to_path(
+      bucket, blobs, destination_directory=local_path, max_workers=6)

Review Comment:
   The default for `max_workers` is 8. Is there a particular reason to set it explicitly and reduce it to 6?
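The sketch below illustrates what the `list_blobs(prefix=...)` plus `download_many_to_path(..., max_workers=...)` pair in the diff does, and what `max_workers` bounds. It assumes a plain dict (object name -> bytes) stands in for a GCS bucket, so it makes no real `google.cloud.storage` calls. The helpers `list_names` and `download_many_to_path_sketch` are hypothetical and are not part of Beam or the GCS client library.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def list_names(bucket, prefix):
  """Returns the sorted object names that start with `prefix`.

  Hypothetical stand-in for `bucket.list_blobs(prefix=prefix)` in the diff;
  `bucket` is a plain dict (object name -> bytes), not a GCS bucket.
  """
  return sorted(name for name in bucket if name.startswith(prefix))


def download_many_to_path_sketch(
    bucket, names, destination_directory, max_workers=8):
  """Writes each named object to `destination_directory/<name>`.

  Hypothetical stand-in for `transfer_manager.download_many_to_path`.
  `max_workers` caps how many downloads run at once (threads in this sketch).
  Returns the local paths in the same order as `names`.
  """
  def _download(name):
    local_path = os.path.join(destination_directory, name)
    # Object names may contain '/', so create intermediate directories first.
    parent = os.path.dirname(local_path)
    if parent:
      os.makedirs(parent, exist_ok=True)
    with open(local_path, 'wb') as f:
      f.write(bucket[name])
    return local_path

  # pool.map preserves input order even though downloads run concurrently.
  with ThreadPoolExecutor(max_workers=max_workers) as pool:
    return list(pool.map(_download, names))


if __name__ == '__main__':
  demo_bucket = {
      'artifacts/vocab.txt': b'a\nb\n',
      'artifacts/assets/model.bin': b'\x00\x01',
      'other/ignored.txt': b'x',
  }
  with tempfile.TemporaryDirectory() as tmp:
    names = list_names(demo_bucket, 'artifacts/')
    for path in download_many_to_path_sketch(demo_bucket, names, tmp, 6):
      print(os.path.relpath(path, tmp))
```

Lowering `max_workers` from 8 to 6 only reduces how many objects are fetched concurrently; the set of downloaded files is the same. The sketch uses threads for simplicity. Depending on the `google-cloud-storage` version, the real `transfer_manager` may use a different worker type, so treat this only as an illustration of the concurrency limit.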



Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1932721146

   https://github.com/apache/beam/actions/runs/7819538959/job/21332179529


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva merged PR #30202:
URL: https://github.com/apache/beam/pull/30202


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1930735230

   https://github.com/apache/beam/actions/runs/7805971680 - IT test run


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1924892660

   converting this to draft. 


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1932874909

   R: @tvalentyn 


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1932880600

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


Re: [PR] Update artifacts fetcher to download artifacts locally using FileSystems [beam]

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #30202:
URL: https://github.com/apache/beam/pull/30202#issuecomment-1932378736

   https://github.com/apache/beam/actions/runs/7817704461

