Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/30 07:15:55 UTC

[GitHub] [beam] carlthome opened a new pull request, #22112: Include ffmpeg in Python SDK Docker images

carlthome opened a new pull request, #22112:
URL: https://github.com/apache/beam/pull/22112

   ## Background
   https://github.com/tensorflow/datasets lets users create versioned datasets for training machine learning (ML) models on raw multimedia data (text, images, audio, video), and [it integrates wonderfully with Beam](https://www.tensorflow.org/datasets/beam_datasets) (and Dataflow).
   
   ## Problem
   A current pain point in the user journey is that to use `tfds.features.Audio` or `tfds.features.Video`, the worker runtime must have `ffmpeg` available. That means either `setup.py` hackery (with `apt-get` commands inside it) or prebuilding your own image with e.g. Google Cloud Build. This significantly worsens the UX for ML engineers, who expect that a `requirements.txt` mentioning `librosa` and `apache_beam` should suffice. The frustration is made worse by the fact that DirectRunner development usually just works (since many have `ffmpeg` or an equivalent on their development machines), so these issues surface quite late, at runtime.
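
One way to make a missing `ffmpeg` surface earlier is to check for the binary explicitly before any audio or video is decoded. The following is only a minimal sketch using the standard library; the names `require_binary` and `MissingBinaryError` are hypothetical and are not part of Beam or TFDS.

```python
import shutil


class MissingBinaryError(RuntimeError):
    """Raised when a required command-line tool is not on PATH."""


def require_binary(name: str) -> str:
    """Return the absolute path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise MissingBinaryError(
            f"{name!r} was not found on PATH; install it in the worker image "
            "or via setup.py before processing audio/video."
        )
    return path
```

Calling `require_binary("ffmpeg")` from a `DoFn`'s `setup()` method would make a worker fail at startup with a readable message, rather than partway through a bundle.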
   
   ## Solution?
   Thus, I'm curious if we couldn't just include `ffmpeg` in Beam's Docker Hub hosted images, with the goal of making it easier to create large audio and video datasets. Thoughts?
   
   ## Examples of user journeys
   - https://stackoverflow.com/questions/55581449/installing-ffmpeg-package-from-setup-py-in-apache-beam-pipeline-running-on-goo
   - https://stackoverflow.com/questions/45494952/reading-video-during-cloud-dataflow-using-gcsfuse-download-locally-or-write-n
   - https://stackoverflow.com/questions/35321113/can-google-cloud-dataflow-apache-beam-use-ffmpeg-to-process-video-or-image-dat
   - https://stackoverflow.com/questions/56996028/google-cloud-dataflow-dependencies
   - https://stackoverflow.com/questions/45124289/posthoc-connect-ffmpeg-to-opencv-python-binary-for-google-cloud-dataflow-job
   - https://lists.apache.org/thread/jhgfb2o3c7ms4mg1qlsgqh470nltoosj :wave: 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] carlthome closed pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
carlthome closed pull request #22112: Include ffmpeg in Python SDK Docker images
URL: https://github.com/apache/beam/pull/22112




[GitHub] [beam] AnandInguva commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1181946764

   > I think the idea of a thin and fat container makes a lot of sense. Probably worth proposing on the list rather than a PR to pound out the details.
   
   I plan on proposing this on the dev/user list. Hopefully soon.




[GitHub] [beam] github-actions[bot] commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1179533650

   Reminder, please take a look at this pr: @AnandInguva 




[GitHub] [beam] carlthome commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
carlthome commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1179718371

   > Can we close the PR if there isn’t anything else?
   
   👍




[GitHub] [beam] asf-ci commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170855596

   Can one of the admins verify this patch?




[GitHub] [beam] asf-ci commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170855597

   Can one of the admins verify this patch?




[GitHub] [beam] github-actions[bot] commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170900828

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @AnandInguva for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).




[GitHub] [beam] robertwb commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1180656565

   I think the idea of a thin and fat container makes a lot of sense.
   Probably worth proposing on the list rather than a PR to pound out the
   details.
   
   On Sun, Jul 10, 2022 at 5:21 AM Carl Thomé ***@***.***> wrote:
   >
   > Closed #22112.
   




[GitHub] [beam] codecov[bot] commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170873986

   # [Codecov](https://codecov.io/gh/apache/beam/pull/22112?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#22112](https://codecov.io/gh/apache/beam/pull/22112?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9e637cc) into [master](https://codecov.io/gh/apache/beam/commit/07ed486d653df440b7993679bc6226e0dc4dd6dc?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (07ed486) will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #22112      +/-   ##
   ==========================================
   - Coverage   73.98%   73.98%   -0.01%     
   ==========================================
     Files         703      703              
     Lines       92949    92949              
   ==========================================
   - Hits        68770    68769       -1     
   - Misses      22913    22914       +1     
     Partials     1266     1266              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.57% <ø> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/22112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/combiners.py](https://codecov.io/gh/apache/beam/pull/22112/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb21iaW5lcnMucHk=) | `93.05% <0.00%> (-0.39%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/22112/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `92.57% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/22112/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.42% <0.00%> (+0.12%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/22112?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/22112?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [07ed486...9e637cc](https://codecov.io/gh/apache/beam/pull/22112?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [beam] asf-ci commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170855614

   Can one of the admins verify this patch?




[GitHub] [beam] AnandInguva commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1171574369

   You can also use a custom container[1].
   
   [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#custom-containers
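
For illustration, a prebuilt image is selected at launch time through pipeline options. This is a rough sketch that only assembles the command-line flags; the helper name `dataflow_args` and the image, project, and bucket values in the usage note are hypothetical, while `--sdk_container_image` is the option described in the custom-container documentation linked above.

```python
from typing import List


def dataflow_args(
    image: str, project: str, region: str, temp_location: str
) -> List[str]:
    """Build pipeline flags that point workers at a custom SDK container."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # The custom image is assumed to already contain ffmpeg.
        f"--sdk_container_image={image}",
    ]
```

The resulting list can be passed to `PipelineOptions`, for example `PipelineOptions(dataflow_args("gcr.io/example/beam-ffmpeg:latest", "example-project", "europe-west1", "gs://example-bucket/tmp"))`.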




[GitHub] [beam] AnandInguva commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1179634096

   Can we close the PR if there isn’t anything else?




[GitHub] [beam] AnandInguva commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1172364405

   Yes, a slim version seems like a good choice. I recall we previously discussed supporting a slim version of the apache_beam container image. Let's see what the roadmap is for this. 




[GitHub] [beam] asf-ci commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170855594

   Can one of the admins verify this patch?




[GitHub] [beam] carlthome commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
carlthome commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1172359297

   Figured image size was the reasoning, but worth a shot. :smiling_face_with_tear: 
   
   Perhaps there could be a `-slim` variant of tags instead, like how the official Python images do it?
   
   Feels to me like that might be a nicer UX compared to teaching data scientists to build/push Docker images, or requiring Debian specific tricks with setuptools that only seem to confuse people about which `setup.py` is meant for local development, and which is only meant for configuring a remote worker runtime (I've seen this cause confusion before, especially for Mac users).
   
   `conda` sounds like a nice future complement. Looking forward to it! :+1: 




[GitHub] [beam] asf-ci commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1170855604

   Can one of the admins verify this patch?




[GitHub] [beam] AnandInguva commented on pull request #22112: Include ffmpeg in Python SDK Docker images

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on PR #22112:
URL: https://github.com/apache/beam/pull/22112#issuecomment-1171567689

   Since this is one of the many use cases out there where a user needs to install a package that is not available on PyPI, we recommend using a `setup.py`[1] for these dependencies. Even though the packages are not so big, including them would still increase the size of the Docker container, and we want to avoid that. 
   
   Right now, we support installing packages through `pip`, but we plan to support `conda` in the future, which would give users more flexibility. 
   
   
   [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#nonpython
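
As a rough illustration of that approach, a `setup.py` can run `apt-get` before the normal build steps. This is a sketch under stated assumptions, not the exact pattern from the Beam documentation: the package name `my_audio_pipeline`, the helper `run_commands`, and the class `BuildWithFfmpeg` are hypothetical; the project is assumed to contain at least one Python package so that `build_py` actually runs; and the worker is assumed to be Debian-based with root access.

```python
import subprocess

# Commands executed on the worker when the package is built there.
APT_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "ffmpeg"],
]


def run_commands(commands, runner=subprocess.check_call):
    """Run each command in order; `runner` is injectable for dry runs."""
    for command in commands:
        runner(command)


def main():
    # Imported here so the helpers above can be used without setuptools.
    import setuptools
    from setuptools.command.build_py import build_py

    class BuildWithFfmpeg(build_py):
        """Install ffmpeg first, then perform the regular build_py step."""

        def run(self):
            run_commands(APT_COMMANDS)
            super().run()

    setuptools.setup(
        name="my_audio_pipeline",  # hypothetical package name
        version="0.0.1",
        packages=setuptools.find_packages(),
        cmdclass={"build_py": BuildWithFfmpeg},
    )


if __name__ == "__main__":
    main()
```

Such a file is passed to the pipeline with `--setup_file=./setup.py`. Note that running it locally would also attempt the `apt-get` commands, which is part of the confusion about local versus worker `setup.py` files raised elsewhere in this thread.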

