Posted to commits@beam.apache.org by "Ryan Williams (JIRA)" <ji...@apache.org> on 2018/07/09 19:49:00 UTC

[jira] [Commented] (BEAM-4742) Allow custom docker-image in portable wordcount example

    [ https://issues.apache.org/jira/browse/BEAM-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537466#comment-16537466 ] 

Ryan Williams commented on BEAM-4742:
-------------------------------------

OK, these were both basically mistakes on my part. {{FileBasedSink.initialize_write}} [creates the directories containing the output path|https://github.com/apache/beam/blob/a8eaa1b3ec0544de8b56dd504bc249d4c1a2017f/sdks/python/apache_beam/io/filebasedsink.py#L162], so [~angoenka]'s guess as to why I might've seen the {{IOError}} above is that I had a typo in the output path and was trying to create a top-level directory I didn't have permissions for (e.g. {{/tmpz}}).
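For reference, the general failure mode can be reproduced outside of Beam. The following is only a minimal sketch using the standard library, not Beam's actual code: {{os.makedirs}} stands in for the directory creation that {{initialize_write}} performs, and the helper names and file name are made up for illustration.

```python
import errno
import os
import tempfile


def write_with_parent_dirs(path, text):
    """Create the parent directory of `path` if needed, then write `text`."""
    parent = os.path.dirname(path)
    if parent:
        # Stand-in for the directory creation done by the sink (assumption).
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def try_plain_open(path):
    """Return the errno raised by a plain open() for writing, or None on success."""
    try:
        with open(path, "w"):
            pass
    except OSError as e:
        return e.errno
    return None


# Hypothetical output path whose parent directory does not exist yet.
base = tempfile.mkdtemp()
out = os.path.join(base, "missing", "counts.txt")

plain_errno = try_plain_open(out)          # fails: parent directory is missing
write_with_parent_dirs(out, "hello: 1\n")  # succeeds: parent is created first
```

Here the plain {{open}} reports "No such file or directory" because the parent directory is missing, while creating the parent first lets the write succeed. Creating a directory somewhere the user has no write access (such as directly under {{/}}) would still fail at the directory-creation step.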

On the point about observing the output of a portable wordcount example: the output gets written to the filesystem inside the Docker container. {{docker ps}} displays the ID of the container, {{docker exec -it <id> bash}} opens a shell in it, and {{ls}} etc. can then be used to inspect the output files.
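The same inspection can be scripted rather than done interactively. This is a rough sketch, assuming a local Docker daemon and Python 3.7+; the function names are hypothetical, and {{path}} is whatever output path was passed to the pipeline.

```python
import subprocess


def docker_ps_command():
    """argv for listing the IDs of running containers (`docker ps -q`)."""
    return ["docker", "ps", "-q"]


def docker_ls_command(container_id, path):
    """argv for listing `path` inside a running container."""
    return ["docker", "exec", container_id, "ls", "-l", path]


def list_output_files(path):
    """Run `ls -l path` in every running container; return {id: listing}.

    Containers where the path does not exist (non-zero exit) are skipped.
    """
    ids = subprocess.run(
        docker_ps_command(), check=True, capture_output=True, text=True
    ).stdout.split()
    listings = {}
    for container_id in ids:
        result = subprocess.run(
            docker_ls_command(container_id, path),
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            listings[container_id] = result.stdout
    return listings
```

Calling {{list_output_files}} with the pipeline's output directory would return the listing from whichever running container holds the output files. It only works while the container is still running.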

[The metrics normally collected/logged when running wordcount in {{DirectRunner}}|https://github.com/apache/beam/blob/f063b157eea480d079c4e966e528eef050a0c192/sdks/python/apache_beam/examples/wordcount.py#L121-L134] are not collected and/or not output in {{PortableRunner}} at the moment.

> Allow custom docker-image in portable wordcount example
> -------------------------------------------------------
>
>                 Key: BEAM-4742
>                 URL: https://issues.apache.org/jira/browse/BEAM-4742
>             Project: Beam
>          Issue Type: Improvement
>          Components: examples-python
>    Affects Versions: 2.5.0
>            Reporter: Ryan Williams
>            Assignee: Ryan Williams
>            Priority: Minor
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> I hit a couple snags [running the portable wordcount example|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/build.gradle#L200-L214]:
>  * -[the default docker image is hard-coded to a bintray URL|https://github.com/apache/beam/blob/997ee3afe74483ae44e2dcb32ca0e24876129cd9/sdks/python/apache_beam/runners/portability/portable_runner.py#L60-L68], but I published my image to Docker Hub- I missed that [there's already a pipeline option for this|https://github.com/apache/beam/pull/5902#discussion_r201071859]! Thanks [~lcwik]
>  * the default output path is in a temporary directory that doesn't exist at the time of the {{open}} call, so I got {{IOError: [Errno 2] No such file or directory}} 
> I'll send a PR with fixes to each of these shortly.
> I've also not found where to observe output from successfully running the example.


