Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/14 15:29:23 UTC

[GitHub] [flink] GJL commented on issue #10746: [FLINK-15417] Remove the docker volume or mount when starting Mesos e…

GJL commented on issue #10746: [FLINK-15417] Remove the docker volume or mount when starting Mesos e…
URL: https://github.com/apache/flink/pull/10746#issuecomment-574229434
 
 
   When we opened FLINK-15377, we discovered that the Mesos logs cannot be deleted due to permission problems (logs written in the container have a different owner than the host's current user id). Since we also delete the job's output, which is likewise written from within the container, I was wondering why the test is currently passing on Travis at all. It turns out that we also fail to remove the job's output (`${TEST_DATA_DIR}/out/wc_out_mesos`). However, because the cleanup code is executed as a `trap`, and we redirect `stderr` to `/dev/null` [1], the error is never visible on Travis.
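   
   To illustrate how such a failure can stay invisible, here is a minimal sketch. It is not the code from `test-runner-common.sh`; `work_dir` and `failing_cleanup` are hypothetical names, and a failing `rmdir` on a non-empty directory stands in for the permission error so that the sketch behaves the same regardless of which user runs it.
   
   ```shell
   #!/usr/bin/env bash
   # Illustrative only: these names are hypothetical, not taken from the Flink test scripts.
   
   work_dir="$(mktemp -d)"
   mkdir -p "${work_dir}/out"
   touch "${work_dir}/out/wc_out_mesos"
   
   failing_cleanup() {
       # rmdir refuses to remove a non-empty directory. This stands in for the
       # permission error that the real cleanup hits on container-owned files.
       rmdir "${work_dir}/out"
   }
   
   # Case 1: cleanup runs from an EXIT trap with stderr discarded.
   ( trap failing_cleanup EXIT ) 2>/dev/null
   hidden_status=$?
   
   # Case 2: same cleanup, but stderr is kept in a file for inspection.
   ( trap failing_cleanup EXIT ) 2>"${work_dir}/cleanup.err"
   
   echo "exit status with stderr discarded: ${hidden_status}"
   echo "output directory still present: $([ -d "${work_dir}/out" ] && echo yes || echo no)"
   echo "captured error: $(cat "${work_dir}/cleanup.err")"
   ```
   
   In the first case nothing is printed and the directory is left behind, which matches what we see on Travis. Keeping `stderr` (as in the second case) would at least surface the failed cleanup.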
   
   Your PR works around the permission problem by copying the data from the container. The issues I see with this approach are:
   - `wait_job_terminal_state_mesos` uses a new strategy to poll whether the job has terminated (compared to the standalone mode)
   - `copy_logs_from_container` will not work if the container dies unexpectedly, making it hard to debug a certain class of bugs (Mesos exiting prematurely)
   
   The approach described [here](https://vsupalov.com/docker-shared-permissions/) doesn't work on OS X due to the docker daemon running on a hypervisor. However, on OS X we apparently do not suffer from the permission issue.
   
   A simple way out of this might be to just create the directories in advance with the right permissions. For example, if we ran
   
   ```
   mkdir -p "${TEST_DATA_DIR}/out/" # run on the host, not in the container
   ```
   
   prior to submitting the job, we would be able to delete the `/out` directory later from the host (unless nested directories are created within the container). Another option I see is to run `chmod -R ugo+rw` at the end of the test from within the container against all files/directories that we need to delete later from the host.
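   
   A rough sketch of both options, under assumptions: the function names, `CONTAINER_NAME`, and the in-container path are hypothetical placeholders, and the `docker exec` call is left commented out because it needs a running container. The first option relies on the fact that deleting a file requires write permission on its parent directory, not on the file itself, so files written by the container into a host-owned directory can still be removed from the host.
   
   ```shell
   #!/usr/bin/env bash
   # Sketch only. CONTAINER_NAME and the in-container path are hypothetical placeholders.
   
   TEST_DATA_DIR="${TEST_DATA_DIR:-$(mktemp -d)}"
   
   # Option 1: create the output directory on the host before submitting the job,
   # so that the directory itself is owned by the host user.
   prepare_output_dir() {
       mkdir -p "${TEST_DATA_DIR}/out"
   }
   
   # Option 2: at the end of the test, open up permissions from inside the container
   # for everything the host has to delete later.
   relax_permissions_in_container() {
       local container="$1" path_in_container="$2"
       docker exec "${container}" chmod -R ugo+rw "${path_in_container}"
   }
   
   prepare_output_dir
   # relax_permissions_in_container "${CONTAINER_NAME}" "/path/to/test-data"  # needs a running container
   ```
   
   Option 1 is the smaller change but breaks down as soon as the job creates nested directories inside the container. Option 2 covers that case but only helps if the container is still alive when the test ends.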
   
   Let me know what you think.
   
   [1] https://github.com/apache/flink/blob/6f6fb43ca2f8413e81a1b19e77c5cf3101b7e61d/flink-end-to-end-tests/test-scripts/test-runner-common.sh#L107
