You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Joseph Wu <jo...@mesosphere.io> on 2017/05/02 01:57:56 UTC

Review Request 58900: Changed the default fetcher cache directory.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/
-----------------------------------------------------------

Review request for mesos and Jie Yu.


Bugs: MESOS-7304
    https://issues.apache.org/jira/browse/MESOS-7304


Repository: mesos


Description
-------

The fetcher cache directory was historically located (by default)
in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
be used to change this value.

The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
for each `SlaveID`.  This was done because multiple agents can run on
the same node.  If all the agents use the same default fetcher cache
directory, they will collide and cause unpredictable results.
As a result, the `SlaveID` needed to be passed into the fetcher
after the agent recovers and/or registers with the master, because
that is when the `SlaveID` is determined.

This changes the default fetcher cache directory to
`/tmp/mesos/fetch/XXXXXX`, where the 6 X's are replaced by `::mkdtemp`.
The `SlaveID` subdirectory has also been removed.

This change, while techically a breaking change, is safe because of
how the fetcher uses this directory.  Upon starting up, the fetcher
"recovers" by clearing this directory.  By using a temporary directory,
we similarly get an empty fetcher cache upon startup.

This change will only cause breakages if multiple agents are run
with the same `--fetcher_cache_dir`.  In this case, each agent
will delete the fetcher caches of all the other agents.


Diffs
-----

  src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
  src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
  src/slave/flags.cpp ed99fadbf1aa91f7f0a57c9fb351c0247a40c6f4 


Diff: https://reviews.apache.org/r/58900/diff/1/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On May 11, 2017, 1:53 a.m., Jie Yu wrote:
> > src/local/local.cpp
> > Lines 372-374 (patched)
> > <https://reviews.apache.org/r/58900/diff/5/?file=1714302#file1714302line372>
> >
> >     I'd suggest we put fetcher dir under work_dir as well for local mode so that all directories related to this local run have a common parent directory.
> >     
> >     ```
> >     propagatedFlags["work_dir"] =
> >       path::join(flags.work_dir, "agents", stringify(i), "work_dir");
> >       
> >     propagatedFlags["runtime_dir"] =
> >       path::join(flags.work_dir, "agents", stringify(i), "runtime_dir");
> >       
> >     propagatedFlags["fetcher_cache_dir"] =
> >       path::join(flags.work_dir, "agents", stringify(i), "fetcher_cache_dir");
> >     ```
> >     
> >     No need for the `runtime_dir` in local flags.

This is logically a separate change (with some flag documentation updates), so I'll open a separate review with this change.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/#review174618
-----------------------------------------------------------


On May 19, 2017, 3:05 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58900/
> -----------------------------------------------------------
> 
> (Updated May 19, 2017, 3:05 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7304
>     https://issues.apache.org/jira/browse/MESOS-7304
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The fetcher cache directory was historically located (by default)
> in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
> be used to change this value.
> 
> The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
> for each `SlaveID`.  This was done because multiple agents can run on
> the same node.  If all the agents use the same default fetcher cache
> directory, they will collide and cause unpredictable results.
> As a result, the `SlaveID` needed to be passed into the fetcher
> after the agent recovers and/or registers with the master, because
> that is when the `SlaveID` is determined.
> 
> This changes the default fetcher cache directory to
> `/tmp/mesos/fetch`.  The `SlaveID` subdirectory has been removed.
> 
> This change, while techically a breaking change, is safe because of
> how the fetcher uses this directory.  Upon starting up, the fetcher
> "recovers" by clearing this directory.  Although the subdirectory
> has been removed, the fetcher still clears the fetcher cache
> on startup.
> 
> This change will only cause breakages if multiple agents are run
> with the same `--fetcher_cache_dir`.  In this case, each agent
> will delete the fetcher caches of all the other agents.
> 
> ---
> 
> With the removal of the `SlaveID` field in the fetcher's methods,
> it is no longer necessary to pass in the `SlaveID` or agent Flags
> at agent recovery time.  Instead, the flags can be passed in during
> the fetcher's construction.
> 
> Similarly, the fetcher's "recovery" (clearing the fetcher cache)
> can be done immediately upon construction, which simplifies the
> code slightly.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
>   src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
>   src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
>   src/slave/flags.cpp e172aa526cfd38f4f30efda22ef011146e5c1a7d 
>   src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
>   src/slave/slave.cpp 14de72fa4a8746504a251d735bf56041b934df40 
> 
> 
> Diff: https://reviews.apache.org/r/58900/diff/6/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/#review174618
-----------------------------------------------------------



Great clean up!


src/local/local.cpp
Lines 372-374 (patched)
<https://reviews.apache.org/r/58900/#comment247822>

    I'd suggest we put fetcher dir under work_dir as well for local mode so that all directories related to this local run have a common parent directory.
    
    ```
    propagatedFlags["work_dir"] =
      path::join(flags.work_dir, "agents", stringify(i), "work_dir");
      
    propagatedFlags["runtime_dir"] =
      path::join(flags.work_dir, "agents", stringify(i), "runtime_dir");
      
    propagatedFlags["fetcher_cache_dir"] =
      path::join(flags.work_dir, "agents", stringify(i), "fetcher_cache_dir");
    ```
    
    No need for the `runtime_dir` in local flags.


- Jie Yu


On May 10, 2017, 8:32 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58900/
> -----------------------------------------------------------
> 
> (Updated May 10, 2017, 8:32 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7304
>     https://issues.apache.org/jira/browse/MESOS-7304
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The fetcher cache directory was historically located (by default)
> in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
> be used to change this value.
> 
> The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
> for each `SlaveID`.  This was done because multiple agents can run on
> the same node.  If all the agents use the same default fetcher cache
> directory, they will collide and cause unpredictable results.
> As a result, the `SlaveID` needed to be passed into the fetcher
> after the agent recovers and/or registers with the master, because
> that is when the `SlaveID` is determined.
> 
> This changes the default fetcher cache directory to
> `/tmp/mesos/fetch`.  The `SlaveID` subdirectory has been removed.
> 
> This change, while techically a breaking change, is safe because of
> how the fetcher uses this directory.  Upon starting up, the fetcher
> "recovers" by clearing this directory.  Although the subdirectory
> has been removed, the fetcher still clears the fetcher cache
> on startup.
> 
> This change will only cause breakages if multiple agents are run
> with the same `--fetcher_cache_dir`.  In this case, each agent
> will delete the fetcher caches of all the other agents.
> 
> ---
> 
> With the removal of the `SlaveID` field in the fetcher's methods,
> it is no longer necessary to pass in the `SlaveID` or agent Flags
> at agent recovery time.  Instead, the flags can be passed in during
> the fetcher's construction.
> 
> Similarly, the fetcher's "recovery" (clearing the fetcher cache)
> can be done immediately upon construction, which simplifies the
> code slightly.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
>   src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
>   src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
>   src/slave/flags.cpp bc63a6a4cb6115b4b4d592e67e34045f52b50d4c 
>   src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
>   src/slave/slave.cpp 209aff180db1a68ae360d7c278937fe645efdb2b 
> 
> 
> Diff: https://reviews.apache.org/r/58900/diff/5/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/
-----------------------------------------------------------

(Updated May 19, 2017, 3:05 p.m.)


Review request for mesos and Jie Yu.


Changes
-------

Removed some refactoring artifacts.


Bugs: MESOS-7304
    https://issues.apache.org/jira/browse/MESOS-7304


Repository: mesos


Description
-------

The fetcher cache directory was historically located (by default)
in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
be used to change this value.

The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
for each `SlaveID`.  This was done because multiple agents can run on
the same node.  If all the agents use the same default fetcher cache
directory, they will collide and cause unpredictable results.
As a result, the `SlaveID` needed to be passed into the fetcher
after the agent recovers and/or registers with the master, because
that is when the `SlaveID` is determined.

This changes the default fetcher cache directory to
`/tmp/mesos/fetch`.  The `SlaveID` subdirectory has been removed.

This change, while techically a breaking change, is safe because of
how the fetcher uses this directory.  Upon starting up, the fetcher
"recovers" by clearing this directory.  Although the subdirectory
has been removed, the fetcher still clears the fetcher cache
on startup.

This change will only cause breakages if multiple agents are run
with the same `--fetcher_cache_dir`.  In this case, each agent
will delete the fetcher caches of all the other agents.

---

With the removal of the `SlaveID` field in the fetcher's methods,
it is no longer necessary to pass in the `SlaveID` or agent Flags
at agent recovery time.  Instead, the flags can be passed in during
the fetcher's construction.

Similarly, the fetcher's "recovery" (clearing the fetcher cache)
can be done immediately upon construction, which simplifies the
code slightly.


Diffs (updated)
-----

  src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
  src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
  src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
  src/slave/flags.cpp e172aa526cfd38f4f30efda22ef011146e5c1a7d 
  src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
  src/slave/slave.cpp 14de72fa4a8746504a251d735bf56041b934df40 


Diff: https://reviews.apache.org/r/58900/diff/6/

Changes: https://reviews.apache.org/r/58900/diff/5-6/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/#review174619
-----------------------------------------------------------


Ship it!




Ship It!

- Jie Yu


On May 10, 2017, 8:32 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58900/
> -----------------------------------------------------------
> 
> (Updated May 10, 2017, 8:32 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7304
>     https://issues.apache.org/jira/browse/MESOS-7304
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The fetcher cache directory was historically located (by default)
> in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
> be used to change this value.
> 
> The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
> for each `SlaveID`.  This was done because multiple agents can run on
> the same node.  If all the agents use the same default fetcher cache
> directory, they will collide and cause unpredictable results.
> As a result, the `SlaveID` needed to be passed into the fetcher
> after the agent recovers and/or registers with the master, because
> that is when the `SlaveID` is determined.
> 
> This changes the default fetcher cache directory to
> `/tmp/mesos/fetch`.  The `SlaveID` subdirectory has been removed.
> 
> This change, while techically a breaking change, is safe because of
> how the fetcher uses this directory.  Upon starting up, the fetcher
> "recovers" by clearing this directory.  Although the subdirectory
> has been removed, the fetcher still clears the fetcher cache
> on startup.
> 
> This change will only cause breakages if multiple agents are run
> with the same `--fetcher_cache_dir`.  In this case, each agent
> will delete the fetcher caches of all the other agents.
> 
> ---
> 
> With the removal of the `SlaveID` field in the fetcher's methods,
> it is no longer necessary to pass in the `SlaveID` or agent Flags
> at agent recovery time.  Instead, the flags can be passed in during
> the fetcher's construction.
> 
> Similarly, the fetcher's "recovery" (clearing the fetcher cache)
> can be done immediately upon construction, which simplifies the
> code slightly.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
>   src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
>   src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
>   src/slave/flags.cpp bc63a6a4cb6115b4b4d592e67e34045f52b50d4c 
>   src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
>   src/slave/slave.cpp 209aff180db1a68ae360d7c278937fe645efdb2b 
> 
> 
> Diff: https://reviews.apache.org/r/58900/diff/5/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/
-----------------------------------------------------------

(Updated May 10, 2017, 1:32 p.m.)


Review request for mesos and Jie Yu.


Changes
-------

Need to change the local cluster helper to create a different fetcher cache directory per agent.


Bugs: MESOS-7304
    https://issues.apache.org/jira/browse/MESOS-7304


Repository: mesos


Description
-------

The fetcher cache directory was historically located (by default)
in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
be used to change this value.

The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
for each `SlaveID`.  This was done because multiple agents can run on
the same node.  If all the agents use the same default fetcher cache
directory, they will collide and cause unpredictable results.
As a result, the `SlaveID` needed to be passed into the fetcher
after the agent recovers and/or registers with the master, because
that is when the `SlaveID` is determined.

This changes the default fetcher cache directory to
`/tmp/mesos/fetch`.  The `SlaveID` subdirectory has been removed.

This change, while techically a breaking change, is safe because of
how the fetcher uses this directory.  Upon starting up, the fetcher
"recovers" by clearing this directory.  Although the subdirectory
has been removed, the fetcher still clears the fetcher cache
on startup.

This change will only cause breakages if multiple agents are run
with the same `--fetcher_cache_dir`.  In this case, each agent
will delete the fetcher caches of all the other agents.

---

With the removal of the `SlaveID` field in the fetcher's methods,
it is no longer necessary to pass in the `SlaveID` or agent Flags
at agent recovery time.  Instead, the flags can be passed in during
the fetcher's construction.

Similarly, the fetcher's "recovery" (clearing the fetcher cache)
can be done immediately upon construction, which simplifies the
code slightly.


Diffs (updated)
-----

  src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
  src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
  src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
  src/slave/flags.cpp bc63a6a4cb6115b4b4d592e67e34045f52b50d4c 
  src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
  src/slave/slave.cpp 209aff180db1a68ae360d7c278937fe645efdb2b 


Diff: https://reviews.apache.org/r/58900/diff/4/

Changes: https://reviews.apache.org/r/58900/diff/3-4/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/
-----------------------------------------------------------

(Updated May 10, 2017, 1:24 p.m.)


Review request for mesos and Jie Yu.


Changes
-------

Went with the simplest approach.  The `--fetcher_cache_dir` is no longer the _parent_ of the fetcher cache directory.  It is the fetcher cache directory.  Period.


Bugs: MESOS-7304
    https://issues.apache.org/jira/browse/MESOS-7304


Repository: mesos


Description (updated)
-------

The fetcher cache directory was historically located (by default)
in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
be used to change this value.

The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
for each `SlaveID`.  This was done because multiple agents can run on
the same node.  If all the agents use the same default fetcher cache
directory, they will collide and cause unpredictable results.
As a result, the `SlaveID` needed to be passed into the fetcher
after the agent recovers and/or registers with the master, because
that is when the `SlaveID` is determined.

This changes the default fetcher cache directory to
`/tmp/mesos/fetch`.  The `SlaveID` subdirectory has been removed.

This change, while techically a breaking change, is safe because of
how the fetcher uses this directory.  Upon starting up, the fetcher
"recovers" by clearing this directory.  Although the subdirectory
has been removed, the fetcher still clears the fetcher cache
on startup.

This change will only cause breakages if multiple agents are run
with the same `--fetcher_cache_dir`.  In this case, each agent
will delete the fetcher caches of all the other agents.

---

With the removal of the `SlaveID` field in the fetcher's methods,
it is no longer necessary to pass in the `SlaveID` or agent Flags
at agent recovery time.  Instead, the flags can be passed in during
the fetcher's construction.

Similarly, the fetcher's "recovery" (clearing the fetcher cache)
can be done immediately upon construction, which simplifies the
code slightly.


Diffs (updated)
-----

  src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
  src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
  src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
  src/slave/flags.cpp bc63a6a4cb6115b4b4d592e67e34045f52b50d4c 
  src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
  src/slave/slave.cpp 209aff180db1a68ae360d7c278937fe645efdb2b 


Diff: https://reviews.apache.org/r/58900/diff/3/

Changes: https://reviews.apache.org/r/58900/diff/2-3/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Jie Yu <yu...@gmail.com>.

> On May 8, 2017, 10:35 p.m., Joseph Wu wrote:
> > src/slave/containerizer/fetcher.cpp
> > Lines 261 (patched)
> > <https://reviews.apache.org/r/58900/diff/2/?file=1710782#file1710782line295>
> >
> >     This patch is still missing cleanup for this directory.
> >     
> >     Before this patch, we would "leak" fetcher cache directories whenever the SlaveID changed.  This change will make the fetcher leak the cache every time the agent is started.
> >     
> >     I'm considering the following options:
> >     1) Change the `--fetcher_cache_dir` to be the cache directory, rather than the _parent_ of the cache directory.  This has some implications when running multiple agents one a single machine (as the default value of the flag will no longer be acceptable).  But with a single cache directory, managing leaks is as easy as clearing the directory on startup.
> >     2) Delete the fetcher cache directory in the fetcher's destructor.  This will only really do anything when the agent is gracefully shut down.  But graceful shutdowns can be considered somewhat rare.
> >     3) Checkpoint the PID of the current process inside the fetcher cache directory.  Whenever the agent starts, it could scan for these PIDs and check which processes are still running.  We can then delete the caches of dead processes.

First of all, i think making `--fetcher_cache_dir` be the cache directory makes sense to me. I won't be too worried about the case of running multiple agents. If we want to run multiple agents, there are many other flags that we need to make unique as well (e.g., runtime_dir, cgroups_root, etc.)

For 3), can we avoid checkpointing. For instance, can we introduce a `trash` directory under `fetcher_cache_dir`, and rely on `rename` to atomically move the cached artifacts to the trash directory for removal.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/#review174232
-----------------------------------------------------------


On May 8, 2017, 10:20 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58900/
> -----------------------------------------------------------
> 
> (Updated May 8, 2017, 10:20 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7304
>     https://issues.apache.org/jira/browse/MESOS-7304
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The fetcher cache directory was historically located (by default)
> in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
> be used to change this value.
> 
> The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
> for each `SlaveID`.  This was done because multiple agents can run on
> the same node.  If all the agents use the same default fetcher cache
> directory, they will collide and cause unpredictable results.
> As a result, the `SlaveID` needed to be passed into the fetcher
> after the agent recovers and/or registers with the master, because
> that is when the `SlaveID` is determined.
> 
> This changes the default fetcher cache directory to
> `/tmp/mesos/fetch/XXXXXX`, where the 6 X's are replaced by `::mkdtemp`.
> The `SlaveID` subdirectory has also been removed.
> 
> This change, while techically a breaking change, is safe because of
> how the fetcher uses this directory.  Upon starting up, the fetcher
> "recovers" by clearing this directory.  By using a temporary directory,
> we similarly get an empty fetcher cache upon startup.
> 
> This change will only cause breakages if multiple agents are run
> with the same `--fetcher_cache_dir`.  In this case, each agent
> will delete the fetcher caches of all the other agents.
> 
> ---
> 
> With the removal of the `SlaveID` field in the fetcher's methods,
> it is no longer necessary to pass in the `SlaveID` or agent Flags
> at agent recovery time.  Instead, the flags can be passed in during
> the fetcher's construction.
> 
> Similarly, the fetcher's "recovery" (clearing the fetcher cache)
> can be done immediately upon construction, which simplifies the
> code slightly.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
>   src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
>   src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
>   src/slave/flags.hpp e5784ef81ad0720c7ec061ee0b28b8fadae77afd 
>   src/slave/flags.cpp ed99fadbf1aa91f7f0a57c9fb351c0247a40c6f4 
>   src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
>   src/slave/slave.cpp a37c6c888e7573209aadb07576cfb727fa1ec4ff 
> 
> 
> Diff: https://reviews.apache.org/r/58900/diff/2/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/#review174232
-----------------------------------------------------------



Request for comment...


src/slave/containerizer/fetcher.cpp
Lines 261 (patched)
<https://reviews.apache.org/r/58900/#comment247344>

    This patch is still missing cleanup for this directory.
    
    Before this patch, we would "leak" fetcher cache directories whenever the SlaveID changed.  This change will make the fetcher leak the cache every time the agent is started.
    
    I'm considering the following options:
    1) Change the `--fetcher_cache_dir` to be the cache directory, rather than the _parent_ of the cache directory.  This has some implications when running multiple agents one a single machine (as the default value of the flag will no longer be acceptable).  But with a single cache directory, managing leaks is as easy as clearing the directory on startup.
    2) Delete the fetcher cache directory in the fetcher's destructor.  This will only really do anything when the agent is gracefully shut down.  But graceful shutdowns can be considered somewhat rare.
    3) Checkpoint the PID of the current process inside the fetcher cache directory.  Whenever the agent starts, it could scan for these PIDs and check which processes are still running.  We can then delete the caches of dead processes.


- Joseph Wu


On May 8, 2017, 3:20 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58900/
> -----------------------------------------------------------
> 
> (Updated May 8, 2017, 3:20 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-7304
>     https://issues.apache.org/jira/browse/MESOS-7304
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The fetcher cache directory was historically located (by default)
> in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
> be used to change this value.
> 
> The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
> for each `SlaveID`.  This was done because multiple agents can run on
> the same node.  If all the agents use the same default fetcher cache
> directory, they will collide and cause unpredictable results.
> As a result, the `SlaveID` needed to be passed into the fetcher
> after the agent recovers and/or registers with the master, because
> that is when the `SlaveID` is determined.
> 
> This changes the default fetcher cache directory to
> `/tmp/mesos/fetch/XXXXXX`, where the 6 X's are replaced by `::mkdtemp`.
> The `SlaveID` subdirectory has also been removed.
> 
> This change, while techically a breaking change, is safe because of
> how the fetcher uses this directory.  Upon starting up, the fetcher
> "recovers" by clearing this directory.  By using a temporary directory,
> we similarly get an empty fetcher cache upon startup.
> 
> This change will only cause breakages if multiple agents are run
> with the same `--fetcher_cache_dir`.  In this case, each agent
> will delete the fetcher caches of all the other agents.
> 
> ---
> 
> With the removal of the `SlaveID` field in the fetcher's methods,
> it is no longer necessary to pass in the `SlaveID` or agent Flags
> at agent recovery time.  Instead, the flags can be passed in during
> the fetcher's construction.
> 
> Similarly, the fetcher's "recovery" (clearing the fetcher cache)
> can be done immediately upon construction, which simplifies the
> code slightly.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
>   src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
>   src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
>   src/slave/flags.hpp e5784ef81ad0720c7ec061ee0b28b8fadae77afd 
>   src/slave/flags.cpp ed99fadbf1aa91f7f0a57c9fb351c0247a40c6f4 
>   src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
>   src/slave/slave.cpp a37c6c888e7573209aadb07576cfb727fa1ec4ff 
> 
> 
> Diff: https://reviews.apache.org/r/58900/diff/2/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 58900: Changed the default fetcher cache directory.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58900/
-----------------------------------------------------------

(Updated May 8, 2017, 3:20 p.m.)


Review request for mesos and Jie Yu.


Changes
-------

Changed when the fetcher cache directory is generated (in the default case).  Instead of being generated by the flags, it is instead generated by the fetcher process.


Bugs: MESOS-7304
    https://issues.apache.org/jira/browse/MESOS-7304


Repository: mesos


Description (updated)
-------

The fetcher cache directory was historically located (by default)
in `/tmp/mesos/fetch`.  The agent flag `--fetcher_cache_dir` could
be used to change this value.

The fetcher would create a subdirectory underneath `/tmp/mesos/fetch`
for each `SlaveID`.  This was done because multiple agents can run on
the same node.  If all the agents use the same default fetcher cache
directory, they will collide and cause unpredictable results.
As a result, the `SlaveID` needed to be passed into the fetcher
after the agent recovers and/or registers with the master, because
that is when the `SlaveID` is determined.

This changes the default fetcher cache directory to
`/tmp/mesos/fetch/XXXXXX`, where the 6 X's are replaced by `::mkdtemp`.
The `SlaveID` subdirectory has also been removed.

This change, while techically a breaking change, is safe because of
how the fetcher uses this directory.  Upon starting up, the fetcher
"recovers" by clearing this directory.  By using a temporary directory,
we similarly get an empty fetcher cache upon startup.

This change will only cause breakages if multiple agents are run
with the same `--fetcher_cache_dir`.  In this case, each agent
will delete the fetcher caches of all the other agents.

---

With the removal of the `SlaveID` field in the fetcher's methods,
it is no longer necessary to pass in the `SlaveID` or agent Flags
at agent recovery time.  Instead, the flags can be passed in during
the fetcher's construction.

Similarly, the fetcher's "recovery" (clearing the fetcher cache)
can be done immediately upon construction, which simplifies the
code slightly.


Diffs (updated)
-----

  src/local/local.cpp e47980929db2da1f31cf899a0e1fc452070e11f3 
  src/slave/containerizer/fetcher.hpp 9e3018dc087ed55c61b2824d0105bc5339b83043 
  src/slave/containerizer/fetcher.cpp a910fea5a5556afb376524c5bb2ff98d7d84e611 
  src/slave/flags.hpp e5784ef81ad0720c7ec061ee0b28b8fadae77afd 
  src/slave/flags.cpp ed99fadbf1aa91f7f0a57c9fb351c0247a40c6f4 
  src/slave/main.cpp 507d59996a90f51c7cd1e0f836a34c2e9c484791 
  src/slave/slave.cpp a37c6c888e7573209aadb07576cfb727fa1ec4ff 


Diff: https://reviews.apache.org/r/58900/diff/2/

Changes: https://reviews.apache.org/r/58900/diff/1-2/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu