Posted to dev@mesos.apache.org by Bernd Mathiske <be...@mesosphere.io> on 2015/03/01 11:35:30 UTC

Re: Review Request 30774: Fetcher Cache

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 1, 2015, 2:35 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Improved fetcher cache tests and introduced two cache eviction tests: one tests whether eviction succeeds, the other also tests what happens if it fails.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching happens only when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is set by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI result in only one download. The fetcher program remains external for safety reasons, but it now takes more elaborate parameters, packed into a JSON object, that specify the fetch actions for all of the above. Additional tests cover fetching from (mock) HDFS and the new features.


Diffs (updated)
-----

  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am 17d0d7aa7361c3a373f6863d36b0a4767f5c05c4 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp 9f31fa46304398e8f87b41b55d8f4cfd4aba10b9 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 0ae0a10b41a2c0f7459c771b31c76bbc0c02df4f 
  src/tests/mesos.hpp f7a0d057edea1a7ec7ae3bb9bc729230bf7dd46d 
  src/tests/mesos.cpp 23f790cbb289f6483dcdfa6ecccd462360ce02f1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, summarized here in dependency order:

30033: Removes the fetcher environment tests, which are no longer needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that other tests do not also cover.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they go through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two variants, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions through the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
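A minimal C++ sketch of how an item's action might be chosen from the per-URI "caching" flag and the current cache state. The enum, struct, and function names (`FetchAction`, `FetchItem`, `makeItem`) are hypothetical and do not reproduce the definitions in fetcher.proto; C++17 `std::optional` stands in for whatever optional type the real code uses.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the three fetch actions; the real names live in
// include/mesos/fetcher/fetcher.proto and may differ.
enum class FetchAction {
  BYPASS_CACHE,        // Fetch directly into the sandbox.
  DOWNLOAD_AND_CACHE,  // Download into the cache, then copy to the sandbox.
  RETRIEVE_FROM_CACHE  // Cache hit: copy the existing cache file.
};

// One "item": the URI, the chosen action and, for cached items,
// the cache file name.
struct FetchItem {
  std::string uri;
  FetchAction action;
  std::optional<std::string> cacheFilename;
};

// Decide the action for one URI. `caching` is the per-URI flag;
// `cachedFile` holds the cache file name if the URI is already resident;
// `newCacheFilename` is used when a fresh cache file must be created.
FetchItem makeItem(const std::string& uri,
                   bool caching,
                   const std::optional<std::string>& cachedFile,
                   const std::string& newCacheFilename) {
  if (!caching) {
    return {uri, FetchAction::BYPASS_CACHE, std::nullopt};
  }
  if (cachedFile.has_value()) {
    return {uri, FetchAction::RETRIEVE_FROM_CACHE, cachedFile};
  }
  return {uri, FetchAction::DOWNLOAD_AND_CACHE, newCacheFilename};
}
```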

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results already in the cache are reused.
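The reuse of concurrent download results can be illustrated with a map from URI to a shared future: the first caller starts the download, and later callers for the same URI receive the same future. This is a sketch under assumptions: Mesos does this inside a libprocess actor, whereas here `std::shared_future` and a mutex are swapped in, and `DownloadDeduplicator` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: at most one download per URI, even when several
// callers ask for the same URI concurrently.
class DownloadDeduplicator {
 public:
  // `download` performs the real fetch and returns the cache file path.
  explicit DownloadDeduplicator(
      std::function<std::string(const std::string&)> download)
    : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = inFlight_.find(uri);
    if (it != inFlight_.end()) {
      return it->second;  // Reuse a download that is running or finished.
    }
    ++started_;
    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    inFlight_.emplace(uri, result);
    return result;
  }

  // Number of distinct downloads actually launched.
  int downloadsStarted() {
    std::lock_guard<std::mutex> lock(mutex_);
    return started_;
  }

 private:
  std::function<std::string(const std::string&)> download_;
  std::mutex mutex_;
  int started_ = 0;
  std::map<std::string, std::shared_future<std::string>> inFlight_;
};
```

Because finished entries stay in the map, a later request for the same URI also gets the completed result instead of triggering a new download, which loosely mirrors reusing a cache-resident file.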

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. The test uses a mock HDFS client, written in bash, that behaves just like the real "hadoop" command when used in the right, limited way.

30124: Wipes ("zaps") the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in an acceptable, if very simple, way.

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works under a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as an argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
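Since hdfs::du() returns a string, its caller has to turn the human-readable size into a byte count. A sketch, assuming output of the form `<number> [unit] <path>` with binary (1024-based) K/M/G/T units; the exact `hadoop fs -du -h` format differs between Hadoop versions, and `parseHumanSize` is a hypothetical helper, not part of Mesos.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: convert the leading size column of a line such as
// "1.5 K  /path/file" into bytes. Assumes "<number> [unit] ..." format.
std::optional<uint64_t> parseHumanSize(const std::string& line) {
  std::istringstream in(line);
  double value;
  if (!(in >> value) || value < 0) {
    return std::nullopt;  // No leading number: not a size line.
  }
  std::string unit;
  in >> unit;  // Either a unit, or already the path if no unit is printed.
  double multiplier = 1;
  if (unit == "K") multiplier = 1024.0;
  else if (unit == "M") multiplier = 1024.0 * 1024;
  else if (unit == "G") multiplier = 1024.0 * 1024 * 1024;
  else if (unit == "T") multiplier = 1024.0 * 1024 * 1024 * 1024;
  return static_cast<uint64_t>(std::llround(value * multiplier));
}
```

Human-readable sizes are rounded, so the result is only an estimate; plain `hadoop fs -du` (without `-h`) reports exact byte counts.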

30621: Refactors URI type separation in mesos-fetcher. Moves the code that distinguishes URI types (http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since download size queries will reuse it once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.
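The URI type separation can be pictured as a prefix check on the scheme. In this sketch, the mapping of schemes to transports (network download for http/https/ftp/ftps, the hadoop client for hdfs/hftp/s3/s3n, a local copy otherwise) is an assumption based on what mesos-fetcher supported at the time, and `UriType`/`classify` are illustrative names.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Illustrative transport categories; not the Mesos type names.
enum class UriType { NET, HADOOP, LOCAL };

static bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Classify a URI by its scheme prefix (assumed mapping, see above).
UriType classify(const std::string& uri) {
  for (const char* scheme : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(uri, scheme)) return UriType::NET;
  }
  for (const char* scheme : {"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(uri, scheme)) return UriType::HADOOP;
  }
  return UriType::LOCAL;  // Anything else is treated as a local copy.
}
```

Keeping this in one function is what lets both the download itself and the size query that precedes it dispatch on the same classification.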

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. Provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All the math is included :-)
- To find out how much space is needed, downloading has a prelude that queries the download size for the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one request happens per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs whose cache files are currently in use. This uses reference counting, since concurrent fetch attempts may use the same cache files.
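The reservation arithmetic from the list above can be sketched as follows: take as much as possible from free space, evict unreferenced entries to cover the rest, and change nothing if that is impossible. This is a simplified, synchronous model with hypothetical names (`CacheSpace`, `CacheEntry`); the real code reserves space while waiting asynchronously for evictions, and the oldest-first eviction order here is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical cache bookkeeping, not the Mesos data structures.
struct CacheEntry {
  std::string key;
  uint64_t size;
  int references;  // > 0 while some fetch attempt uses the file.
};

class CacheSpace {
 public:
  explicit CacheSpace(uint64_t capacity) : capacity_(capacity) {}

  // Precondition (not checked): used + reserved stays within capacity.
  void add(const std::string& key, uint64_t size, int references) {
    entries_.push_back({key, size, references});
    used_ += size;
  }

  uint64_t available() const { return capacity_ - used_ - reserved_; }

  // Reserve `bytes`, partly from free space and partly by evicting
  // unreferenced entries (oldest first). Returns the evicted keys, or
  // std::nullopt with no state change if eviction cannot free enough.
  std::optional<std::vector<std::string>> reserve(uint64_t bytes) {
    uint64_t fromFree = std::min(bytes, available());
    uint64_t needed = bytes - fromFree;

    // First pass: check that enough evictable space exists.
    uint64_t evictable = 0;
    for (const CacheEntry& e : entries_) {
      if (evictable >= needed) break;
      if (e.references == 0) evictable += e.size;
    }
    if (evictable < needed) {
      return std::nullopt;
    }

    // Second pass: evict unreferenced entries until enough is freed.
    std::vector<std::string> evicted;
    uint64_t freed = 0;
    for (auto it = entries_.begin(); it != entries_.end() && freed < needed;) {
      if (it->references == 0) {
        freed += it->size;
        used_ -= it->size;
        evicted.push_back(it->key);
        it = entries_.erase(it);
      } else {
        ++it;  // In use: eviction is disabled for this entry.
      }
    }
    reserved_ += bytes;
    return evicted;
  }

  // Undo a reservation, e.g. when a later step of the fetch fails.
  void release(uint64_t bytes) { reserved_ -= bytes; }

 private:
  uint64_t capacity_;
  uint64_t used_ = 0;
  uint64_t reserved_ = 0;
  std::list<CacheEntry> entries_;
};
```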


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review74703
-----------------------------------------------------------


Patch looks great!

Reviews applied: [30606, 30609, 30774]

All tests passed.

- Mesos ReviewBot


On March 1, 2015, 3:42 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 1, 2015, 3:42 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Diffs
> -----
> 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 2, 2015, 8:13 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 759
> > <https://reviews.apache.org/r/30774/diff/25/?file=882228#file882228line759>
> >
> >     If you're incrementing all the time just to count, why not just get the size from list?
> 
> Bernd Mathiske wrote:
>     I am not incrementing to count anything. I am incrementing to hit the right index in a vector that parallels the list I am iterating over. Is there a C++ or Boost construct that can do this without indices?

Switched to using a const_iterator for this, which should make the parallel to the foreach more obvious.
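For readers wondering what the const_iterator approach looks like, here is a minimal sketch of walking a list with a range-based loop while advancing a `const_iterator` into a parallel vector. Mesos code uses stout's `foreach` macro (a wrapper around `BOOST_FOREACH`); a plain range-for is used here instead, and the function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical example: pair each URI with a parallel flag, no index needed.
std::vector<std::string> pairUp(const std::list<std::string>& uris,
                                const std::vector<bool>& cached) {
  assert(uris.size() == cached.size());
  std::vector<std::string> result;
  std::vector<bool>::const_iterator flag = cached.begin();
  for (const std::string& uri : uris) {
    result.push_back(uri + (*flag ? " (cached)" : " (direct)"));
    ++flag;  // Advance in lockstep with the loop over `uris`.
  }
  return result;
}
```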


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review74885
-----------------------------------------------------------


On March 7, 2015, 7:21 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 7, 2015, 7:21 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 2, 2015, 8:13 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 759
> > <https://reviews.apache.org/r/30774/diff/25/?file=882228#file882228line759>
> >
> >     If you're incrementing all the time just to count, why not just get the size from list?

I am not incrementing to count anything. I am incrementing to hit the right index in a vector that parallels the list I am iterating over. Is there a C++ or Boost construct that can do this without indices?


> On March 2, 2015, 8:13 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 831
> > <https://reviews.apache.org/r/30774/diff/25/?file=882228#file882228line831>
> >
> >     Why is the check entries necessary? Seems like if this for test only we should do the validations in test?

This is "in tests". This method is for testing. It says so in its header file comment.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review74885
-----------------------------------------------------------


On March 3, 2015, 5:01 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 3, 2015, 5:01 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   docs/images/fetch_cache.jpg PRE-CREATION 
>   docs/images/fetch_components.jpg PRE-CREATION 
>   docs/images/fetch_flow.jpg PRE-CREATION 
>   docs/images/fetch_force1.jpg PRE-CREATION 
>   docs/images/fetch_force2.jpg PRE-CREATION 
>   docs/images/fetch_state.jpg PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in an acceptable, yet most simple way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one such request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
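A minimal sketch of the 30006 behavior quoted above (concurrent attempts to cache the same URI result in only one download), written in standard C++. The patch itself builds on libprocess futures inside FetcherProcess; this sketch instead assumes std::shared_future plus a mutex, and the DownloadCoordinator class and its downloader callback are hypothetical names.

```cpp
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical coordinator: the first request for a URI starts the download;
// later requests for the same URI get the same shared_future instead of
// starting a second download.
class DownloadCoordinator {
public:
  // Downloads a URI and returns the path of the resulting cache file.
  using Downloader = std::function<std::string(const std::string&)>;

  explicit DownloadCoordinator(Downloader download)
    : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = pending_.find(uri);
    if (it != pending_.end()) {
      return it->second;  // Reuse the in-flight or completed download.
    }
    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    pending_.emplace(uri, result);
    return result;
  }

private:
  Downloader download_;
  std::mutex mutex_;
  // Completed entries stay here and stand in for cache-resident files.
  std::map<std::string, std::shared_future<std::string>> pending_;
};
```

Later callers block on the first caller's download rather than issuing their own, which is the property the description claims.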


Re: Review Request 30774: Fetcher Cache

Posted by Timothy Chen <tn...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review74885
-----------------------------------------------------------



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment121710>

    If you're incrementing all the time just to count, why not just get the size from the list?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment121711>

    Why is checking the entries necessary? If this is only for tests, shouldn't we do the validation in the tests?


- Timothy Chen


On March 2, 2015, 6:27 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 6:27 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   docs/images/fetch_components.jpg PRE-CREATION 
>   docs/images/fetch_flow.jpg PRE-CREATION 
>   docs/images/fetch_state.jpg PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions below, in dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one such request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
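The reservation rules listed under 30626 (space comes partly from free space and partly from evictions, files with a nonzero reference count are never evicted, and a failed attempt leaves no reservation behind) can be sketched as follows. This is a simplified, synchronous C++ illustration with assumed names (CacheFile, CacheSpace, reserve(), release()). The patch evicts asynchronously and undoes the reservation if eviction fails; this sketch instead checks feasibility up front, and its oldest-first eviction order is an assumed policy.

```cpp
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical bookkeeping entry for one cache-resident file.
struct CacheFile {
  std::string uri;
  std::uint64_t size;
  int references;  // > 0 while a fetch attempt is using the file.
};

class CacheSpace {
public:
  explicit CacheSpace(std::uint64_t capacity) : capacity_(capacity) {}

  // Records a file that is already in the cache (list order = age).
  void add(const CacheFile& file) {
    files_.push_back(file);
    used_ += file.size;
  }

  // Reserves `bytes`, using free space first and evicting unreferenced
  // files (oldest first) for the remainder. All-or-nothing: if enough space
  // cannot be freed, nothing is evicted or reserved and false is returned.
  bool reserve(std::uint64_t bytes, std::vector<std::string>* evicted) {
    const std::uint64_t committed = used_ + reserved_;
    const std::uint64_t available =
      committed < capacity_ ? capacity_ - committed : 0;
    if (bytes <= available) {
      reserved_ += bytes;
      return true;
    }

    const std::uint64_t needed = bytes - available;
    std::vector<std::list<CacheFile>::iterator> victims;
    std::uint64_t freed = 0;
    for (auto it = files_.begin(); it != files_.end() && freed < needed; ++it) {
      if (it->references == 0) {  // Files in use are never evicted.
        victims.push_back(it);
        freed += it->size;
      }
    }
    if (freed < needed) {
      return false;
    }

    for (auto it : victims) {
      evicted->push_back(it->uri);
      used_ -= it->size;
      files_.erase(it);
    }
    reserved_ += bytes;
    return true;
  }

  // Undoes (part of) a reservation, e.g. after a failed download.
  void release(std::uint64_t bytes) { reserved_ -= bytes; }

  std::uint64_t used() const { return used_; }
  std::uint64_t reserved() const { return reserved_; }

private:
  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::uint64_t reserved_ = 0;
  std::list<CacheFile> files_;
};
```

For example, with a capacity of 100, an unreferenced 40-byte file, a referenced 30-byte file, and 20 bytes already reserved, a request for 40 bytes takes 10 from free space and evicts the 40-byte file for the rest, while a further request for 50 fails without touching anything because only the referenced file remains.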


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 4, 2015, 4:39 p.m., Jay Buffington wrote:
> > Hey Bernd,
> > 
> > I'm really looking forward to this feature.  There's a lot here, so I was hoping you could help me understand by responding to some of these questions:
> > 
> > Why do you need the cache table data structure?  Just use the filesystem?
> > Why are the expanded files cached as well?  
> > There shouldn’t be different behavior if we’re using the cache.  My understanding is that with this patch, if we use the cache the tar doesn’t exist in the sandbox.  Isn't this a regression?
> > What’s the point of segregating the cache by user?
> > Why not respect http caching headers?
> > Why does the framework need to even know if the cache is in use or not?
> > The images referenced in the fetcher docs aren’t part of the review.  Where can I find them?
> > 
> > Thanks!
> > Jay

Hi Jay,

thanks for these great questions! In summary, everything you are asking for feature-wise can be offered later (soonish) by relatively simple-to-implement feature additions.

Answers to your questions in order as follows.

- If I just used the file system to implement the cache without a libprocess actor as a complement, I would need to persist state about cache contents, use file locks, coordinate multiple instances of running mesos-fetcher programs, etc. There is a possible alternative architecture for this that would also work. See the JIRA comments on MESOS-336 for an earlier discussion on this. My personal preference would be to perhaps further develop what is now FetcherProcess into an external program (with fail-over) rather than trying to beef up mesos-fetcher, which would lead to a lot of IPC for coordination.
- I am not aware of caching expanded files. We only cache the archive file itself.
- Not having a tar file in the sandbox is not a regression if you regard using the cache at all as a new feature. But I can optionally copy it over in an add-on patch if so desired. This is just an MVP, and it seems more likely that people would rather not have the tar file copied.
- I would not want to have a framework for one user plant a cache file that a framework of another user then picks up. This file could be lying around for a long time, from way before the second framework starts. We can later make this optional as an extra feature. I am erring on the side of caution in this MVP.
- Excellent suggestion. But this is for later. Extra feature that I also find important.
- We can have another URI.cache value that makes it so.
- Sorry for having removed the images for now. I had trouble applying the patch with pictures in it. Advice on what git/RB supports here is welcome! For now, you can git clone https://github.com/bernd-mesos/MesosFetcherDocs and then open the md files locally or you can look at the PDFs which I also uploaded.
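To make the per-user segregation answer above concrete, here is a minimal C++17 sketch of a cache keyed by (user, URI), so that a file cached for one user is never handed to a task run by another. The directory layout <cache_root>/<slave_id>/<user>/<n> and the UserSegregatedCache class are assumptions for illustration, not the layout defined by the patch.

```cpp
#include <filesystem>
#include <map>
#include <string>
#include <utility>

// Hypothetical layout: <cache_root>/<slave_id>/<user>/<n>.
class UserSegregatedCache {
public:
  UserSegregatedCache(std::filesystem::path root, std::string slaveId)
    : root_(std::move(root)), slaveId_(std::move(slaveId)) {}

  std::filesystem::path directory(const std::string& user) const {
    return root_ / slaveId_ / user;
  }

  // Returns the cache file path for (user, uri), assigning a fresh
  // numbered file name on first use. The same URI requested by a different
  // user gets a separate entry.
  std::filesystem::path filename(const std::string& user, const std::string& uri) {
    auto key = std::make_pair(user, uri);
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      it = entries_.emplace(key, directory(user) / std::to_string(next_++)).first;
    }
    return it->second;
  }

  bool contains(const std::string& user, const std::string& uri) const {
    return entries_.count({user, uri}) > 0;
  }

private:
  std::filesystem::path root_;
  std::string slaveId_;
  unsigned long next_ = 0;
  std::map<std::pair<std::string, std::string>, std::filesystem::path> entries_;
};
```

Dropping the user from the key would be the "later, optional" relaxation mentioned above; the cautious default keeps the entries apart.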


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75266
-----------------------------------------------------------


On March 5, 2015, 3:15 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 5, 2015, 3:15 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Jay Buffington <me...@jaybuff.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75266
-----------------------------------------------------------


Hey Bernd,

I'm really looking forward to this feature.  There's a lot here, so I was hoping you could help me understand by responding to some of these questions:

Why do you need the cache table data structure?  Just use the filesystem?
Why are the expanded files cached as well?  
There shouldn’t be different behavior if we’re using the cache.  My understanding is that with this patch, if we use the cache the tar doesn’t exist in the sandbox.  Isn't this a regression?
What’s the point of segregating the cache by user?
Why not respect http caching headers?
Why does the framework need to even know if the cache is in use or not?
The images referenced in the fetcher docs aren’t part of the review.  Where can I find them?

Thanks!
Jay


docs/fetcher-cache-internals.md
<https://reviews.apache.org/r/30774/#comment122236>

    s/eactly/exactly/


- Jay Buffington


On March 4, 2015, 12:54 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 4, 2015, 12:54 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/launcher/fetcher.cpp, line 321
> > <https://reviews.apache.org/r/30774/diff/33/?file=888350#file888350line321>
> >
> >     Can you comment the relationship between the FetcherInfo::Item and the FetcherInfo here? Is the FetcherInfo::Item within the FetcherInfo but FetcherInfo is included because you just want to get the 'sandbox_directory' and 'cache_directory' and rather than pulling those out explicitly you just passed the entire FetcherInfo?

There are more items in the FetcherInfo than just the one we are working on here. That's why this one is called out explicitly. I changed this to pass both directories in.


> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/launcher/fetcher.cpp, lines 364-366
> > <https://reviews.apache.org/r/30774/diff/33/?file=888350#file888350line364>
> >
> >     Why are these not CHECKs? Since you're the one setting up the FetcherInfo it seems like you should know explicitly whether or not the cache_filename was set!
> >     
> >     Same for the cache_directory below as well.

What if somebody else uses mesos-fetcher?
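One reading of this answer: because mesos-fetcher is a standalone program that could be invoked with input not produced by the slave, missing fields are better reported as errors than enforced with CHECKs, which abort the process. A minimal sketch of that validation style, assuming a hypothetical Item struct whose Action enum mirrors the three fetch actions described in 30037 (the names are illustrative, not taken from the proto):

```cpp
#include <optional>
#include <string>

// Hypothetical mirror of the three fetch actions described in 30037.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

struct Item {
  std::string uri;
  Action action;
  std::optional<std::string> cacheFilename;
};

// Returns an error message instead of aborting, since the input may come
// from a caller other than the slave that normally builds it.
std::optional<std::string> validate(
    const Item& item, const std::optional<std::string>& cacheDirectory) {
  if (item.uri.empty()) {
    return "URI is empty";
  }
  if (item.action == Action::BYPASS_CACHE) {
    return std::nullopt;  // Direct fetches need no cache fields.
  }
  if (!item.cacheFilename.has_value() || item.cacheFilename->empty()) {
    return "Missing cache file name for cached URI: " + item.uri;
  }
  if (!cacheDirectory.has_value()) {
    return "Missing cache directory for cached URI: " + item.uri;
  }
  return std::nullopt;
}
```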


> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/launcher/fetcher.cpp, lines 403-404
> > <https://reviews.apache.org/r/30774/diff/33/?file=888350#file888350line403>
> >
> >     As mentioned above, it would be great to really capture the relationship between the FetcherInfo and the FetcherInfo::Item. If The FetcherInfo encapsulates the FetcherInfo::Item I would also suggest switching the order of the parameters to signify that.

The main purpose here is to fetch this one particular item, not everything FetcherInfo carries. FetcherInfo is a secondary parameter that provides extra parameters like cache_directory, sandbox_directory, and framework_home. Putting it second makes this relationship clear IMHO. Do you suggest adding all these as individual parameters?

Yes, the item is included in the list of items in FetcherInfo. Shall we break up FetcherInfo into several shells, the inner one without items?


> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 487
> > <https://reviews.apache.org/r/30774/diff/33/?file=888358#file888358line487>
> >
> >     Can we make this a Path to start?

Then it would be the only one. Confusing. I'd rather have a wholesale sweep over the whole code base to introduce Path - as a separate ticket.


> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 796
> > <https://reviews.apache.org/r/30774/diff/33/?file=888359#file888359line796>
> >
> >     We should do recovery on the fetcher itself:
> >     
> >     Try<Nothing> recover = fetcher->recover(flags, slaveId);
> >     
> >     It seems very weird to have a static generic Fetcher recover functionality that implies that we can't have multiple Fetchers running at the same time. How do we start multiple slaves at the same time?

This is an artefact of the lack of injection of slaveId and flags. It should be cleaned up when we refactor those. As things stand, the slave does not have access to the fetcher instance, and giving it access would cause a lot of collateral changes, so I advise refraining for now. I have put a comment at the static method to explain this. That's the best fix for now IMHO.

There is no problem starting multiple slaves, because they all have a different slaveID that gets passed into this call.
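A minimal sketch of why per-slave calls do not collide, assuming (hypothetically) that each slave's cache directory is derived from the configured cache root plus its slave ID. The `slaves/<id>/fetch` layout and the `cacheDirectory` and `recover` names are illustrative only, not the layout used in the patch; `recover` performs the simple "zap" described in review 30124, but only for the given slave.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical layout: one subdirectory per slave ID under the cache root,
// so slaves sharing a host (and the same flag value) use disjoint directories.
fs::path cacheDirectory(const fs::path& root, const std::string& slaveId) {
  return root / "slaves" / slaveId / "fetch";
}

// Simplest possible recovery: wipe this slave's cache directory and recreate
// it empty. Other slaves' directories are left untouched.
// Returns an empty error_code on success.
std::error_code recover(const fs::path& root, const std::string& slaveId) {
  std::error_code ec;
  const fs::path dir = cacheDirectory(root, slaveId);
  fs::remove_all(dir, ec);  // not an error if `dir` does not exist yet
  if (ec) {
    return ec;
  }
  fs::create_directories(dir, ec);
  return ec;
}
```

Because the slave ID is part of the path, recovering slave `S1` cannot delete files cached by slave `S2`, even though both calls go through the same static function.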


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75754
-----------------------------------------------------------


On March 7, 2015, 7:21 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 7, 2015, 7:21 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one such request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs whose related cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
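The eviction provisions in the quoted description (reserve space partly from free space and partly from evictions, undo a reservation on failure, never evict files that are in use) can be modeled compactly. This is a single-threaded sketch under stated assumptions: the `CacheSpace` class and its methods are hypothetical, eviction here is synchronous and cannot fail, and victims are chosen oldest first, which is an assumption rather than the policy in the patch.

```cpp
#include <cstdint>
#include <list>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical bookkeeping for cache space; not the patch's actual classes.
class CacheSpace {
 public:
  explicit CacheSpace(uint64_t capacity) : capacity_(capacity) {}

  uint64_t available() const { return capacity_ - used_ - reserved_; }

  // Reserves `bytes`, partly from available space and partly by evicting
  // unreferenced files, oldest first. Returns the evicted keys, or
  // std::nullopt (changing nothing) if the request cannot be satisfied.
  std::optional<std::vector<std::string>> reserve(uint64_t bytes) {
    std::vector<std::string> victims;
    uint64_t freed = 0;
    for (const std::string& key : order_) {
      if (available() + freed >= bytes) break;
      if (files_.at(key).refs == 0) {  // files in use are never evicted
        victims.push_back(key);
        freed += files_.at(key).size;
      }
    }
    if (available() + freed < bytes) return std::nullopt;
    for (const std::string& key : victims) {
      used_ -= files_.at(key).size;
      files_.erase(key);
      order_.remove(key);
    }
    reserved_ += bytes;
    return victims;
  }

  // Undoes a reservation, e.g. when the fetch attempt that made it fails.
  void release(uint64_t bytes) { reserved_ -= bytes; }

  // Turns `size` reserved bytes into a cache-resident file once its download
  // completes. Assumes a matching reservation exists and `key` is new.
  void commit(const std::string& key, uint64_t size) {
    reserved_ -= size;
    used_ += size;
    order_.push_back(key);
    files_[key] = Entry{size, 0};
  }

  // Reference counting: a file with a nonzero count is in use.
  void ref(const std::string& key) { ++files_.at(key).refs; }
  void unref(const std::string& key) { --files_.at(key).refs; }

 private:
  struct Entry {
    uint64_t size;
    int refs;
  };

  uint64_t capacity_;
  uint64_t used_ = 0;
  uint64_t reserved_ = 0;
  std::list<std::string> order_;  // insertion order, oldest first
  std::map<std::string, Entry> files_;
};
```

For example, with a capacity of 100 bytes and committed files of 60 and 30 bytes, a 50-byte request fails while the 60-byte file is referenced (evicting the 30-byte file alone is not enough), and it evicts exactly the 60-byte file once that file is unreferenced.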


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/launcher/fetcher.cpp, lines 364-366
> > <https://reviews.apache.org/r/30774/diff/33/?file=888350#file888350line364>
> >
> >     Why are these not CHECKs? Since you're the one setting up the FetcherInfo it seems like you should know explicitly whether or not the cache_filename was set!
> >     
> >     Same for the cache_directory below as well.
> 
> Bernd Mathiske wrote:
>     What if somebody else uses mesos-fetcher?

Adding a comment why this is not a check.
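A minimal sketch of the non-CHECK approach, assuming a simplified, hypothetical `Item` struct in place of the real `FetcherInfo::Item` protobuf. Because mesos-fetcher is a standalone program whose JSON input could come from a caller other than the slave, missing cache fields are reported as errors instead of aborting the process.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for FetcherInfo::Item; the action names mirror the
// three actions described in review 30037.
struct Item {
  enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };
  Action action;
  std::optional<std::string> cacheFilename;
};

// Returns an error message instead of aborting (as CHECK would): the input
// to mesos-fetcher is not guaranteed to have been produced by the slave.
std::optional<std::string> validate(
    const Item& item,
    const std::optional<std::string>& cacheDirectory) {
  if (item.action == Item::Action::BYPASS_CACHE) {
    return std::nullopt;  // no cache fields are needed
  }
  if (!item.cacheFilename || item.cacheFilename->empty()) {
    return std::string("Missing cache file name for a cache action");
  }
  if (!cacheDirectory || cacheDirectory->empty()) {
    return std::string("Missing cache directory for a cache action");
  }
  return std::nullopt;
}
```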


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75754
-----------------------------------------------------------


On April 30, 2015, 7:40 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 30, 2015, 7:40 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one such request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs whose related cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
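The description's claim that concurrent attempts to cache the same URI lead to only one download (and, per 30626, that only one size query per URI is in flight) follows a common pattern: keep a map from URI to a shared future, and let later callers join the existing one. The sketch below is a standard-library analogue, not the patch's actual code; `DownloadDeduplicator` and its downloader callback are hypothetical. Entries are never removed here, so the map also acts as a trivial cache.

```cpp
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical: concurrent requests for the same URI share one download.
class DownloadDeduplicator {
 public:
  // Downloads a URI and returns its contents (stand-in for a real fetch).
  using Downloader = std::function<std::string(const std::string&)>;

  explicit DownloadDeduplicator(Downloader download)
      : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = inFlight_.find(uri);
    if (it != inFlight_.end()) {
      return it->second;  // join the existing (or finished) download
    }
    std::shared_future<std::string> result =
        std::async(std::launch::async, download_, uri).share();
    inFlight_.emplace(uri, result);
    return result;
  }

 private:
  Downloader download_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> inFlight_;
};
```

Two `fetch` calls for the same URI return futures backed by a single downloader invocation; a different URI starts its own download.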


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 10, 2015, 8:35 a.m., Benjamin Hindman wrote:
> > src/launcher/fetcher.cpp, lines 403-404
> > <https://reviews.apache.org/r/30774/diff/33/?file=888350#file888350line403>
> >
> >     As mentioned above, it would be great to really capture the relationship between the FetcherInfo and the FetcherInfo::Item. If the FetcherInfo encapsulates the FetcherInfo::Item I would also suggest switching the order of the parameters to signify that.
> 
> Bernd Mathiske wrote:
>     The main purpose here is to fetch this one particular item, not everything FetcherInfo carries. FetcherInfo is a secondary parameter that provides extra parameters like cache_directory, sandbox_directory, and framework_home. Putting it second makes this relationship clear IMHO. Do you suggest adding all these as individual parameters?
>     
>     Yes, the item is included in the list of items in FetcherInfo. Shall we break up FetcherInfo into several shells, the inner one without items?

Now passing the sandbox directory, cache directory, and frameworks home as separate parameters instead of the whole FetcherInfo.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75754
-----------------------------------------------------------




Re: Review Request 30774: Fetcher Cache

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75754
-----------------------------------------------------------



include/mesos/fetcher/fetcher.proto
<https://reviews.apache.org/r/30774/#comment122998>

    First, the "preloading" feature does not exist so let's not comment as much. Second, if you perceive this being optional in the future we should make it be optional now.



include/mesos/mesos.proto
<https://reviews.apache.org/r/30774/#comment122995>

    Let's remove the 'force' functionality for now. Why? First, as a user this is a tedious primitive to use: it requires the scheduler to set the field when it wants a new version of the URI, but there is no way for the scheduler to know when the new version has been properly cached on a particular machine. If a user always wants the latest version they should just set 'cache' to false. A better approach here would be to introduce a 'sha' for the URI that the fetcher can compare against, and if the SHAs are different, force the download then (but let's not do the SHA right now; we can do that as a follow up, and keep the MVP simple).
    
    Given that, we'd be left with a basic enum for 'Cache', and since folks have already been confused by the values 'FETCH' and 'FORCE', I think we should revert to the simpler 'optional bool cache'. What would be other potential 'enum Cache' values going forward? If we can think of good examples, then the right thing to do here would probably be to introduce a 'message CacheInfo' that we embed in URI.
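To make the proposed semantics concrete, here is a sketch of how a fetcher could choose among the three actions named in the longer description (bypass the cache, download through the cache, retrieve from the cache), given a plain 'cache' bool plus the SHA follow-up suggested here. `FetchAction`, `CachedEntry`, and `decide` are hypothetical names, and SHA support was explicitly deferred, so this only illustrates the decision logic.

```cpp
#include <optional>
#include <string>

// Hypothetical names; the three actions mirror those listed in review 30037.
enum class FetchAction { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

// What the fetcher knows about a URI that is already in the cache.
struct CachedEntry {
  std::string sha;  // digest recorded when the file was downloaded
};

// cache:        the proposed 'optional bool cache' field on the URI
// requestedSha: the hypothetical future 'sha' field on the URI, if set
// cached:       the cache entry for this URI, if one exists
FetchAction decide(bool cache,
                   const std::optional<std::string>& requestedSha,
                   const std::optional<CachedEntry>& cached) {
  if (!cache) {
    return FetchAction::BYPASS_CACHE;
  }
  if (!cached) {
    return FetchAction::DOWNLOAD_AND_CACHE;  // cache miss
  }
  if (requestedSha && *requestedSha != cached->sha) {
    return FetchAction::DOWNLOAD_AND_CACHE;  // stale entry: refresh it
  }
  return FetchAction::RETRIEVE_FROM_CACHE;
}
```

With no 'sha' set, an existing cache entry is always reused, which is why a user who always wants the latest version would simply set 'cache' to false.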



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122999>

    Can we put a newline above this to make it easier to read? The code is very compressed here with the line continuation from 'sourcePath ='.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123000>

    Why not 'else if'? The value of doing 'else if' here is that this block is comparing the status of the same variable so continuing with an 'else if' helps to capture that semantics.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123001>

    Can we please add a TODO here to refactor this into stout so that people can more easily chmod an executable? For example, we could define some static flags so that someone can do:
    
    os::chmod(path, EXECUTABLE_CHMOD_FLAGS);
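A minimal sketch of what such a helper could look like, written against plain POSIX `chmod(2)` rather than stout. `EXECUTABLE_CHMOD_FLAGS` is the hypothetical name from the comment above, and the choice of mode 0755 (rwx for the owner, r-x for everyone else) is an assumption, not necessarily what the fetcher uses.

```cpp
#include <sys/stat.h>

#include <cerrno>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical constant: rwx for the owner, r-x for group and others (0755).
constexpr mode_t EXECUTABLE_CHMOD_FLAGS =
    S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;

// Makes `path` executable. Returns std::nullopt on success, or an error
// message that includes the reason reported by the OS.
std::optional<std::string> chmodExecutable(const std::string& path) {
  if (::chmod(path.c_str(), EXECUTABLE_CHMOD_FLAGS) != 0) {
    return "Failed to chmod executable '" + path + "': " + std::strerror(errno);
  }
  return std::nullopt;
}
```

Unlike the mode passed to `open(2)`, the mode given to `chmod(2)` is not filtered by the process umask, so the file ends up with exactly these permission bits.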



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123004>

    Can we improve the error message here (and below), e.g.:
    
    return Error("Failed to determine the basename of the URI: " + basename.error());



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123006>

    Can you comment the relationship between the FetcherInfo::Item and the FetcherInfo here? Is the FetcherInfo::Item within the FetcherInfo but FetcherInfo is included because you just want to get the 'sandbox_directory' and 'cache_directory' and rather than pulling those out explicitly you just passed the entire FetcherInfo?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123012>

    Understanding the logic here is confusing:
    
    (1) Why do we copy the file if extracting failed? It's clear to me that this is because this is the way we've always done this, but with the new code pulled into this function we should capture this with comments for a future reader.
    
    (2) Why do we return the sandbox directory when extraction was successful? Why do we return the value from 'chmodExecutable'? I would have expected to test whether 'chmodExecutable' was successful and then return whatever this function is supposed to return. Let's comment these functions' return semantics.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123025>

    Why are these not CHECKs? Since you're the one setting up the FetcherInfo it seems like you should know explicitly whether or not the cache_filename was set!
    
    Same for the cache_directory below as well.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123015>

    Newline above here to break this up from the complicated line above.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment123028>

    As mentioned above, it would be great to really capture the relationship between the FetcherInfo and the FetcherInfo::Item. If the FetcherInfo encapsulates the FetcherInfo::Item I would also suggest switching the order of the parameters to signify that.



src/slave/flags.hpp
<https://reviews.apache.org/r/30774/#comment123031>

    Can we make this a Path to start?



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment123202>

    We should do recovery on the fetcher itself:
    
    Try<Nothing> recover = fetcher->recover(flags, slaveId);
    
    It seems very weird to have a static generic Fetcher recover functionality that implies that we can't have multiple Fetchers running at the same time. How do we start multiple slaves at the same time?
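A minimal sketch of per-instance recovery, assuming the cache bookkeeping lives inside each fetcher object. `Fetcher` here is a hypothetical class, not the real Mesos one, and the string return value is a simplified stand-in for `Try<Nothing>`. With instance state, two slaves in the same test process can each recover their own cache.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical fetcher that owns its cache bookkeeping.
class Fetcher {
 public:
  // Returns an error message, or an empty string on success.
  std::string recover(const std::string& cacheRoot, const std::string& slaveId) {
    if (slaveId.empty()) {
      return "Missing slave ID";
    }
    cacheDirectory_ = cacheRoot + "/" + slaveId;
    entries_.clear();  // Simplest recovery: start over with an empty cache.
    return "";
  }

  void add(const std::string& uri, const std::string& file) { entries_[uri] = file; }
  std::size_t size() const { return entries_.size(); }
  const std::string& cacheDirectory() const { return cacheDirectory_; }

 private:
  std::string cacheDirectory_;
  std::map<std::string, std::string> entries_;  // URI -> cache file name
};
```

Recovery here simply wipes the bookkeeping, in line with the zap-on-recovery approach described in the review request.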


- Benjamin Hindman


On March 7, 2015, 3:21 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 7, 2015, 3:21 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, listed here in dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: tests now call through fetch() instead of calling the fetcher's run method directly. Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and possibly a cache file name. Refactors fetch() and run() so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
> 
> 30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that behaves just like a real "hadoop" command, as long as it is used in the limited way the fetcher uses it.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
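The reservation arithmetic from 30626 (part free space, part evictions, in-use files protected by reference counts) can be illustrated with a small, synchronous sketch. It assumes a single cache with a fixed capacity, a `tally` of resident plus reserved bytes, and least-recently-used order for eviction candidates; all names are hypothetical. The real fetcher evicts asynchronously and undoes the reservation on failure, which this sketch collapses into one all-or-nothing step.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical cache entry. Sizes are in bytes.
struct CacheEntry {
  std::string uri;
  std::uint64_t size;
  int references;  // > 0 while some fetch attempt uses this file: not evictable
};

struct Cache {
  std::uint64_t capacity;
  std::uint64_t tally;            // resident bytes plus reservations; assumes tally <= capacity
  std::list<CacheEntry> entries;  // assumed order: least recently used first

  std::uint64_t available() const { return capacity - tally; }

  // Reserves `request` bytes, taking what it can from free space and the rest
  // from evicting unreferenced entries. On success, appends the evicted URIs
  // and returns true; if not enough can be freed, changes nothing and returns false.
  bool reserve(std::uint64_t request, std::vector<std::string>* evicted) {
    if (request <= available()) {
      tally += request;
      return true;
    }
    const std::uint64_t needed = request - available();  // must come from evictions
    std::uint64_t freeable = 0;
    std::vector<std::list<CacheEntry>::iterator> victims;
    for (auto it = entries.begin(); it != entries.end() && freeable < needed; ++it) {
      if (it->references == 0) {
        victims.push_back(it);
        freeable += it->size;
      }
    }
    if (freeable < needed) {
      return false;  // Nothing reserved, nothing evicted.
    }
    for (auto it : victims) {
      evicted->push_back(it->uri);
      tally -= it->size;
      entries.erase(it);  // Erasing one list node leaves the other iterators valid.
    }
    tally += request;
    return true;
  }
};
```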


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 9, 2015, 8:37 a.m., Joerg Schad wrote:
> > src/tests/fetcher_cache_tests.cpp, line 134
> > <https://reviews.apache.org/r/30774/diff/33/?file=888361#file888361line134>
> >
> >     Can't we simulate SERIALIZED_TASK externally (as discussed)? That way we would not need several modes.

The whole ExecutionMode enum should go. We should use executeTask inside the loop that creates TaskInfos in each test and then wait explicitly inside or outside the loop as needed. I'll refactor accordingly in the next iteration. Also, we don't need the enum value FAIL_TO_FETCH; it is no longer used anywhere.
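A minimal sketch of that refactoring, with the caller deciding where to wait instead of passing an execution mode. `executeTask` is a hypothetical stand-in that returns a `std::future<bool>`; the real test helper and its return type may differ.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the test helper: launches a task and returns a
// future that becomes ready when it finishes (simulated here).
std::future<bool> executeTask(const std::string& command) {
  return std::async(std::launch::async, [command] { return !command.empty(); });
}

// Serialized: wait inside the loop, so task i+1 starts only after task i ends.
std::vector<bool> runSerialized(const std::vector<std::string>& commands) {
  std::vector<bool> results;
  for (const std::string& command : commands) {
    results.push_back(executeTask(command).get());
  }
  return results;
}

// Concurrent: launch everything inside the loop, then wait outside it.
std::vector<bool> runConcurrently(const std::vector<std::string>& commands) {
  std::vector<std::future<bool>> pending;
  for (const std::string& command : commands) {
    pending.push_back(executeTask(command));
  }
  std::vector<bool> results;
  for (std::future<bool>& f : pending) {
    results.push_back(f.get());
  }
  return results;
}
```

Waiting inside the loop serializes the tasks; waiting after it lets them overlap, which is exactly what an ExecutionMode enum would otherwise have to encode.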


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75700
-----------------------------------------------------------




Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 9, 2015, 8:37 a.m., Joerg Schad wrote:
> > include/mesos/mesos.proto, line 208
> > <https://reviews.apache.org/r/30774/diff/33/?file=888347#file888347line208>
> >
> >     Could you add a comment (e.g. a backlink to the documentation) reminding developers to update docs/fetcher.md when the protobuf is changed?

Since we are dropping the enum, there will be no such comment. There is one next to the remaining "cache" field, though.


> On March 9, 2015, 8:37 a.m., Joerg Schad wrote:
> > src/slave/containerizer/fetcher.cpp, line 450
> > <https://reviews.apache.org/r/30774/diff/33/?file=888355#file888355line450>
> >
> >     size_t position?

Using an iterator now.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75700
-----------------------------------------------------------




Re: Review Request 30774: Fetcher Cache

Posted by Joerg Schad <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75700
-----------------------------------------------------------



docs/fetcher.md
<https://reviews.apache.org/r/30774/#comment122920>

    Could we add an explicit note about the current behavior that the cache does not consider changed or updated URIs? It is mentioned in the paragraph above, but an explicit note would be helpful.



include/mesos/mesos.proto
<https://reviews.apache.org/r/30774/#comment122919>

    Could you add a comment (e.g. a backlink to the documentation) reminding developers to update docs/fetcher.md when the protobuf is changed?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122922>

    size_t position?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122921>

    Should we add a CHECK that items and uris have the same length, to document this assumption explicitly?
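A sketch of the suggested CHECK, assuming hypothetical parallel `uris` and `filenames` vectors. Walking both with iterators also sidesteps the index-type question (`size_t position?`) raised above. `SKETCH_CHECK` stands in for glog's `CHECK`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Stand-in for glog's CHECK: aborts with a message in all build modes.
#define SKETCH_CHECK(cond)                                  \
  do {                                                      \
    if (!(cond)) {                                          \
      std::fprintf(stderr, "Check failed: %s\n", #cond);    \
      std::abort();                                         \
    }                                                       \
  } while (false)

// Hypothetical, simplified item: a URI paired with its cache file name.
struct Item {
  std::string uri;
  std::string cacheFilename;
};

// items[i] is built from uris[i] and filenames[i], so both inputs must have the
// same length. The CHECK documents that assumption and fails loudly if broken.
std::vector<Item> makeItems(const std::vector<std::string>& uris,
                            const std::vector<std::string>& filenames) {
  SKETCH_CHECK(uris.size() == filenames.size());

  std::vector<Item> items;
  items.reserve(uris.size());
  auto filename = filenames.begin();
  for (auto uri = uris.begin(); uri != uris.end(); ++uri, ++filename) {
    items.push_back(Item{*uri, *filename});
  }
  return items;
}
```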



src/slave/flags.hpp
<https://reviews.apache.org/r/30774/#comment122918>

    Could you add a description of these flags to configuration.md (i.e. http://mesos.apache.org/documentation/latest/configuration/)?



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment122924>

    Can't we simulate SERIALIZED_TASK externally (as discussed)? That way we would not need several modes.


- Joerg Schad


On March 7, 2015, 3:21 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 7, 2015, 3:21 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
> 
> 30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves like a real "hadoop" command when used in the limited ways the test requires.
> 
> 30124: Clears ("zaps") the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since download size queries will reuse it once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.
> 
> 30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. Provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too, so only one happens per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of each fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for cache files that are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
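The eviction provisions listed for 30626 (reserving space partly from free space and partly from evictions, undoing a failed reservation, and pinning in-use cache files by reference count) can be illustrated with a small, single-threaded C++ sketch. It is not the Mesos implementation: the class `CacheSpace`, its methods, and the oldest-first eviction order are assumptions made for illustration, and instead of waiting asynchronously for evictions it checks up front whether enough unpinned space exists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

using Bytes = std::uint64_t;

// Hypothetical sketch of fetcher cache space accounting; not the Mesos code.
class CacheSpace {
 public:
  explicit CacheSpace(Bytes capacity) : capacity_(capacity) {}

  // Reserves `size` bytes: free space is used first, and unreferenced
  // entries are evicted (oldest first) to cover the rest. If that cannot
  // succeed, nothing is evicted and nothing stays reserved.
  bool reserve(Bytes size) {
    const Bytes available = capacity_ - used_ - reserved_;
    const Bytes fromAvailable = std::min(available, size);
    Bytes toEvict = size - fromAvailable;

    // Check first that enough unpinned space exists, so that a failed
    // reservation does not leave partial evictions behind.
    Bytes evictable = 0;
    for (const std::string& key : order_) {
      const Entry& entry = entries_.at(key);
      if (entry.refs == 0) {
        evictable += entry.size;
      }
    }
    if (evictable < toEvict) {
      return false;
    }

    for (auto it = order_.begin(); toEvict > 0 && it != order_.end();) {
      const Entry& entry = entries_.at(*it);
      if (entry.refs > 0) {
        ++it;  // In use by a fetch: never evict.
        continue;
      }
      const Bytes freed = entry.size;
      used_ -= freed;
      toEvict -= std::min(toEvict, freed);
      entries_.erase(*it);
      it = order_.erase(it);
    }

    reserved_ += size;
    return true;
  }

  // Undoes a reservation, e.g. after a failed download.
  void unreserve(Bytes size) {
    assert(size <= reserved_);
    reserved_ -= size;
  }

  // Converts a reservation into a resident cache entry after a download.
  void commit(const std::string& key, Bytes size) {
    unreserve(size);
    used_ += size;
    entries_[key] = Entry{size, 0};
    order_.push_back(key);
  }

  // Reference counting: entries with refs > 0 are pinned against eviction.
  void ref(const std::string& key) { ++entries_.at(key).refs; }
  void unref(const std::string& key) { --entries_.at(key).refs; }

  bool contains(const std::string& key) const { return entries_.count(key) > 0; }
  Bytes used() const { return used_; }
  Bytes reserved() const { return reserved_; }

 private:
  struct Entry {
    Bytes size;
    int refs;
  };

  Bytes capacity_;
  Bytes used_ = 0;
  Bytes reserved_ = 0;
  std::unordered_map<std::string, Entry> entries_;
  std::list<std::string> order_;  // Insertion order, oldest first.
};
```

For example, with a 100-byte cache holding a pinned 60-byte entry and an unpinned 30-byte entry, `reserve(50)` fails without evicting anything, while `reserve(40)` takes the 10 free bytes and evicts the 30-byte entry.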


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77019
-----------------------------------------------------------



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124786>

    The error case is handled right after line 711, which closes the non-error branch.


- Bernd Mathiske


On March 17, 2015, 6:59 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 17, 2015, 6:59 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
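The description states that concurrent attempts to cache the same URI lead to only one download. One common way to get that behavior is to keep a table of shared futures keyed by URI: the first caller performs the download and every later caller receives the same future. The C++ sketch below shows this under stated assumptions; it uses standard-library futures rather than the libprocess futures Mesos uses, and `DownloadDeduplicator` and its download callback are hypothetical names.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: one download per URI, shared by all callers.
class DownloadDeduplicator {
 public:
  // Downloads `uri` into the cache and returns the cache file path.
  using Download = std::function<std::string(const std::string&)>;

  explicit DownloadDeduplicator(Download download)
    : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::promise<std::string> promise;
    std::shared_future<std::string> future;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = downloads_.find(uri);
      if (it != downloads_.end()) {
        assert(it->second.valid());
        return it->second;  // Someone already started (or finished) it.
      }
      future = promise.get_future().share();
      downloads_.emplace(uri, future);
    }

    // Only the first caller for this URI gets here. The download runs
    // outside the lock so that fetches of other URIs are not blocked.
    try {
      promise.set_value(download_(uri));
    } catch (...) {
      promise.set_exception(std::current_exception());
    }
    return future;
  }

 private:
  Download download_;
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_future<std::string>> downloads_;
};
```

A real cache would also drop failed entries from the table so that a later fetch can retry; this sketch keeps them for brevity.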


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 491
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line491>
> >
> >     Why not just mock _fetch and do a barrier on it by giving it a promise in test?
> 
> Bernd Mathiske wrote:
>     "just mock _fetch" is more work and harder to understand.
>     
>     It would also function, but then you would need to touch test code every time you change _fetch(). Furthermore, it would not be as clear why we wait for this particular call.

Meanwhile, I tried mocking _fetch, but it does not work; see the related (duplicate) issue below. Let's drop this one so that comments on the same topic and code region stay in one place going forward, OK?


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review76902
-----------------------------------------------------------


On April 10, 2015, 4:33 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 10, 2015, 4:33 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Repository: mesos
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537 
>   include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
>   src/Makefile.am fa609da08e23d6595a3f6d2efddd3e333b6c78f1 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp 6893684e6d199a5d69fc8bba8e60c4acaae9c3c9 
>   src/slave/containerizer/docker.cpp f9fb07806e3b7d7d2afc1be3b8756eac23b32dcd 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp e4136095fca55637864f495098189ab3ad8d8fe7 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp 35f56252cfda5011d21aa188f33cc3e68a694968 
>   src/slave/slave.cpp 9fec023b643d410f4d511fa6f80e9835bab95b7e 
>   src/tests/docker_containerizer_tests.cpp c772d4c836de18b0e87636cb42200356d24ec73d 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 0e98572a62ae05437bd2bc800c370ad1a0c43751 
>   src/tests/mesos.cpp 02cbb4b8cf1206d0f32d160addc91d7e0f1ab28b 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
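Review 30037 introduces per-URI fetch actions: fetch directly bypassing the cache, fetch through the cache, or retrieve from the cache. The C++ sketch below shows how such an action might be chosen for one URI. The enum values, the `Item` struct, and `makeItem()` are illustrative assumptions; the actual definitions live in the fetcher info protobuf and the fetcher process, which are not shown here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical mirror of the three fetch actions; the real names live in
// the fetcher info protobuf and may differ.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

// Hypothetical "item": the URI, its action, and possibly a cache file name.
struct Item {
  std::string uri;
  Action action;
  std::string cacheFilename;  // Empty when the cache is bypassed.
};

// Picks the action for one URI. `caching` is the per-URI flag, `cache`
// maps already cached URIs to their cache file names, and `newFilename`
// is the name to use if the URI must be downloaded into the cache.
Item makeItem(const std::string& uri,
              bool caching,
              const std::unordered_map<std::string, std::string>& cache,
              const std::string& newFilename) {
  assert(!uri.empty());
  if (!caching) {
    return Item{uri, Action::BYPASS_CACHE, ""};
  }
  auto it = cache.find(uri);
  if (it != cache.end()) {
    return Item{uri, Action::RETRIEVE_FROM_CACHE, it->second};
  }
  return Item{uri, Action::DOWNLOAD_AND_CACHE, newFilename};
}
```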


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 503
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line503>
> >
> >     Since this is only called in one place, how about put this in ___fetch, pass it the future and check if it failed log it there?
> 
> Bernd Mathiske wrote:
>     How would this be simpler and more readable?
>     
>     What is wrong with abstracting functions that are called only once? Doing so saves a comment / pulls what would have been a comment into code!

Since we probably don't need a comment here, I'll fix it.


> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 726
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line726>
> >
> >     Why ignore error?
> 
> Bernd Mathiske wrote:
>     The code that follows this line as of line 712 handles the error case.

See issue below for resolution.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review76902
-----------------------------------------------------------


On March 18, 2015, 11:43 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 18, 2015, 11:43 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Repository: mesos
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
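Review 30621 moves the separation of URI types (http, hdfs, local copying, etc.) out of mesos-fetcher so that the fetcher process can reuse it for download size queries. A minimal classification by scheme prefix might look like the C++ sketch below; the category names and the exact prefixes checked are assumptions, not necessarily the set mesos-fetcher recognizes.

```cpp
#include <cassert>
#include <string>

// Hypothetical URI categories; each may need its own way of querying the
// download size and of fetching the file.
enum class UriType { NETWORK, HDFS, LOCAL, OTHER };

// Returns true if `s` starts with `prefix`.
bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

UriType classify(const std::string& uri) {
  assert(!uri.empty());
  if (startsWith(uri, "http://") || startsWith(uri, "https://") ||
      startsWith(uri, "ftp://") || startsWith(uri, "ftps://")) {
    return UriType::NETWORK;
  }
  if (startsWith(uri, "hdfs://")) {
    return UriType::HDFS;
  }
  if (startsWith(uri, "file://") || startsWith(uri, "/")) {
    return UriType::LOCAL;
  }
  return UriType::OTHER;
}
```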


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 406
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line406>
> >
> >     Do we call fetch even if we don't have anything to fetch? I think it will be a good idea to have a fast return if there is nothing to be fetched.

There is a check for this in Fetcher::fetch(), so the call is not even dispatched to the process when there is nothing to fetch.
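The early return can be sketched as follows. `fetch()` and the `Dispatch` callback are hypothetical stand-ins for the public fetcher entry point and the dispatch to the fetcher process; the only point illustrated is that the emptiness check happens before any dispatch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for handing work to the fetcher process/actor.
using Dispatch = std::function<void(const std::vector<std::string>&)>;

// Returns true if the URIs were handed to the process, false if there
// was nothing to fetch and the dispatch was skipped entirely.
bool fetch(const std::vector<std::string>& uris, const Dispatch& dispatch) {
  if (uris.empty()) {
    return false;  // Fast path: no dispatch, no process involvement.
  }
  assert(dispatch);
  dispatch(uris);
  return true;
}
```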


> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 491
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line491>
> >
> >     Why not just mock _fetch and do a barrier on it by giving it a promise in test?

"just mock _fetch" is more work and harder to understand.

It would also function, but then you would need to touch test code every time you change _fetch(). Furthermore, it would not be as clear why we wait for this particular call.


> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 503
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line503>
> >
> >     Since this is only called in one place, how about put this in ___fetch, pass it the future and check if it failed log it there?

How would this be simpler and more readable?

What is wrong with abstracting functions that are called only once? Doing so saves a comment / pulls what would have been a comment into code!


> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 518
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line518>
> >
> >     In what scenario should a cache entry not exist?
> >     If it doesn't somehow we won't be able to use it too?

As you can see at the call sites, this method is used in scenarios where fetching succeeded, where it failed, and incidentally where it left a partial download lying around. I added this comment:

  // We may or may not have started downloading. The download may or may
  // not have been partial. In any case, clean up whatever is there.
  
If there is no file, that's fine: it just means we tried fetching and failed before starting to write the file.

In any case, we remove the cache entry, and the amount of space it had reserved or claimed is released for later use.


> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 521
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line521>
> >
> >     I feel like this can end up in an infinite loop, where if we can expire one item, other fetch items will get stuck forever?
> >     I wonder if we should have some remedy action, or simply crash too?

This is not a loop, because the cache entry gets removed BEFORE we attempt to delete the file. See line 500 just above.

However, in case future code changes ever call this method several times on the same entry, I added a line that sets the entry's size field to zero. This way, the accounted cache space is released only once.
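The release-once safeguard can be sketched as follows. This is an illustrative C++ stand-in with assumed names (`SpaceAccount`, `claim()`, `release()`), not the Mesos code: after the entry's size is subtracted from the used total, the size field is zeroed, so a second `release()` on the same entry subtracts nothing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-entry record: only the accounted size matters here.
struct Entry {
  std::uint64_t size = 0;  // bytes this entry has reserved or claimed
};

// Hypothetical cache-space accounting with an idempotent release.
class SpaceAccount {
 public:
  explicit SpaceAccount(std::uint64_t capacity) : capacity_(capacity) {}

  // Claims `bytes` for `entry` if they fit; returns false otherwise.
  bool claim(Entry& entry, std::uint64_t bytes) {
    if (bytes > available()) {
      return false;
    }
    used_ += bytes;
    entry.size += bytes;
    return true;
  }

  // Gives the entry's space back. Zeroing entry.size makes repeated calls on
  // the same entry harmless: the second call releases nothing.
  void release(Entry& entry) {
    assert(entry.size <= used_);  // never release more than was accounted
    used_ -= entry.size;
    entry.size = 0;
  }

  std::uint64_t used() const { return used_; }
  std::uint64_t available() const { return capacity_ - used_; }

 private:
  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
};
```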


> On March 18, 2015, 11:05 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 726
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line726>
> >
> >     Why ignore error?

The code that follows this line as of line 712 handles the error case.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review76902
-----------------------------------------------------------


On March 17, 2015, 6:59 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 17, 2015, 6:59 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in play happens.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related cache files are. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
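As a rough illustration of the reservation arithmetic listed in the 30626 description, the following C++17 sketch plans how to reserve space: first from unused capacity, then by evicting entries whose reference count is zero. It assumes the candidate list is already ordered least-recently-used first. `CacheEntry`, `Reservation`, and `planReservation()` are hypothetical, and the real fetcher performs eviction asynchronously and undoes the reservation on failure, which this sketch leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical cache entry; in-use entries (referenceCount > 0) must not be evicted.
struct CacheEntry {
  std::string uri;
  std::uint64_t size;
  int referenceCount;
};

struct Reservation {
  std::uint64_t fromFreeSpace;       // bytes taken from unused capacity
  std::uint64_t fromEvictions;       // bytes freed by evictions (may exceed the shortfall)
  std::vector<std::string> victims;  // URIs selected for eviction
};

// Plans how to reserve `needed` bytes: free space first, then evictions of
// unreferenced entries in list order. Returns std::nullopt if not enough
// space can be found, in which case nothing should be reserved.
std::optional<Reservation> planReservation(std::uint64_t capacity,
                                           std::uint64_t used,
                                           const std::list<CacheEntry>& lru,
                                           std::uint64_t needed) {
  assert(used <= capacity);
  Reservation r{};
  const std::uint64_t freeSpace = capacity - used;
  r.fromFreeSpace = std::min(needed, freeSpace);
  std::uint64_t missing = needed - r.fromFreeSpace;

  for (const CacheEntry& entry : lru) {
    if (missing == 0) {
      break;
    }
    if (entry.referenceCount > 0) {
      continue;  // in use: eviction disabled
    }
    r.victims.push_back(entry.uri);
    r.fromEvictions += entry.size;
    missing = entry.size >= missing ? 0 : missing - entry.size;
  }

  if (missing > 0) {
    return std::nullopt;
  }
  return r;
}
```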


Re: Review Request 30774: Fetcher Cache

Posted by Timothy Chen <tn...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review76902
-----------------------------------------------------------



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124598>

    What's the intention of this helper method? 
    Seems odd to only have one for HDFS and not local or Net. If none of this is shared, I say we can lump this all in fetchSize.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124605>

    Do we call fetch even if we don't have anything to fetch? I think it would be a good idea to return early if there is nothing to be fetched.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124599>

    Why not just mock _fetch and do a barrier on it by giving it a promise in test?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124602>

    Since this is only called in one place, how about putting this in ___fetch, passing it the future, and logging there if it failed?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124600>

    In what scenario should a cache entry not exist?
    If it doesn't somehow we won't be able to use it too?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124601>

    I feel like this can end up in an infinite loop, where if we can expire one item then forever other fetch items will get stuck?
    I wonder if we should have some remedy action, or simply crash too?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124607>

    Why ignore error?


- Timothy Chen


On March 17, 2015, 1:59 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 17, 2015, 1:59 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in play happens.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related cache files are. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 19, 2015, 9:40 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 379
> > <https://reviews.apache.org/r/30774/diff/38/?file=899705#file899705line379>
> >
> >     benh, what do you think of Bernd's contentionBarrier injection? Commonly we just mock the callback (_fetch in this case) in tests to block, but Bernd wanted to introduce a specific empty method for tests. I told him this is not a pattern we use in Mesos, but I'd like to see what you think.
> 
> Bernd Mathiske wrote:
>     Of course I will stick to the prevalent patterns unless you start liking this one :-)

It turns out there is no existing method that lends itself to the prevalent pattern. The next call up happens AFTER waiting for the futures, but the barrier needs to be BEFORE that. Suggestions?

Alternatives (without judging them):
- Factor out the loop that gathers the futures and make it a method that gets called once. Then mock this method and have it dual-purposed as contention barrier. 
- Make the futures globally visible, await them in the test, too.
- Use an explicit lock instead of mocking.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77057
-----------------------------------------------------------


On March 24, 2015, 6:57 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 24, 2015, 6:57 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in play happens.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related cache files are. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
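The "only one download per URI" behavior described in this review request can be sketched with standard C++ futures standing in for libprocess futures. `DownloadDeduplicator` and its `Download` callback are hypothetical names: the first request for a URI starts the download, and concurrent requests receive the same `std::shared_future`. This sketch never forgets a failed download; a real cache would remove the entry so that a later attempt can retry.

```cpp
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: concurrent requests to cache the same URI share a
// single download.
class DownloadDeduplicator {
 public:
  // Performs the actual download and returns the resulting cache file path.
  using Download = std::function<std::string(const std::string& uri)>;

  explicit DownloadDeduplicator(Download download)
    : download_(std::move(download)) {}

  // Only the first caller for a given URI starts a download; later callers
  // get the same shared future and wait for the same result.
  std::shared_future<std::string> fetch(const std::string& uri) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = inFlight_.find(uri);
    if (it != inFlight_.end()) {
      return it->second;
    }
    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    inFlight_.emplace(uri, result);
    return result;
  }

 private:
  Download download_;
  std::mutex mutex_;  // guards inFlight_
  std::map<std::string, std::shared_future<std::string>> inFlight_;
};
```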


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 19, 2015, 9:40 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 379
> > <https://reviews.apache.org/r/30774/diff/38/?file=899705#file899705line379>
> >
> >     benh, what do you think of Bernd's contentionBarrier injection? Commonly we just mock the callback (_fetch in this case) in tests to block, but Bernd wanted to introduce a specific empty method for tests. I told him this is not a pattern we use in Mesos, but I'd like to see what you think.
> 
> Bernd Mathiske wrote:
>     Of course I will stick to the prevalent patterns unless you start liking this one :-)
> 
> Bernd Mathiske wrote:
>     It turns out there is no existing method that lends itself to the prevalent pattern. The next call up happens AFTER waiting for the futures, but the barrier needs to be BEFORE that. Suggestions?
>     
>     Alternatives (without judging them):
>     - Factor out the loop that gathers the futures and make it a method that gets called once. Then mock this method and have it dual-purposed as contention barrier. 
>     - Make the futures globally visible, await them in the test, too.
>     - Use an explicit lock instead of mocking.

Another option has been implemented in the latest patch: split off the lower half of fetch() as _fetch(), then mock the latter. This variant most closely resembles other existing code.
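A framework-free sketch of this approach, assuming plain C++ in place of gmock and libprocess: `fetch()` does the upper half of the work and then calls a virtual `_fetch()`, which a test subclass overrides to signal that it was reached and to wait until the test releases it before delegating to the original implementation. The class names and the "item:" placeholder are invented for illustration.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical fetcher split into an upper half (fetch) and a virtual lower
// half (_fetch) that tests can intercept.
class Fetcher {
 public:
  virtual ~Fetcher() = default;

  std::vector<std::string> fetch(const std::vector<std::string>& uris) {
    std::vector<std::string> items;
    for (const std::string& uri : uris) {
      items.push_back("item:" + uri);  // stand-in for the real per-URI setup
    }
    return _fetch(items);
  }

 protected:
  // Lower half; in the real code this would await the gathered futures.
  virtual std::vector<std::string> _fetch(const std::vector<std::string>& items) {
    return items;
  }
};

// Test double acting as a contention barrier. Supports a single _fetch()
// call: setting a promise twice would throw std::future_error.
class BlockingFetcher : public Fetcher {
 public:
  BlockingFetcher()
    : reachedFuture_(reached_.get_future()),
      releaseFuture_(release_.get_future()) {}

  void waitUntilReached() { reachedFuture_.wait(); }
  void release() { release_.set_value(); }

 protected:
  std::vector<std::string> _fetch(const std::vector<std::string>& items) override {
    reached_.set_value();           // tell the test we are at the barrier
    releaseFuture_.wait();          // hold here until the test calls release()
    return Fetcher::_fetch(items);  // then behave like the real method
  }

 private:
  std::promise<void> reached_;
  std::promise<void> release_;
  std::future<void> reachedFuture_;
  std::future<void> releaseFuture_;
};
```

In a test, one would start `fetch()` on another thread, call `waitUntilReached()` to know the contended point has been reached, set up whatever concurrent fetch is being tested, and then call `release()` to let the first fetch proceed.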


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77057
-----------------------------------------------------------


On April 30, 2015, 7:40 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 30, 2015, 7:40 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache (except those related to stout): 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, summarized here in dependency order:
> 
> 30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that other tests don't cover anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
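
As a rough illustration of the per-item actions described in 30037, the following C++ sketch picks an action for one URI. The enum, struct, and function names are hypothetical, modeled on the three actions listed above rather than copied from fetcher.proto, and a plain set of cached URIs stands in for the real cache bookkeeping.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical mirror of the three per-item actions named in 30037.
enum class FetchAction {
  BYPASS_CACHE,        // download straight into the sandbox
  DOWNLOAD_AND_CACHE,  // download into the cache, then copy to the sandbox
  RETRIEVE_FROM_CACHE  // cache hit: copy the existing cache file
};

// Hypothetical "item": the URI, the chosen action, and maybe a cache file name.
struct FetchItem {
  std::string uri;
  FetchAction action;
  std::string cacheFilename;  // empty when the cache is bypassed
};

// `caching` models the per-URI opt-in flag; `cached` is the set of URIs
// already resident in the cache (a stand-in for the real bookkeeping).
FetchItem planItem(const std::string& uri,
                   bool caching,
                   const std::unordered_set<std::string>& cached,
                   const std::string& cacheFilename) {
  if (!caching) {
    return {uri, FetchAction::BYPASS_CACHE, ""};
  }
  if (cached.count(uri) > 0) {
    return {uri, FetchAction::RETRIEVE_FROM_CACHE, cacheFilename};
  }
  return {uri, FetchAction::DOWNLOAD_AND_CACHE, cacheFilename};
}
```

Because the action is decided per item, URIs that do not opt in keep the old, uncached behavior, which is what makes the feature backwards-compatible.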
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
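
A minimal sketch of the download reuse in 30006, assuming a single in-process map from URI to a shared future: the first request for a URI performs the download, and later requests for the same URI wait on the same result. `DownloadCoalescer` and the `download` callback are hypothetical. The fetcher described here is a libprocess actor, so a real implementation would more likely use process::Future than std::mutex and std::shared_future.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>

class DownloadCoalescer {
 public:
  // Hypothetical callback: downloads `uri` and returns the cache file path.
  using Download = std::function<std::string(const std::string&)>;

  std::shared_future<std::string> fetch(const std::string& uri,
                                        const Download& download) {
    std::promise<std::string> promise;
    std::shared_future<std::string> result;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = downloads_.find(uri);
      if (it != downloads_.end()) {
        return it->second;  // in flight or already done: reuse it
      }
      result = promise.get_future().share();
      downloads_.emplace(uri, result);
    }

    // Only the first caller for a given URI gets here. The lock is not
    // held during the (potentially slow) download.
    try {
      promise.set_value(download(uri));
    } catch (...) {
      promise.set_exception(std::current_exception());
      std::lock_guard<std::mutex> lock(mutex_);
      downloads_.erase(uri);  // forget the failure so a later call can retry
    }
    return result;
  }

 private:
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> downloads_;
};
```

Keeping successful entries in the map lets it double as an index of cached files, while erasing failed ones allows a retry. That retry policy is a choice made for this sketch, not something the review states.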
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Zaps (wipes) the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in a simple yet acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
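
hdfs::du() relies on "hadoop fs -du -h", whose sizes are human-readable. Here is a sketch of turning such a size back into a byte count for a space reservation. It assumes the leading token is a number optionally followed by a single-letter unit (K, M, G, T, P as powers of 1024). That is an assumption about the output format, not something this review specifies, and `parseHumanSize` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parses the leading size of a line such as "1.5 K  /user/x/file" into
// bytes. Returns std::nullopt if the line does not start with a number.
std::optional<std::uint64_t> parseHumanSize(const std::string& line) {
  std::istringstream in(line);
  double value = 0.0;
  if (!(in >> value) || value < 0.0) {
    return std::nullopt;
  }

  std::string unit;
  in >> unit;  // next token: a unit letter, or the path if there is no unit

  double multiplier = 1.0;
  if (unit.size() == 1) {
    switch (std::toupper(static_cast<unsigned char>(unit[0]))) {
      case 'K': multiplier = 1024.0; break;
      case 'M': multiplier = 1024.0 * 1024; break;
      case 'G': multiplier = 1024.0 * 1024 * 1024; break;
      case 'T': multiplier = 1024.0 * 1024 * 1024 * 1024; break;
      case 'P': multiplier = 1024.0 * 1024 * 1024 * 1024 * 1024; break;
      default: break;  // not a unit letter: treat the number as bytes
    }
  }

  // Round up to whole bytes.
  return static_cast<std::uint64_t>(std::ceil(value * multiplier));
}
```

Because "-h" rounds its output, a byte count recovered this way is approximate; asking for plain byte counts instead would avoid that, if the Hadoop version in use reports them.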
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
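
The URI type separation that moves into the fetcher process can be pictured as a classification by scheme. The sketch below is hypothetical: the enum and function names are invented, and the scheme lists (hdfs/hftp/s3/s3n through the Hadoop client, http/https/ftp/ftps as network downloads, file or no scheme as a local copy) are assumptions about what mesos-fetcher supported at the time, not taken from this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical categories for deciding how to size and fetch a URI.
enum class UriKind { LOCAL, HADOOP, NET, UNKNOWN };

UriKind classifyUri(const std::string& uri) {
  const std::string::size_type pos = uri.find("://");
  if (pos == std::string::npos) {
    return UriKind::LOCAL;  // plain path, copied locally
  }

  std::string scheme = uri.substr(0, pos);
  std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  if (scheme == "file") {
    return UriKind::LOCAL;
  }
  if (scheme == "hdfs" || scheme == "hftp" || scheme == "s3" || scheme == "s3n") {
    return UriKind::HADOOP;  // fetched (and sized) via the hadoop client
  }
  if (scheme == "http" || scheme == "https" || scheme == "ftp" || scheme == "ftps") {
    return UriKind::NET;  // fetched over the network directly
  }
  return UriKind::UNKNOWN;  // caller should reject the URI
}
```

With the classification in the fetcher process, the same function can drive both the size query made before reserving cache space and the choice of download method in mesos-fetcher.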
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one request is made per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
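
The reservation, eviction, and reference-counting rules in 30626 can be sketched as a small single-threaded bookkeeping class. This is a simplified model under stated assumptions: eviction here is synchronous and all-or-nothing (the review describes it as asynchronous, with the reservation undone on failure), victims are chosen oldest-first among unreferenced entries as a stand-in for whatever policy the real cache uses, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

class CacheSpace {
 public:
  explicit CacheSpace(std::uint64_t capacity) : capacity_(capacity) {}

  // Reserves `bytes`, taking free space first and evicting unreferenced
  // entries only for the remainder. On failure nothing changes.
  bool reserve(std::uint64_t bytes) {
    const std::uint64_t available = capacity_ - used_ - reserved_;
    if (bytes > available) {
      const std::uint64_t needed = bytes - available;
      std::vector<std::string> victims;
      std::uint64_t freed = 0;
      for (const std::string& key : order_) {
        if (freed >= needed) break;
        const Entry& entry = entries_.at(key);
        if (entry.refs == 0) {  // files in use are never evicted
          victims.push_back(key);
          freed += entry.size;
        }
      }
      if (freed < needed) return false;  // nothing evicted, nothing reserved
      for (const std::string& key : victims) evict(key);
    }
    reserved_ += bytes;
    return true;
  }

  // Undoes a reservation, e.g. after a failed download.
  void unreserve(std::uint64_t bytes) {
    assert(bytes <= reserved_);
    reserved_ -= bytes;
  }

  // Converts a reservation into a resident cache file.
  void commit(const std::string& key, std::uint64_t bytes) {
    assert(!contains(key));
    unreserve(bytes);
    used_ += bytes;
    entries_.emplace(key, Entry{bytes, 0});
    order_.push_back(key);
  }

  // Reference counting: a referenced file cannot be evicted.
  void acquire(const std::string& key) { ++entries_.at(key).refs; }
  void release(const std::string& key) {
    assert(entries_.at(key).refs > 0);
    --entries_.at(key).refs;
  }

  bool contains(const std::string& key) const { return entries_.count(key) > 0; }
  std::uint64_t used() const { return used_; }
  std::uint64_t reserved() const { return reserved_; }

 private:
  struct Entry {
    std::uint64_t size;
    int refs;
  };

  void evict(const std::string& key) {
    used_ -= entries_.at(key).size;
    entries_.erase(key);
    order_.remove(key);
  }

  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::uint64_t reserved_ = 0;
  std::unordered_map<std::string, Entry> entries_;
  std::list<std::string> order_;  // insertion order, oldest first
};
```

Computing the whole victim list before evicting anything means a reservation that cannot be satisfied leaves the cache untouched, which is one simple way to get the "undo on failure" property.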
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 19, 2015, 9:40 a.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 379
> > <https://reviews.apache.org/r/30774/diff/38/?file=899705#file899705line379>
> >
> >     benh, what do you think of Bernd's contentionBarrier injection? Commonly we just mock the callback (_fetch in this case) in tests to make it block, but Bernd wanted to introduce a specific empty method for tests. I told him this is not a pattern we use in Mesos, but I'd like to see what you think.

Of course I will stick to the prevalent patterns unless you start liking this one :-)
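
To make the two testing styles in this exchange concrete, here is a plain C++ sketch that uses subclassing in place of gmock. `_fetch` and `contentionBarrier` are the names used in the review; the `Fetcher` class, `Gate`, and both test subclasses are hypothetical. Style A intercepts the existing continuation, which the review names as the prevalent Mesos pattern; style B overrides a dedicated, empty hook.

```cpp
#include <chrono>
#include <future>
#include <string>

// Hypothetical stand-in for the fetcher actor.
class Fetcher {
 public:
  virtual ~Fetcher() = default;

  std::string fetch(const std::string& uri) {
    // ... bookkeeping that opens the window for contention ...
    contentionBarrier();  // style B hook: empty in production
    return _fetch(uri);   // continuation step
  }

 protected:
  virtual void contentionBarrier() {}
  virtual std::string _fetch(const std::string& uri) { return "fetched:" + uri; }
};

// A one-shot gate a test can open; waiting before it opens blocks.
struct Gate {
  std::promise<void> promise;
  std::shared_future<void> opened = promise.get_future().share();
  void open() { promise.set_value(); }
  void wait() const { opened.wait(); }
};

// Style A: hold the existing continuation until the test opens the gate.
class InterceptContinuation : public Fetcher {
 public:
  Gate gate;

 protected:
  std::string _fetch(const std::string& uri) override {
    gate.wait();
    return Fetcher::_fetch(uri);
  }
};

// Style B: override the dedicated hook instead.
class InterceptBarrier : public Fetcher {
 public:
  Gate gate;

 protected:
  void contentionBarrier() override { gate.wait(); }
};
```

Both hold a fetch at a known point while the test arranges contention; the trade-off is that style B adds a production method that exists only for tests.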


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77057
-----------------------------------------------------------


On March 18, 2015, 11:43 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 18, 2015, 11:43 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Timothy Chen <tn...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77057
-----------------------------------------------------------



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124854>

    benh, what do you think of Bernd's contentionBarrier injection? Commonly we just mock the callback (_fetch in this case) in tests to make it block, but Bernd wanted to introduce a specific empty method for tests. I told him this is not a pattern we use in Mesos, but I'd like to see what you think.


- Timothy Chen


On March 19, 2015, 6:43 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 19, 2015, 6:43 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On April 12, 2015, 11:11 p.m., Timothy Chen wrote:
> > src/tests/fetcher_cache_tests.cpp, line 308
> > <https://reviews.apache.org/r/30774/diff/42/?file=922829#file922829line308>
> >
> >     Not sure why you picked an arbitrary number 5 here. Why not let it be passed in?

OK, I will add an explanation in a comment. Two requirements need to be met by this constant.
- It needs to be larger than the expected number of status updates. We might choose something much larger than 5, but all tests run just fine with 5.
- It needs to be finite. Otherwise we will keep waiting for updates when none arrive due to a bug.

However, if we passed this constant in, we would need to explain it at every call site, i.e., multiple times instead of only once, even though the situation is exactly the same every time. So I will refrain from that.
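
Here is a sketch of the bounded wait described above, keeping the bound as a single documented constant rather than a parameter. `TaskState`, `awaitUpdates`, and the `next` callback (which returns std::nullopt when no update arrives in time) are hypothetical stand-ins for the test fixture's actual machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical subset of task states seen in status updates.
enum class TaskState { STARTING, RUNNING, FINISHED, FAILED };

// Upper bound on status updates awaited per task. It must exceed the number
// of updates any test expects, and it must be finite so the loop terminates
// even if a bug keeps the task from ever reaching a terminal state.
constexpr std::size_t kMaxStatusUpdates = 5;

std::vector<TaskState> awaitUpdates(
    const std::function<std::optional<TaskState>()>& next) {
  std::vector<TaskState> updates;
  while (updates.size() < kMaxStatusUpdates) {
    const std::optional<TaskState> update = next();
    if (!update.has_value()) {
      break;  // timed out waiting for another update
    }
    updates.push_back(*update);
    if (*update == TaskState::FINISHED || *update == TaskState::FAILED) {
      break;  // terminal state: no further updates expected
    }
  }
  return updates;
}
```

Every call site faces the same situation, so one commented constant explains the choice once instead of at each call.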


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review79838
-----------------------------------------------------------


On April 10, 2015, 4:33 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 10, 2015, 4:33 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537 
>   include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
>   src/Makefile.am fa609da08e23d6595a3f6d2efddd3e333b6c78f1 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp 6893684e6d199a5d69fc8bba8e60c4acaae9c3c9 
>   src/slave/containerizer/docker.cpp f9fb07806e3b7d7d2afc1be3b8756eac23b32dcd 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp e4136095fca55637864f495098189ab3ad8d8fe7 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp 35f56252cfda5011d21aa188f33cc3e68a694968 
>   src/slave/slave.cpp 9fec023b643d410f4d511fa6f80e9835bab95b7e 
>   src/tests/docker_containerizer_tests.cpp c772d4c836de18b0e87636cb42200356d24ec73d 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 0e98572a62ae05437bd2bc800c370ad1a0c43751 
>   src/tests/mesos.cpp 02cbb4b8cf1206d0f32d160addc91d7e0f1ab28b 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Timothy Chen <tn...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review79838
-----------------------------------------------------------



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment129437>

    Not sure why you picked an arbitrary number 5 here. Why not let it be passed in?



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment129438>

    Always one file expected in the cache


- Timothy Chen


On April 10, 2015, 11:33 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 10, 2015, 11:33 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537 
>   include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
>   src/Makefile.am fa609da08e23d6595a3f6d2efddd3e333b6c78f1 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp 6893684e6d199a5d69fc8bba8e60c4acaae9c3c9 
>   src/slave/containerizer/docker.cpp f9fb07806e3b7d7d2afc1be3b8756eac23b32dcd 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp e4136095fca55637864f495098189ab3ad8d8fe7 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp 35f56252cfda5011d21aa188f33cc3e68a694968 
>   src/slave/slave.cpp 9fec023b643d410f4d511fa6f80e9835bab95b7e 
>   src/tests/docker_containerizer_tests.cpp c772d4c836de18b0e87636cb42200356d24ec73d 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 0e98572a62ae05437bd2bc800c370ad1a0c43751 
>   src/tests/mesos.cpp 02cbb4b8cf1206d0f32d160addc91d7e0f1ab28b 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, in dependency order:
> 
> 30033: Removes the fetcher env tests, since they won't be needed any more once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two variants, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
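The "one download per URI, result shared by all concurrent fetches" behavior can be illustrated with a small single-threaded sketch. It assumes an actor-style model like the libprocess actor the fetcher process runs in, so no locking is shown. `DownloadCoordinator`, `request()` and `complete()` are hypothetical names, and plain callbacks stand in for the libprocess futures the real code uses.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical coordinator: the first request for a URI starts the download,
// later requests for the same URI only wait for its result.
class DownloadCoordinator {
 public:
  using Callback = std::function<void(bool ok, const std::string& path)>;

  // Returns true if the caller is the first requester and must start the
  // download; otherwise the callback is queued behind the running download.
  bool request(const std::string& uri, Callback callback) {
    std::vector<Callback>& waiters = pending_[uri];
    waiters.push_back(std::move(callback));
    return waiters.size() == 1;
  }

  // Called once when the single download for 'uri' ends; every queued
  // requester receives the same outcome. Afterwards, later requests would
  // find the completed cache entry instead (not modeled here).
  void complete(const std::string& uri, bool ok, const std::string& path) {
    auto it = pending_.find(uri);
    if (it == pending_.end()) {
      return;
    }
    std::vector<Callback> waiters = std::move(it->second);
    pending_.erase(it);
    for (const Callback& callback : waiters) {
      callback(ok, path);
    }
  }

 private:
  std::map<std::string, std::vector<Callback>> pending_;
};
```

Two requests for the same URI thus yield one `true` (start downloading) and one `false` (wait), and a single `complete()` call notifies both.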
> 
> 30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that behaves just like a real "hadoop" command when used in the limited way the test relies on.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
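Turning the string returned from `hadoop fs -du -h` into a byte count might look like the sketch below. The accepted layout (a number, optionally followed by a unit letter, then the path) and the assumption that K, M, G, T and P are powers of 1024 should be checked against the installed Hadoop version; `parseDuSize` is a hypothetical helper, not the signature of `hdfs::du()`. Because `-h` rounds, the result is only an estimate of the file size.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper: parses the first size field of one line of
// "hadoop fs -du -h" output, such as "1.5 K  /user/foo/app.tgz" or
// "512  /user/foo/app.tgz". Assumes binary (1024-based) unit letters.
bool parseDuSize(const std::string& line, uint64_t* bytes) {
  std::istringstream in(line);
  double value = 0;
  if (!(in >> value) || value < 0) {
    return false;  // No leading number, e.g. an error message.
  }

  // The unit letter may be attached ("1.5K") or separated ("1.5 K").
  in >> std::ws;
  const int next = in.peek();
  const std::string units = "KMGTP";
  uint64_t multiplier = 1;
  if (next != EOF) {
    const std::string::size_type index = units.find(static_cast<char>(next));
    if (index != std::string::npos) {
      for (std::string::size_type i = 0; i <= index; ++i) {
        multiplier *= 1024;
      }
    }
  }

  // Round to the nearest byte.
  *bytes = static_cast<uint64_t>(value * multiplier + 0.5);
  return true;
}
```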
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one request is issued per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
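The reference-counting rule for disabling eviction can be sketched as a victim-selection pass that skips entries still in use. Least-recently-used order is an assumption made for this example, and `Entry` and `selectVictims` are hypothetical names. If the unreferenced entries cannot cover the shortfall, nothing is selected, so the caller can undo its reservation.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <vector>

// Hypothetical cache entry; 'references' counts fetch attempts that are
// currently using the cache file.
struct Entry {
  std::string key;
  uint64_t size;
  int references;
};

// Walks 'lru' from least to most recently used, skipping referenced
// entries, until the collected victims cover 'needed' bytes. Returns false
// (and leaves 'victims' empty) if the unreferenced entries are not enough.
bool selectVictims(const std::list<std::shared_ptr<Entry>>& lru,
                   uint64_t needed,
                   std::vector<std::shared_ptr<Entry>>* victims) {
  victims->clear();
  uint64_t freed = 0;
  for (const std::shared_ptr<Entry>& entry : lru) {
    if (freed >= needed) {
      break;
    }
    if (entry->references > 0) {
      continue;  // In use by a fetch attempt: eviction is disabled.
    }
    victims->push_back(entry);
    freed += entry->size;
  }
  if (freed < needed) {
    victims->clear();
    return false;
  }
  return true;
}
```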
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On April 29, 2015, 1:46 p.m., Benjamin Hindman wrote:
> > src/slave/containerizer/fetcher.cpp, lines 386-387
> > <https://reviews.apache.org/r/30774/diff/41/?file=904359#file904359line386>
> >
> >     How about still calling this s/reference/get/ and then checking the references? Also, we're deprecating and removing the return value of const &. Thus:
> >     
> >     Option<shared_ptr<Cache::Entry>> entry =
> >       cache.get(commandUser, uri.value());
> >     
> >     if (entry.isSome()) {
> >       CHECK(entry.get()->references() > 0);
> >       ... 
> >     }

After discussing this, we decided to go with explicit referencing outside the accessor.
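Explicit referencing outside the accessor could look roughly like this: `get()` is a pure lookup keyed by user and URI, and the caller increments the reference count itself right after a successful lookup. To stay self-contained, this sketch swaps stout's `Option<std::shared_ptr<...>>` and `CHECK` for a null `std::shared_ptr` and `assert`; the `Cache` layout shown is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

class Cache {
 public:
  struct Entry {
    int references = 0;
    void reference() { ++references; }
    void unreference() {
      assert(references > 0);
      --references;
    }
  };

  // Pure lookup: returns nullptr when there is no entry for (user, uri).
  // It deliberately does not touch the reference count.
  std::shared_ptr<Entry> get(const std::string& user,
                             const std::string& uri) const {
    auto it = entries_.find(std::make_pair(user, uri));
    if (it == entries_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Creates a fresh, unreferenced entry for (user, uri).
  std::shared_ptr<Entry> create(const std::string& user,
                                const std::string& uri) {
    std::shared_ptr<Entry> entry = std::make_shared<Entry>();
    entries_[std::make_pair(user, uri)] = entry;
    return entry;
  }

 private:
  std::map<std::pair<std::string, std::string>, std::shared_ptr<Entry>>
      entries_;
};
```

Keeping the side effect at the call site makes it visible where each reference is taken, which also makes it easier to pair with the matching `unreference()` in cleanup code.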


> On April 29, 2015, 1:46 p.m., Benjamin Hindman wrote:
> > src/slave/containerizer/fetcher.cpp, lines 394-395
> > <https://reviews.apache.org/r/30774/diff/41/?file=904359#file904359line394>
> >
> >     Let's get a CHECK here too:
> >     
> >     CHECK(newEntry->references() > 0);

See above.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review78530
-----------------------------------------------------------


On April 30, 2015, 7:40 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 30, 2015, 7:40 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review78530
-----------------------------------------------------------



include/mesos/type_utils.hpp
<https://reviews.apache.org/r/30774/#comment127347>

    Let's also 'hash_combine' the 'extract' and 'executable' information here too.
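Folding `extract` and `executable` into the URI hash might look like the following. `Uri` is a hypothetical plain struct standing in for the `CommandInfo::URI` protobuf, and `hashCombine` reproduces the mixing step of `boost::hash_combine` so the example needs no Boost. Leaving the flags out does not break correctness as long as equality compares them, but URIs that differ only in those flags would always collide.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for CommandInfo::URI with the fields discussed.
struct Uri {
  std::string value;
  bool executable;
  bool extract;
};

// Same mixing formula as boost::hash_combine.
inline void hashCombine(std::size_t& seed, std::size_t h) {
  seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes the URI value together with both boolean flags.
std::size_t hashUri(const Uri& uri) {
  std::size_t seed = 0;
  hashCombine(seed, std::hash<std::string>()(uri.value));
  hashCombine(seed, std::hash<bool>()(uri.executable));
  hashCombine(seed, std::hash<bool>()(uri.extract));
  return seed;
}
```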



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment127351>

    Great TODO, let's point the comments in slave.cpp to this TODO please!



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment127352>

    This function no longer exists!



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment127353>

    +4 please!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127354>

    Not used!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127356>

    Two spaces between top-level definitions please!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127355>

    Please comment/TODO why this is being used and that you plan to remove this once we have C++11 lambdas!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127358>

    How about still calling this s/reference/get/ and then checking the references? Also, we're deprecating and removing the return value of const &. Thus:
    
    Option<shared_ptr<Cache::Entry>> entry =
      cache.get(commandUser, uri.value());
    
    if (entry.isSome()) {
      CHECK(entry.get()->references() > 0);
      ... 
    }



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127357>

    +2 not +4



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127359>

    Let's get a CHECK here too:
    
    CHECK(newEntry->references() > 0);



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127360>

    Let's move this comment to the top of this function so that readers of the code have a better idea of what we're trying to accomplish sooner!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment127365>

    Let's add another comment reminding readers why a still-pending 'completion' for the entry implies that we need to DOWNLOAD_AND_CACHE. For example:
    
    // Since the entry is not yet "complete", i.e.,
    // 'completion().isPending()', it must be the case that we created
    // the entry in FetcherProcess::fetch(). Otherwise the entry would
    // already have been in the cache and we would have waited for its
    // completion in FetcherProcess::fetch().
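The reasoning in the suggested comment can be condensed into a small decision function. Only `DOWNLOAD_AND_CACHE` is named in this thread; the other enumerator names, the `Entry` struct, and the fallback to bypassing the cache when no entry could be created are assumptions for the sketch.

```cpp
#include <cassert>
#include <memory>

// Illustrative actions mirroring the three described in the review.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

// Hypothetical entry state: 'completed' is false while the download that
// populates this entry is still pending.
struct Entry {
  bool completed;
};

// If an entry already existed, fetch() would have waited for its completion
// first, so a pending entry seen here must have been created by the current
// fetch attempt, which therefore has to download into the cache.
Action chooseAction(bool cachingRequested,
                    const std::shared_ptr<Entry>& entry) {
  if (!cachingRequested || entry == nullptr) {
    return Action::BYPASS_CACHE;  // Assumed fallback when no entry exists.
  }
  if (entry->completed) {
    return Action::RETRIEVE_FROM_CACHE;
  }
  return Action::DOWNLOAD_AND_CACHE;
}
```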



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment127348>

    s/recoverCache/recover/ <-- Since in the future there might be other "recovery" things that need to get done that don't have anything to do with the cache.
    
    Also, let's please leave a comment here that explains why we're calling a static method rather than invoking a method on an instance of a 'Fetcher' directly. Our intuition is that this will likely have to change in the future.



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment127350>

    Comment here too, please, thanks!


- Benjamin Hindman


On April 29, 2015, 8:42 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 8:42 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On April 29, 2015, 3:41 p.m., Benjamin Hindman wrote:
> > src/slave/containerizer/fetcher.cpp, line 386
> > <https://reviews.apache.org/r/30774/diff/44/?file=945694#file945694line386>
> >
> >     Kill the const &.

Killed the &. Any reason this should not be const?


> On April 29, 2015, 3:41 p.m., Benjamin Hindman wrote:
> > src/slave/containerizer/fetcher.cpp, line 597
> > <https://reviews.apache.org/r/30774/diff/44/?file=945694#file945694line597>
> >
> >     Let's add some helper functions on Fetcher::Cache so that we can just get this information directly in the tests rather than this "helper" function.
> >     
> >     // Returns the files in the cache.
> >     Try<list<Path>> FetcherProcess::cacheFiles();
> >     
> >     // Returns the number of entries in the cache.
> >     size_t FetcherProcess::cacheSize();

Can we please postpone this until after refactoring fetcher injection into slave/containerizer? It will be much easier to make these member functions then. I'll put a TODO for now.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review82017
-----------------------------------------------------------


On April 29, 2015, 1:42 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 1:42 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, listed here in dependency order:
> 
> 30033: Removes the fetcher env tests, which will no longer be needed once the fetcher takes its parameters as JSON in a single env var. They never tested anything that is not also covered by other tests.
> 
> 30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches URI handling over to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that behaves just like a real "hadoop" command, as long as it is used in the limited way the fetcher uses it.
> 
> 30124: Zaps (wipes) the fetcher cache upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since download size queries will reuse it once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.
> 
> 30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: at most one is in flight per URI.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
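The eviction provisions listed under 30626 above (reservations drawn partly from free space and partly from evictions, undoing a reservation when eviction fails, and reference counting that pins in-use files) can be illustrated with a small single-threaded model. This is a minimal sketch under stated assumptions: `CacheSketch`, its methods, and the oldest-first victim order are hypothetical and not the actual Fetcher::Cache code, and evictions run synchronously here rather than asynchronously.

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of the eviction provisions of review 30626;
// not the Mesos Fetcher::Cache API. Sizes are in bytes, files keyed by URI.
class CacheSketch
{
public:
  explicit CacheSketch(std::uint64_t capacity) : capacity_(capacity) {}

  // Reserves `size` bytes, partly from free space and partly by evicting
  // unreferenced entries, oldest first. `removeFile` deletes one cache file
  // and may fail; in that case the reservation is undone.
  bool reserve(
      std::uint64_t size,
      const std::function<bool(const std::string&)>& removeFile)
  {
    const std::uint64_t available = capacity_ - used_ - reserved_;
    const std::uint64_t needed = size > available ? size - available : 0;

    // Choose victims before touching any state.
    std::vector<std::string> victims;
    std::uint64_t evictable = 0;
    for (const std::string& uri : order_) {
      if (evictable >= needed) {
        break;
      }
      const Entry& entry = entries_.at(uri);
      if (entry.references == 0) {  // Files in use are never evicted.
        victims.push_back(uri);
        evictable += entry.size;
      }
    }
    if (evictable < needed) {
      return false;
    }

    reserved_ += size;  // Hold the space while evictions are in progress.
    for (const std::string& uri : victims) {
      if (!removeFile(uri)) {
        reserved_ -= size;  // Undo the reservation if an eviction fails.
        return false;
      }
      used_ -= entries_.at(uri).size;
      entries_.erase(uri);
      order_.remove(uri);
    }
    return true;
  }

  // Converts a reservation into a resident entry after a successful download.
  void commit(const std::string& uri, std::uint64_t size)
  {
    reserved_ -= size;
    used_ += size;
    entries_[uri] = Entry{size, 0};
    order_.push_back(uri);
  }

  // Gives back a reservation whose download failed.
  void release(std::uint64_t size) { reserved_ -= size; }

  // Reference counting: a fetch pins the entries it is using.
  void reference(const std::string& uri) { ++entries_.at(uri).references; }
  void unreference(const std::string& uri) { --entries_.at(uri).references; }

  bool contains(const std::string& uri) const { return entries_.count(uri) > 0; }
  std::uint64_t used() const { return used_; }
  std::uint64_t reserved() const { return reserved_; }

private:
  struct Entry
  {
    std::uint64_t size;
    int references;
  };

  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::uint64_t reserved_ = 0;
  std::map<std::string, Entry> entries_;
  std::list<std::string> order_;  // Least recently added first.
};
```

For example, with a capacity of 100 bytes and one pinned 60-byte entry, `reserve(60, ...)` fails because only 40 bytes are free and nothing is evictable; after `unreference()` the same call evicts the entry. Victims are chosen before any state changes, so a request that evictable entries cannot satisfy leaves the cache untouched.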


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On April 29, 2015, 3:41 p.m., Benjamin Hindman wrote:
> > src/slave/containerizer/fetcher.cpp, lines 408-417
> > <https://reviews.apache.org/r/30774/diff/44/?file=945694#file945694line408>
> >
> >     For the future:
> >     
> >     auto futures = filter(entries, [](const auto& entry) { return entry.isSome() ? entry.get() : None(); });
> >     
> >     ;-)

I could not find a suitable filter function in std, boost, or stout yet. Shall we create one?


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review82017
-----------------------------------------------------------


On May 12, 2015, 3:43 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 12, 2015, 3:43 p.m.)
>


Re: Review Request 30774: Fetcher Cache

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review82017
-----------------------------------------------------------



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment132684>

    const Bytes&
    
    ?



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment132638>

    No longer used function.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132642>

    Please replace tab with spaces.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132648>

    Kill the const &.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132657>

    Let's be explicit for now with references:
    
    Option<shared_ptr<Cache::Entry>> entry = cache.get();
    
    if (entry.isSome()) {
      entry.get()->reference();
      entries[uri] = entry.get()->completion()
        .then(defer(self(), [=]() { return entry.get(); }));
    } else {
      shared_ptr<Cache::Entry> newEntry =
        cache.create(cacheDirectory, commandUser, uri);
    
      newEntry->reference();
    
      entries[uri] = async(&fetchSize, uri.value(), flags.frameworks_home)
        .then(defer(self(), 
    }
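The snippet above reuses an existing cache entry's completion future and only starts a new download when no entry exists. That idea, concurrent requests for one URI sharing a single download, can be sketched with standard C++ futures. This is an assumption-laden illustration: `DownloadDeduplicator` and the `download` callback are hypothetical, and `std::shared_future` and `std::async` stand in for the libprocess futures and `defer()` used in Mesos.

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <string>

// Hypothetical sketch, not the Mesos FetcherProcess: the first request for a
// URI starts a download; later requests for the same URI share its future
// instead of downloading again. fetch() is assumed to be called from a
// single thread, much as an actor serializes its messages.
class DownloadDeduplicator
{
public:
  using Download = std::string (*)(const std::string& uri);

  explicit DownloadDeduplicator(Download download) : download_(download) {}

  std::shared_future<std::string> fetch(const std::string& uri)
  {
    auto found = downloads_.find(uri);
    if (found != downloads_.end()) {
      return found->second;  // Reuse the pending or completed download.
    }

    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    downloads_.emplace(uri, result);
    return result;
  }

  // Number of distinct downloads started so far.
  std::size_t started() const { return downloads_.size(); }

private:
  Download download_;
  std::map<std::string, std::shared_future<std::string>> downloads_;
};
```

Fetching the same URI twice and a second URI once starts two downloads, and both futures for the first URI resolve to the same result.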



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132645>

    +2 not +4 here.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132646>

    // Wait for the URI to be downloaded into the cache (or fail).



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132643>

    .then(defer(self(), [=]() { return entry.get(); }))
    
    Then please kill the 'value' function above!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132688>

    // NOTE: We break this into two pieces because we want to be able to __block__ an instance of ...
    return _fetch(...);



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132658>

    For the future:
    
    auto futures = filter(entries, [](const auto& entry) { return entry.isSome() ? entry.get() : None(); });
    
    ;-)



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132666>

    Let's add some helper functions on Fetcher::Cache so that we can just get this information directly in the tests rather than this "helper" function.
    
    // Returns the files in the cache.
    Try<list<Path>> FetcherProcess::cacheFiles();
    
    // Returns the number of entries in the cache.
    size_t FetcherProcess::cacheSize();



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132667>

    CHECK_READY



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132669>

    Lambda-ify! ;-)



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132668>

    Please replace tabs with spaces.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132670>

    Lambda-ify! ;-)



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132672>

    { on newline please.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment132671>

    CHECK_SOME



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment132632>

    Ideally we can inject the Fetcher instance into the Slave so that we don't have this global recover operation that is actually per slave:
    
    --------------------------------------------------
    
    Fetcher fetcher(flags);
    
    Slave slave(..., &fetcher);
    
    Slave::registered(...)
    {
      ...;
      Try<Nothing> recover = fetcher->recover(slaveid);
      if (recover.is...) {
        ...;
      }
      ...;
    }
    
    --------------------------------------------------
    
    But for now, let's just s/recoverCache/recover/ since the fact that the fetcher has a cache is an implementation detail.
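The injection pattern suggested above can be reduced to a self-contained sketch. `Fetcher`, `Slave`, `recover()` and `registered()` here are simplified, hypothetical stand-ins for the Mesos classes; the suggestion has `recover()` return `Try<Nothing>`, which this sketch replaces with `bool` to avoid depending on stout.

```cpp
#include <string>

// Hypothetical stand-ins for the Mesos Fetcher and Slave classes, showing
// only the injection pattern, not the real interfaces.
class Fetcher
{
public:
  // Per-slave recovery; that it wipes a cache is an implementation detail.
  bool recover(const std::string& slaveId)
  {
    lastRecovered_ = slaveId;
    return true;
  }

  const std::string& lastRecovered() const { return lastRecovered_; }

private:
  std::string lastRecovered_;
};

class Slave
{
public:
  // The Fetcher is not owned and must outlive the Slave.
  explicit Slave(Fetcher* fetcher) : fetcher_(fetcher) {}

  // Invoked once the master has assigned this slave its ID.
  bool registered(const std::string& slaveId)
  {
    return fetcher_->recover(slaveId);
  }

private:
  Fetcher* fetcher_;
};
```

Because each `Slave` receives its own `Fetcher`, a test that runs several slaves in one process gets a separate recovery step per slave instead of one global operation.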


- Benjamin Hindman


On April 29, 2015, 8:42 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 8:42 p.m.)
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review83946
-----------------------------------------------------------


- Bernd Mathiske


On May 13, 2015, 3:07 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 13, 2015, 3:07 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery and shutdown. This implements recovery in an acceptable, if very simple, way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: at most one is issued per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs that are currently in use, i.e. whose cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
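The three fetch actions from review 30037, and the way a shared cache entry turns concurrent requests for one URI into a single download, can be sketched as follows. This is a minimal illustration under stated assumptions: `Action`, `Uri`, and `CacheIndex` are hypothetical names, not the patch's actual types, and the sketch only does bookkeeping. A real fetcher would make a later requester wait for the in-flight download before retrieving the file, and would release the reference afterwards.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical names modeled on the three actions described for review 30037.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

struct Uri {
  std::string value;
  bool cache;  // Assumed stand-in for the opt-in "caching" flag (review 30036).
};

class CacheIndex {
 public:
  // The first attempt for a cached URI creates the entry and downloads;
  // later attempts find the entry and reuse it, so concurrent requests
  // for the same URI cause only one download.
  Action decide(const Uri& uri) {
    if (!uri.cache) {
      return Action::BYPASS_CACHE;
    }
    auto it = entries_.find(uri.value);
    if (it != entries_.end()) {
      ++it->second;  // One more fetch attempt uses this cache file.
      return Action::RETRIEVE_FROM_CACHE;
    }
    entries_.emplace(uri.value, 1);
    return Action::DOWNLOAD_AND_CACHE;
  }

  // Number of fetch attempts currently referencing the URI's cache file.
  int users(const std::string& uri) const {
    auto it = entries_.find(uri);
    return it == entries_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<std::string, int> entries_;  // URI -> reference count.
};
```

Uncached URIs never touch the index, so the behavior for existing frameworks is unchanged, which matches the backwards-compatibility argument in the description.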


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On May 15, 2015, 10:34 a.m., Bernd Mathiske wrote:
> > src/tests/fetcher_cache_tests.cpp, line 426
> > <https://reviews.apache.org/r/30774/diff/48/?file=958588#file958588line426>
> >
> >     This needs to be a pointer. Or use a simple struct. Name: FetcherCacheTest::Task
> 
> Bernd Mathiske wrote:
>     Can't use the return struct approach, since Queue turns out not to be copyable. Going with passing in pointers.

Was looking at the wrong queue. Was able to use the return struct approach.
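For reference, a non-copyable member does not by itself rule out the return-struct approach in C++11, as long as the member is movable: a local returned by name is moved (or the copy is elided). A minimal sketch with hypothetical field names; whether a particular queue type is movable depends on its own definition.

```cpp
#include <memory>
#include <queue>
#include <string>

// Hypothetical stand-in for FetcherCacheTest::Task; field names are invented.
struct Task {
  std::string taskId;
  // Move-only member: Task is not copyable, but it is still movable.
  std::unique_ptr<std::queue<std::string>> statusQueue;
};

Task launchTask(const std::string& id) {
  Task task;
  task.taskId = id;
  task.statusQueue.reset(new std::queue<std::string>());
  task.statusQueue->push("TASK_RUNNING");
  return task;  // Moved (or elided), never copied.
}
```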


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review83945
-----------------------------------------------------------


On May 13, 2015, 3:07 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 13, 2015, 3:07 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement the specific fetch actions for all of the above. Additional testing covers fetching from (mock) HDFS and the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See the descriptions of those reviews, summarized here in dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run() method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (run() comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery and shutdown. This implements recovery in an acceptable, if very simple, way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: at most one is issued per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs that are currently in use, i.e. whose cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
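The reservation arithmetic described for review 30626 (part of a request is covered by free space, the rest by evicting entries that no fetch attempt references) can be sketched synchronously. Assumptions: the `Cache` and `Entry` names, the oldest-first eviction order, and the synchronous structure are illustrative only. Per the description, the patch reserves space while asynchronously waiting for eviction and undoes the reservation if eviction fails; here, choosing victims before committing anything means a failed request leaves the state untouched.

```cpp
#include <cstdint>
#include <list>
#include <string>
#include <vector>

struct Entry {
  std::string key;
  uint64_t size;
  int referenceCount;  // > 0: some fetch attempt uses it, so it is not evictable.
};

class Cache {
 public:
  explicit Cache(uint64_t capacity) : capacity_(capacity), reserved_(0) {}

  // Reserves `requested` bytes: free space first, then evictions for the rest.
  bool reserve(uint64_t requested, std::vector<std::string>* evicted) {
    uint64_t available = capacity_ - reserved_;
    if (requested <= available) {
      reserved_ += requested;
      return true;
    }
    uint64_t shortfall = requested - available;

    // First pass: choose unreferenced victims, oldest first, changing nothing.
    std::vector<std::list<Entry>::iterator> victims;
    uint64_t freed = 0;
    for (auto it = entries_.begin(); it != entries_.end() && freed < shortfall; ++it) {
      if (it->referenceCount == 0) {
        victims.push_back(it);
        freed += it->size;
      }
    }
    if (freed < shortfall) {
      return false;  // Nothing was modified, so there is nothing to undo.
    }

    // Second pass: commit. An entry's size is part of `reserved_`.
    for (auto it : victims) {
      evicted->push_back(it->key);
      reserved_ -= it->size;
      entries_.erase(it);
    }
    reserved_ += requested;
    return true;
  }

  // Records a finished download whose space was reserved beforehand.
  void add(const std::string& key, uint64_t size, int referenceCount) {
    entries_.push_back(Entry{key, size, referenceCount});
  }

  uint64_t reserved() const { return reserved_; }

 private:
  uint64_t capacity_;
  uint64_t reserved_;
  std::list<Entry> entries_;  // Insertion order doubles as eviction order.
};
```

For example, with a capacity of 100 and 90 bytes reserved by entries "a" (60, unreferenced) and "b" (30, in use), a 50-byte request takes 10 bytes of free space and 40 from evicting "a", leaving 80 reserved.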


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On May 15, 2015, 10:34 a.m., Bernd Mathiske wrote:
> > src/tests/fetcher_cache_tests.cpp, line 426
> > <https://reviews.apache.org/r/30774/diff/48/?file=958588#file958588line426>
> >
> >     This needs to be a pointer. Or use a simple struct. Name: FetcherCacheTest::Task

Can't use the return struct approach, since Queue turns out not to be copyable. Going with passing in pointers.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review83945
-----------------------------------------------------------




Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review83945
-----------------------------------------------------------



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment135004>

    This needs to be a pointer. Or use a simple struct. Name: FetcherCacheTest::Task


- Bernd Mathiske




Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > Just took a cursory glance since this is a huge diff, could it have been broken apart further? We've found large diffs like this one are next to impossible to review thoroughly :)

Agreed, that is normally the case. This had in fact been multiple smaller diffs (see the "longer Description" above). Benh eventually asked me to combine them for relatively long review sessions that covered a lot of ground in one swoop.


> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > src/slave/containerizer/fetcher.hpp, lines 288-295
> > <https://reviews.apache.org/r/30774/diff/52/?file=966644#file966644line288>
> >
> >     We might be able to get away with more descriptive names here to avoid the need for these comments:
> >     
> >     ```
> >     Bytes capacity;
> >     Bytes reserved;
> >     unsigned long fileCounter;
> >     ```
> >     
> >     'space' seems to suggest available space (to me), whereas 'capacity' seems pretty standard as a name for this. For 'tally', I can't tell from the name what is being tallied, but if we change the name to 'reserved' I have an understanding that this is the reserved space, not necessarily occupied but reserved for a purpose.

Much better indeed. Thanks!
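With the suggested names, the bookkeeping reads without comments on what each field means. A minimal sketch: `capacity` and `reserved` follow the review suggestion, plain byte counts stand in for the `Bytes` type, and the use of `fileCounter` to generate unique cache file names (including the name format) is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical bookkeeping struct using the names suggested in the review.
struct CacheState {
  uint64_t capacity = 0;          // Total bytes the cache may hold.
  uint64_t reserved = 0;          // Bytes set aside, whether or not yet occupied.
  unsigned long fileCounter = 0;  // Assumed: source of unique cache file names.

  uint64_t available() const { return capacity - reserved; }

  // Assumed naming scheme: "c<counter><extension>", e.g. "c1.tgz".
  std::string nextFilename(const std::string& extension) {
    return "c" + std::to_string(++fileCounter) + extension;
  }
};
```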


> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > src/slave/containerizer/fetcher.hpp, lines 297-299
> > <https://reviews.apache.org/r/30774/diff/52/?file=966644#file966644line297>
> >
> >     Hard to tell why shared_ptr here is needed rather than Shared, or just Cache::Entry directly. Is there concurrent modification happening, or?

Shared is not mutable. Are you suggesting that we replace the whole entry every time we update a field?
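The point of the reply can be shown with standard C++: when the table and every fetch attempt hold a `std::shared_ptr` to the same mutable entry, a field update made through one holder is visible to all of them. A handle that only grants const access (which, per the reply, is the case for `Shared`) would instead force replacing the whole entry on each update. The names here are hypothetical and the sketch is single-threaded.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Entry {
  std::string key;
  int referenceCount = 0;
};

// The table and all in-flight fetch attempts point at the same Entry.
using Table = std::unordered_map<std::string, std::shared_ptr<Entry>>;

// Returns the entry for `key`, creating it on first use, and counts one more user.
std::shared_ptr<Entry> acquire(Table& table, const std::string& key) {
  std::shared_ptr<Entry>& slot = table[key];
  if (!slot) {
    slot = std::make_shared<Entry>();
    slot->key = key;
  }
  ++slot->referenceCount;  // Mutated in place; every holder sees the change.
  return slot;
}
```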


> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > src/slave/containerizer/fetcher.cpp, line 617
> > <https://reviews.apache.org/r/30774/diff/52/?file=966645#file966645line617>
> >
> >     No need for the stringify here and below.

Thanks! Will fix.


> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > src/slave/containerizer/fetcher.cpp, lines 1131-1132
> > <https://reviews.apache.org/r/30774/diff/52/?file=966645#file966645line1131>
> >
> >     CHECK_LT will print the two numbers for you :)

Thanks! Will fix.


> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > src/slave/containerizer/fetcher.cpp, lines 1144-1145
> > <https://reviews.apache.org/r/30774/diff/52/?file=966645#file966645line1144>
> >
> >     Seems like an odd message format, since normally a meaning follows from a ':'
> >     
> >     ```
> >     Fetcher cache space overflow - space used: 2GB, exceeds total fetcher cache space: 1GB
> >     ```
> >     
> >     Here's another format where the meaning is described after the colon:
> >     
> >     ```
> >     Fetcher cache space overflow: 2GB used vs 1GB capacity
> >     ```

Yep, that's better.
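The suggested message format, together with the earlier point that `stringify` is unnecessary, can be illustrated with a stand-in type that streams itself. This `Bytes` struct and its unit formatting are hypothetical simplifications; stout's actual `Bytes` output may look different.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal stand-in for a Bytes type with its own operator<<.
struct Bytes {
  uint64_t value;
};

std::ostream& operator<<(std::ostream& out, const Bytes& bytes) {
  const uint64_t GB = 1ULL << 30, MB = 1ULL << 20, KB = 1ULL << 10;
  // Use the largest unit that divides the value exactly, else plain bytes.
  if (bytes.value != 0 && bytes.value % GB == 0) return out << bytes.value / GB << "GB";
  if (bytes.value != 0 && bytes.value % MB == 0) return out << bytes.value / MB << "MB";
  if (bytes.value != 0 && bytes.value % KB == 0) return out << bytes.value / KB << "KB";
  return out << bytes.value << "B";
}

// Because Bytes streams itself, no explicit stringify() call is needed.
std::string overflowMessage(Bytes used, Bytes capacity) {
  std::ostringstream message;
  message << "Fetcher cache space overflow: " << used << " used vs "
          << capacity << " capacity";
  return message.str();
}
```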


> On June 5, 2015, 7:10 p.m., Ben Mahler wrote:
> > src/slave/containerizer/mesos/containerizer.hpp, lines 183-188
> > <https://reviews.apache.org/r/30774/diff/52/?file=966646#file966646line183>
> >
> >     I can't tell why slave id is being passed here, is there something subtle going on?

Yes. The slaveId is needed to create per-slave cache directories. Several comments elsewhere explain this, and how it will go away once we inject the slaveId after some refactoring. I will add a comment here as well.
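A minimal sketch of how a per-slave cache directory name might be derived. The layout (the slave ID as a subdirectory of the configured cache directory) is an assumption for illustration, not necessarily the patch's scheme. One plausible motivation is that several slaves on one host, e.g. in tests, could otherwise share a cache directory.

```cpp
#include <string>

// Hypothetical: places each slave's cache under <root>/<slaveId>.
std::string slaveCacheDirectory(const std::string& root,
                                const std::string& slaveId) {
  std::string directory = root;
  if (!directory.empty() && directory.back() != '/') {
    directory += '/';  // Avoid a doubled or missing path separator.
  }
  return directory + slaveId;
}
```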


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review86871
-----------------------------------------------------------


On May 21, 2015, 9:05 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 21, 2015, 9:05 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9cc5782256156ed59fd4640091413b76480d939f 
>   include/mesos/type_utils.hpp 837be6f1844d5fa01c0fd84a585e7ff2cc0c987b 
>   src/Makefile.am 34755cf795391c9b8051a5e4acc6caf844984496 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp ed4ee19c85387882ab2e31baa5610acb8e222d50 
>   src/slave/containerizer/docker.cpp 408a4435a6f11973992486eac1659beeccc4beac 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 3e18617b0dbac58176bfd41dc550898eb6a4a79e 
>   src/slave/containerizer/mesos/containerizer.cpp 696e359de66305512eedf8e269543fafa21f4bc3 
>   src/slave/flags.hpp 5c57478fcfdbcbd8ac0e5c3c79809403054e96e6 
>   src/slave/flags.cpp b5e25186dad36bc1306cc6ecb268aba951a18f7e 
>   src/slave/slave.cpp 8e88482f41f37ce7f2559fe793565b66ac46fb35 
>   src/tests/docker_containerizer_tests.cpp 154bf981c007ebcb8e0b2fe8551defb5ea2ba063 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp a60df75350beab8d7091cbe66213ecd920942fa4 
>   src/tests/mesos.cpp 1d5639c85517229f3396b40f2d8bd421b2ed7325 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that is not also covered by other tests.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the fetcher's run method directly, all tests now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. It then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
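The three actions can be sketched as a per-URI decision. The enum names are illustrative and follow the wording above; the actual protobuf enum may name them differently. The decision rule is also an assumption: bypass when caching was not requested, otherwise retrieve on a cache hit and download-and-cache on a miss. The patch's real logic additionally has to handle in-flight downloads and eviction.

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative names for the three fetch actions.
enum class Action {
  BYPASS_CACHE,        // Fetch directly, without the cache.
  DOWNLOAD_AND_CACHE,  // Fetch into the cache, then use the cached file.
  RETRIEVE_FROM_CACHE  // Use an existing cache file.
};

// Hypothetical decision rule for one URI.
Action chooseAction(bool caching,
                    const std::string& uri,
                    const std::set<std::string>& cachedUris) {
  if (!caching) {
    return Action::BYPASS_CACHE;
  }
  return cachedUris.count(uri) > 0 ? Action::RETRIEVE_FROM_CACHE
                                   : Action::DOWNLOAD_AND_CACHE;
}
```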
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache, reusing download results in the cache when multiple fetcher runs occur concurrently.
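Here is a single-threaded, callback-based sketch of how concurrent requests for the same URI can share one download. It models an actor processing one message at a time. The first request for a URI starts the download, and later requests only register a callback. When the download completes, every waiter receives the same cache path. `DownloadCoalescer` and its methods are hypothetical, not the patch's classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class DownloadCoalescer {
public:
  using Callback = std::function<void(const std::string&)>;

  explicit DownloadCoalescer(std::function<void(const std::string&)> start)
    : start_(std::move(start)) {}

  // Registers interest in `uri`. Only the first request for a URI that is
  // not already in flight starts a download.
  void request(const std::string& uri, Callback done) {
    std::vector<Callback>& waiters = waiting_[uri];
    bool first = waiters.empty();
    waiters.push_back(std::move(done));
    if (first) {
      start_(uri);
    }
  }

  // Called when the single download for `uri` finishes; notifies all
  // waiters and forgets the in-flight entry.
  void complete(const std::string& uri, const std::string& cachePath) {
    auto it = waiting_.find(uri);
    if (it == waiting_.end()) {
      return;
    }
    std::vector<Callback> waiters = std::move(it->second);
    waiting_.erase(it);
    for (Callback& waiter : waiters) {
      waiter(cachePath);
    }
  }

private:
  std::function<void(const std::string&)> start_;
  std::map<std::string, std::vector<Callback>> waiting_;
};
```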
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful.
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one such request happens per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs whose related cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
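The reservation and eviction rules above can be sketched as a simplified, single-threaded model under several assumptions. Sizes are plain byte counts. Eviction candidates are taken in insertion order, and the patch's actual eviction order may differ. Eviction here is synchronous and checked up front, so unlike the asynchronous eviction described above it cannot fail midway. `CacheModel`, `reserve` and `release` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>

struct CacheFile {
  std::string key;
  uint64_t size;
  int references;  // > 0 means in use, so the file is not evictable.
};

class CacheModel {
public:
  explicit CacheModel(uint64_t capacity) : capacity_(capacity) {}

  // Adds a file that already occupies space (e.g. after a download).
  void add(const std::string& key, uint64_t size, int references) {
    files_.push_back(CacheFile{key, size, references});
    used_ += size;
  }

  // Reserves `size` bytes, partly from free space and partly by evicting
  // unreferenced files. If that cannot succeed, nothing is changed.
  bool reserve(uint64_t size) {
    uint64_t available = capacity_ - used_ - reserved_;
    uint64_t evictable = 0;
    for (const CacheFile& f : files_) {
      if (f.references == 0) evictable += f.size;
    }
    if (available + evictable < size) {
      return false;
    }
    // Evict only as much as needed beyond the free space.
    for (auto it = files_.begin(); it != files_.end() && available < size;) {
      if (it->references == 0) {
        available += it->size;
        used_ -= it->size;
        it = files_.erase(it);
      } else {
        ++it;
      }
    }
    reserved_ += size;
    return true;
  }

  // Undoes (part of) a reservation, e.g. after a failed fetch.
  void release(uint64_t size) { reserved_ -= size; }

  uint64_t used() const { return used_; }
  uint64_t reserved() const { return reserved_; }
  std::size_t fileCount() const { return files_.size(); }

private:
  uint64_t capacity_;
  uint64_t used_ = 0;
  uint64_t reserved_ = 0;
  std::list<CacheFile> files_;
};
```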
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review86871
-----------------------------------------------------------


Just took a cursory glance since this is a huge diff, could it have been broken apart further? We've found large diffs like this one are next to impossible to review thoroughly :)


src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment139000>

    We might be able to get away with more descriptive names here to avoid the need for these comments:
    
    ```
    Bytes capacity;
    Bytes reserved;
    unsigned long fileCounter;
    ```
    
    'space' seems to suggest available space (to me), whereas 'capacity' seems pretty standard as a name for this. For 'tally', I can't tell from the name what is being tallied, but if we change the name to 'reserved' I have an understanding that this is the reserved space, not necessarily occupied but reserved for a purpose.



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment139052>

    Hard to tell why shared_ptr here is needed rather than Shared, or just Cache::Entry directly. Is there concurrent modification happening, or?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment139051>

    No need for the stringify here and below.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment139053>

    CHECK_LT will print the two numbers for you :)



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment139055>

    Seems like an odd message format, since normally a meaning follows from a ':'
    
    ```
    Fetcher cache space overflow - space used: 2GB, exceeds total fetcher cache space: 1GB
    ```
    
    Here's another format where the meaning is described after the colon:
    
    ```
    Fetcher cache space overflow: 2GB used vs 1GB capacity
    ```



src/slave/containerizer/mesos/containerizer.hpp
<https://reviews.apache.org/r/30774/#comment139050>

    I can't tell why slave id is being passed here, is there something subtle going on?


- Ben Mahler


On May 21, 2015, 4:05 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 21, 2015, 4:05 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9cc5782256156ed59fd4640091413b76480d939f 
>   include/mesos/type_utils.hpp 837be6f1844d5fa01c0fd84a585e7ff2cc0c987b 
>   src/Makefile.am 34755cf795391c9b8051a5e4acc6caf844984496 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp ed4ee19c85387882ab2e31baa5610acb8e222d50 
>   src/slave/containerizer/docker.cpp 408a4435a6f11973992486eac1659beeccc4beac 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 3e18617b0dbac58176bfd41dc550898eb6a4a79e 
>   src/slave/containerizer/mesos/containerizer.cpp 696e359de66305512eedf8e269543fafa21f4bc3 
>   src/slave/flags.hpp 5c57478fcfdbcbd8ac0e5c3c79809403054e96e6 
>   src/slave/flags.cpp b5e25186dad36bc1306cc6ecb268aba951a18f7e 
>   src/slave/slave.cpp 8e88482f41f37ce7f2559fe793565b66ac46fb35 
>   src/tests/docker_containerizer_tests.cpp 154bf981c007ebcb8e0b2fe8551defb5ea2ba063 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp a60df75350beab8d7091cbe66213ecd920942fa4 
>   src/tests/mesos.cpp 1d5639c85517229f3396b40f2d8bd421b2ed7325 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, and thus the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in an acceptable, yet most simple way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accomodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in play happens.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related cache files are. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review87615
-----------------------------------------------------------



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment140008>

    Any reason for not using AWAIT_READY? CHECK_READY will abort the process while AWAIT_READY will just abort the test.
    
    ```
    19:53:22 DEBUG: F0611 19:53:23.743101 60646 fetcher_cache_tests.cpp:354] CHECK_READY(offers): is PENDING Failed to wait for resource offers
    19:53:22 DEBUG: *** Check failure stack trace: ***
    19:53:22 DEBUG:     @     0x7f431c3a644d  google::LogMessage::Fail()
    19:53:22 DEBUG:     @     0x7f431c3a828d  google::LogMessage::SendToLog()
    19:53:22 DEBUG:     @     0x7f431c3a603c  google::LogMessage::Flush()
    19:53:22 DEBUG:     @     0x7f431c3a8b89  google::LogMessageFatal::~LogMessageFatal()
    19:53:22 DEBUG:     @           0x53d9b8  _CheckFatal::~_CheckFatal()
    19:53:22 DEBUG:     @           0x66c26f  mesos::internal::tests::FetcherCacheTest::launchTask()
    19:53:22 DEBUG:     @           0x66fb09  mesos::internal::tests::FetcherCacheTest_CachedFallback_Test::TestBody()
    19:53:22 DEBUG:     @           0xbb1db3  testing::internal::HandleExceptionsInMethodIfSupported<>()
    19:53:22 DEBUG:     @           0xba9057  testing::Test::Run()
    19:53:22 DEBUG:     @           0xba90fe  testing::TestInfo::Run()
    19:53:22 DEBUG:     @           0xba9205  testing::TestCase::Run()
    19:53:22 DEBUG:     @           0xba94a8  testing::internal::UnitTestImpl::RunAllTests()
    19:53:22 DEBUG:     @           0xba9747  testing::UnitTest::Run()
    19:53:22 DEBUG:     @           0x4a1dc3  main
    19:53:22 DEBUG:     @     0x7f431a1d7d5d  __libc_start_main
    19:53:22 DEBUG:     @           0x4ad109  (unknown)
    19:53:23 DEBUG: make[3]: *** [check-local] Aborted (core dumped)
    19:53:23 DEBUG: make[3]: Leaving directory `/builddir/build/BUILD/mesos-0.23.0/src'
    19:53:23 DEBUG: make[2]: *** [check-am] Error 2
    19:53:23 DEBUG: make[2]: Leaving directory `/builddir/build/BUILD/mesos-0.23.0/src'
    19:53:23 DEBUG: make[1]: *** [check] Error 2
    19:53:23 DEBUG: make[1]: Leaving directory `/builddir/build/BUILD/mesos-0.23.0/src'
    19:53:23 DEBUG: make: *** [check-recursive] Error 1
    19:53:23 DEBUG: RPM build errors:
    ```


- Jie Yu




Re: Review Request 30774: Fetcher Cache

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review84857
-----------------------------------------------------------

Ship it!



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment136275>

    This looks like a good candidate for inlining too.



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment136270>

    Dead code?



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment136271>

    Dead code?



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment136272>

    These two can easily and cleanly be inlined via lambdas.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137788>

    A good candidate for a C++11 lambda.
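Here is a generic illustration of this suggestion; the actual continuations in `fetcher.cpp` are not shown in this thread. A named helper that exists only to be chained as a continuation can be written inline as a C++11 lambda at the call site. The `then` function is a hypothetical stand-in for libprocess's `.then()`, and all names are illustrative.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Minimal stand-in for a continuation-based API: passes `input` to the
// continuation `next` and returns its result.
template <typename T, typename U>
U then(const T& input, const std::function<U(const T&)>& next) {
  return next(input);
}

// Before: a named helper that exists only to be used as a continuation.
static std::string describeSize(const unsigned long& bytes) {
  return "size: " + std::to_string(bytes);
}

std::string before(unsigned long bytes) {
  return then<unsigned long, std::string>(bytes, describeSize);
}

// After: the same continuation inline as a lambda, next to the call site.
std::string after(unsigned long bytes) {
  return then<unsigned long, std::string>(
      bytes,
      [](const unsigned long& b) { return "size: " + std::to_string(b); });
}
```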



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137789>

    Another candidate for a C++11 lambda!



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment136273>

    Why the body of `_fetch` is not inlined here should be explained in a comment; otherwise someone is likely to wonder about the reason.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137790>

    Why is os::chown no longer done any more? If we don't need to chown, why? And does that mean we can clean up this code considerably as the comment suggests?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137792>

    What about closing 'out' and 'err' here now?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137793>

    And need to close 'out' and 'err' here too.
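This sketch illustrates the concern under the assumption that 'out' and 'err' are POSIX file descriptors for the fetcher's stdout/stderr logs. Every early-return error path must close both descriptors. Otherwise, each failed fetch leaks two of them. A small RAII guard is one way to make this automatic. `FdGuard`, `openLogs` and its `simulateError` parameter are hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Closes a file descriptor when it goes out of scope, unless released.
class FdGuard {
public:
  explicit FdGuard(int fd) : fd_(fd) {}
  ~FdGuard() { if (fd_ >= 0) ::close(fd_); }
  FdGuard(const FdGuard&) = delete;
  FdGuard& operator=(const FdGuard&) = delete;
  int get() const { return fd_; }
  int release() { int fd = fd_; fd_ = -1; return fd; }
private:
  int fd_;
};

// Opens stdout/stderr log files, then fails if `simulateError` is set.
// On any error path the guards close whatever was opened; on success
// ownership of both descriptors passes to the caller.
bool openLogs(const std::string& out, const std::string& err,
              bool simulateError, int* outFd, int* errFd) {
  FdGuard o(::open(out.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644));
  FdGuard e(::open(err.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644));
  if (o.get() < 0 || e.get() < 0 || simulateError) {
    return false;
  }
  *outFd = o.release();
  *errFd = e.release();
  return true;
}
```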



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment136274>

    This comment needs updating, looks like when you removed `__run` you forgot to do a global search for all instances of `__run`?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137794>

    CHECK_EQ(space, bytes);



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment137795>

    Is this really possible anymore given the current design where 'adjust' will fail if the size we downloaded was bigger than what we first determined via 'fetchSize'?
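For context, here is a sketch of the invariant the question relies on. It assumes that 'adjust' reconciles a reservation, made from the size reported by 'fetchSize', with the size actually downloaded. The function fails when the download outgrew its reservation and otherwise returns the surplus to the free pool. The signature and the `AdjustResult` type are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct AdjustResult {
  bool ok;
  uint64_t released;  // Bytes returned to the free pool.
  std::string error;
};

// Hypothetical reconciliation of a reservation with the actual size.
AdjustResult adjust(uint64_t reserved, uint64_t actual) {
  if (actual > reserved) {
    return AdjustResult{false, 0,
        "Downloaded size " + std::to_string(actual) +
        " exceeds reserved size " + std::to_string(reserved)};
  }
  return AdjustResult{true, reserved - actual, ""};
}
```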



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment137797>

    Please put '{' on newline and this could really use a comment!



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment137805>

    I'd like us to elaborate the comment here and add a TODO that we want a generic HTTP server for use in tests that has functionality like pausing requests. I'm not a huge fan of the half-actor half not strategy here, it's not something we want others to replicate and we should explicitly call that out so we don't get more broken windows.



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment137803>

    This might break on a machine whose default IP is not 127.0.0.1, but you can just stringify the actual address, which should give the correct value anyway:
    
    return "http://" + stringify(self().address) + "/" + self().id + "/";
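For illustration, a self-contained version of the suggested one-liner. `Address` and `stringify` here are hypothetical stand-ins for the libprocess address type and stout's `stringify()`, assuming the address prints as `ip:port`:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins; neither is the real Mesos/libprocess type.
struct Address {
  std::string ip;
  uint16_t port;
};

std::ostream& operator<<(std::ostream& stream, const Address& address) {
  return stream << address.ip << ":" << address.port;
}

template <typename T>
std::string stringify(const T& t) {
  std::ostringstream out;
  out << t;
  return out.str();
}

// Builds the base URL from the actor's actual bound address instead of a
// hard-coded "127.0.0.1", so it stays correct on any machine.
std::string baseUrl(const Address& address, const std::string& id) {
  return "http://" + stringify(address) + "/" + id + "/";
}
```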



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment137806>

    Using locks here is dangerous: on a smaller machine, depending on how many requests come through during a test, we might actually deadlock the entire process. We should call this out explicitly and leave a TODO on how we can do this asynchronously.
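A minimal sketch of one asynchronous alternative, assuming all calls happen on a single actor (libprocess delivers an actor's messages one at a time): a paused request parks its reply continuation in a queue instead of blocking a thread on a lock, so no worker thread is ever tied up. `PausableResponder` is a hypothetical helper, not an existing libprocess or Mesos test facility:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical test helper: while paused, it queues the continuation that
// would send each response and runs the queue on resume(). Not thread-safe
// by itself; it assumes single-actor (single-threaded) use.
class PausableResponder {
public:
  using Respond = std::function<void(const std::string&)>;

  void pause() { paused_ = true; }

  void resume() {
    paused_ = false;
    // Flush everything that arrived while paused, in arrival order.
    while (!pending_.empty()) {
      std::function<void()> next = std::move(pending_.front());
      pending_.pop_front();
      next();
    }
  }

  // Called for each incoming request; `respond` sends the reply.
  void handle(const std::string& body, Respond respond) {
    auto reply = [body, respond = std::move(respond)]() { respond(body); };
    if (paused_) {
      pending_.push_back(std::move(reply));
    } else {
      reply();
    }
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  bool paused_ = false;
  std::deque<std::function<void()>> pending_;
};
```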


- Benjamin Hindman


On May 21, 2015, 4:05 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 21, 2015, 4:05 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9cc5782256156ed59fd4640091413b76480d939f 
>   include/mesos/type_utils.hpp 837be6f1844d5fa01c0fd84a585e7ff2cc0c987b 
>   src/Makefile.am 34755cf795391c9b8051a5e4acc6caf844984496 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp ed4ee19c85387882ab2e31baa5610acb8e222d50 
>   src/slave/containerizer/docker.cpp 408a4435a6f11973992486eac1659beeccc4beac 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 3e18617b0dbac58176bfd41dc550898eb6a4a79e 
>   src/slave/containerizer/mesos/containerizer.cpp 696e359de66305512eedf8e269543fafa21f4bc3 
>   src/slave/flags.hpp 5c57478fcfdbcbd8ac0e5c3c79809403054e96e6 
>   src/slave/flags.cpp b5e25186dad36bc1306cc6ecb268aba951a18f7e 
>   src/slave/slave.cpp 8e88482f41f37ce7f2559fe793565b66ac46fb35 
>   src/tests/docker_containerizer_tests.cpp 154bf981c007ebcb8e0b2fe8551defb5ea2ba063 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp a60df75350beab8d7091cbe66213ecd920942fa4 
>   src/tests/mesos.cpp 1d5639c85517229f3396b40f2d8bd421b2ed7325 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, in dependency order:
> 
> 30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that isn't covered by other tests anyway.
> 
> 30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command when used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one happens per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs that are currently in use, i.e. whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 21, 2015, 9:05 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

memory::shared_ptr -> std::shared_ptr. This allows removing #include <stout/memory.hpp>, which blocked make distcheck.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9cc5782256156ed59fd4640091413b76480d939f 
  include/mesos/type_utils.hpp 837be6f1844d5fa01c0fd84a585e7ff2cc0c987b 
  src/Makefile.am 34755cf795391c9b8051a5e4acc6caf844984496 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp ed4ee19c85387882ab2e31baa5610acb8e222d50 
  src/slave/containerizer/docker.cpp 408a4435a6f11973992486eac1659beeccc4beac 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 3e18617b0dbac58176bfd41dc550898eb6a4a79e 
  src/slave/containerizer/mesos/containerizer.cpp 696e359de66305512eedf8e269543fafa21f4bc3 
  src/slave/flags.hpp 5c57478fcfdbcbd8ac0e5c3c79809403054e96e6 
  src/slave/flags.cpp b5e25186dad36bc1306cc6ecb268aba951a18f7e 
  src/slave/slave.cpp 8e88482f41f37ce7f2559fe793565b66ac46fb35 
  src/tests/docker_containerizer_tests.cpp 154bf981c007ebcb8e0b2fe8551defb5ea2ba063 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp a60df75350beab8d7091cbe66213ecd920942fa4 
  src/tests/mesos.cpp 1d5639c85517229f3396b40f2d8bd421b2ed7325 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, in dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that isn't covered by other tests anyway.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
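A rough sketch of how the three actions might be chosen per URI. The enumerator names, `Item`, and `makeItem` are illustrative assumptions rather than the patch's definitions, and this simplification records a cache entry as soon as its download is scheduled, ignoring downloads that are still in progress:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// The three per-URI actions from the description (names are illustrative).
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

// Hypothetical "item": the URI, the chosen action, and (when the cache is
// involved) the name of the cache file.
struct Item {
  std::string uri;
  Action action;
  std::optional<std::string> cacheFilename;
};

// URIs without the caching flag bypass the cache; cached URIs are retrieved
// if already known, otherwise assigned a fresh cache file and downloaded.
Item makeItem(const std::string& uri, bool caching,
              std::unordered_map<std::string, std::string>& cache,
              int& nextId) {
  if (!caching) {
    return Item{uri, Action::BYPASS_CACHE, std::nullopt};
  }
  auto it = cache.find(uri);
  if (it != cache.end()) {
    return Item{uri, Action::RETRIEVE_FROM_CACHE, it->second};
  }
  std::string filename = "c" + std::to_string(nextId++);
  cache.emplace(uri, filename);
  return Item{uri, Action::DOWNLOAD_AND_CACHE, filename};
}
```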

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
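A minimal sketch of the one-download-per-URI idea, assuming single-threaded, actor-style callers (so no locking): the first request for a URI is told to start the download, and later requests are queued until it completes, or served immediately if it already has. `DownloadRegistry` and its methods are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class DownloadRegistry {
public:
  using Callback = std::function<void(const std::string& cachePath)>;

  // Returns true if the caller must start the download. Otherwise a download
  // is already in flight (onReady is queued) or done (onReady runs now).
  bool acquire(const std::string& uri, Callback onReady) {
    auto it = entries_.find(uri);
    if (it == entries_.end()) {
      entries_[uri].waiters.push_back(std::move(onReady));
      return true;
    }
    Entry& entry = it->second;
    if (entry.path) {
      onReady(*entry.path);
    } else {
      entry.waiters.push_back(std::move(onReady));
    }
    return false;
  }

  // Called once the single download finishes; notifies every waiter.
  void complete(const std::string& uri, const std::string& path) {
    Entry& entry = entries_.at(uri);
    entry.path = path;
    std::vector<Callback> waiters = std::move(entry.waiters);
    entry.waiters.clear();
    for (const Callback& waiter : waiters) {
      waiter(path);
    }
  }

private:
  struct Entry {
    std::optional<std::string> path;  // Set once the download completes.
    std::vector<Callback> waiters;
  };
  std::unordered_map<std::string, Entry> entries_;
};
```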

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command when used in the right limited way.

30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
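A sketch of turning such output into a byte count, assuming the size is the first whitespace-separated field, optionally followed by a K/M/G/T binary unit as human-readable (`-h`) output may print. `parseDuSize` is a hypothetical helper; because human-readable sizes are rounded, the sketch rounds up, but the result is still only an estimate:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical parser for one line of `hadoop fs -du` style output.
std::optional<uint64_t> parseDuSize(const std::string& line) {
  std::istringstream in(line);
  double value = 0;
  if (!(in >> value) || value < 0) {
    return std::nullopt;  // No leading number, e.g. an error message.
  }

  std::string unit;
  in >> unit;  // A unit token, or the path if no unit is printed.
  double multiplier = 1;
  if (unit == "K") {
    multiplier = 1024.0;
  } else if (unit == "M") {
    multiplier = 1024.0 * 1024;
  } else if (unit == "G") {
    multiplier = 1024.0 * 1024 * 1024;
  } else if (unit == "T") {
    multiplier = 1024.0 * 1024 * 1024 * 1024;
  }

  // Round up so a cache reservation errs on the generous side.
  return static_cast<uint64_t>(std::ceil(value * multiplier));
}
```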

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
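A minimal sketch of such a URI type separation. The categories follow the description (network downloads, HDFS, local copying), but the exact scheme lists are assumptions, and `UriKind`/`classify` are hypothetical names:

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

enum class UriKind { NET, HADOOP, LOCAL, UNKNOWN };

static bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

UriKind classify(const std::string& uri) {
  // Assumed: fetched over the network by a generic downloader.
  for (const char* scheme : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(uri, scheme)) return UriKind::NET;
  }
  // Assumed: delegated to the Hadoop client.
  for (const char* scheme : {"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(uri, scheme)) return UriKind::HADOOP;
  }
  // Local copy: explicit file:// URIs or absolute paths.
  if (startsWith(uri, "file://") || startsWith(uri, "/")) {
    return UriKind::LOCAL;
  }
  return UriKind::UNKNOWN;
}
```

Keeping this classification in one place is what lets both the size query and the actual download agree on how a URI is handled.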

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one happens per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e. whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
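The reservation and eviction rules above can be sketched as follows, assuming a hypothetical `CacheBook` class (not the patch's actual data structures): space comes first from free capacity and then from evicting unreferenced entries, least recently used first; referenced entries are never evicted; and a reservation that cannot be fully covered evicts nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

class CacheBook {
public:
  explicit CacheBook(uint64_t capacity) : capacity_(capacity) {}

  // Reserves `bytes`, first from free space, then by evicting unreferenced
  // entries. If the whole amount cannot be covered, nothing is evicted.
  bool reserve(uint64_t bytes) {
    const uint64_t freeSpace = capacity_ - used_;
    if (bytes <= freeSpace) {
      used_ += bytes;
      return true;
    }

    uint64_t evictable = 0;
    for (const std::string& key : lru_) {
      const Entry& entry = entries_.at(key);
      if (entry.refs == 0) evictable += entry.size;
    }
    if (freeSpace + evictable < bytes) return false;

    for (auto it = lru_.begin(); it != lru_.end() && capacity_ - used_ < bytes;) {
      const Entry& entry = entries_.at(*it);
      if (entry.refs == 0) {
        used_ -= entry.size;
        entries_.erase(*it);
        it = lru_.erase(it);
      } else {
        ++it;
      }
    }
    used_ += bytes;
    return true;
  }

  // Undoes a reservation, e.g. after a failed download.
  void unreserve(uint64_t bytes) {
    assert(bytes <= used_);
    used_ -= bytes;
  }

  // Records a downloaded file (new key) whose space is already reserved.
  void insert(const std::string& key, uint64_t size) {
    entries_[key] = Entry{size, 0};
    lru_.push_back(key);
  }

  // Marks an entry as in use, which also disables its eviction.
  void ref(const std::string& key) {
    ++entries_.at(key).refs;
    lru_.remove(key);
    lru_.push_back(key);
  }

  void unref(const std::string& key) {
    assert(entries_.at(key).refs > 0);
    --entries_.at(key).refs;
  }

  bool contains(const std::string& key) const { return entries_.count(key) > 0; }
  uint64_t used() const { return used_; }

private:
  struct Entry {
    uint64_t size;
    uint32_t refs;
  };

  uint64_t capacity_;
  uint64_t used_ = 0;                  // Reservations + resident files.
  std::unordered_map<std::string, Entry> entries_;
  std::list<std::string> lru_;         // Front = least recently used.
};
```

Checking evictable space before evicting anything is one simple way to get the "undo on failure" behavior without having to restore evicted files.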


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 21, 2015, 5:06 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Fixed minor style issues.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9cc5782256156ed59fd4640091413b76480d939f 
  include/mesos/type_utils.hpp 837be6f1844d5fa01c0fd84a585e7ff2cc0c987b 
  src/Makefile.am 34755cf795391c9b8051a5e4acc6caf844984496 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp ed4ee19c85387882ab2e31baa5610acb8e222d50 
  src/slave/containerizer/docker.cpp 408a4435a6f11973992486eac1659beeccc4beac 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 3e18617b0dbac58176bfd41dc550898eb6a4a79e 
  src/slave/containerizer/mesos/containerizer.cpp 696e359de66305512eedf8e269543fafa21f4bc3 
  src/slave/flags.hpp 5c57478fcfdbcbd8ac0e5c3c79809403054e96e6 
  src/slave/flags.cpp b5e25186dad36bc1306cc6ecb268aba951a18f7e 
  src/slave/slave.cpp 8e88482f41f37ce7f2559fe793565b66ac46fb35 
  src/tests/docker_containerizer_tests.cpp 154bf981c007ebcb8e0b2fe8551defb5ea2ba063 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp a60df75350beab8d7091cbe66213ecd920942fa4 
  src/tests/mesos.cpp 1d5639c85517229f3396b40f2d8bd421b2ed7325 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 19, 2015, 8:58 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Rebased to latest master.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9cc5782256156ed59fd4640091413b76480d939f 
  include/mesos/type_utils.hpp 837be6f1844d5fa01c0fd84a585e7ff2cc0c987b 
  src/Makefile.am 34755cf795391c9b8051a5e4acc6caf844984496 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp ed4ee19c85387882ab2e31baa5610acb8e222d50 
  src/slave/containerizer/docker.cpp 408a4435a6f11973992486eac1659beeccc4beac 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 3e18617b0dbac58176bfd41dc550898eb6a4a79e 
  src/slave/containerizer/mesos/containerizer.cpp 696e359de66305512eedf8e269543fafa21f4bc3 
  src/slave/flags.hpp 5c57478fcfdbcbd8ac0e5c3c79809403054e96e6 
  src/slave/flags.cpp b5e25186dad36bc1306cc6ecb268aba951a18f7e 
  src/slave/slave.cpp 8e88482f41f37ce7f2559fe793565b66ac46fb35 
  src/tests/docker_containerizer_tests.cpp 154bf981c007ebcb8e0b2fe8551defb5ea2ba063 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp a60df75350beab8d7091cbe66213ecd920942fa4 
  src/tests/mesos.cpp 1d5639c85517229f3396b40f2d8bd421b2ed7325 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 15, 2015, 3:07 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Fixed remaining issues.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
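The "only one download per URI" behavior can be illustrated with shared futures keyed by URI. This is an analogy under stated assumptions: Mesos is built on libprocess futures rather than `std::shared_future`, and `DownloadCoordinator` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical coordinator: concurrent requests for one URI share a download.
class DownloadCoordinator {
public:
  using Download = std::function<std::string(const std::string& uri)>;

  explicit DownloadCoordinator(Download download)
    : download_(std::move(download))
  {
    assert(download_);
  }

  // Returns a future for the cache file path. Only the first caller for a
  // URI starts a download; later callers receive the same shared future.
  std::shared_future<std::string> fetch(const std::string& uri)
  {
    std::lock_guard<std::mutex> lock(mutex_);

    auto it = downloads_.find(uri);
    if (it != downloads_.end()) {
      return it->second;
    }

    std::shared_future<std::string> future =
      std::async(std::launch::async, download_, uri).share();
    downloads_.emplace(uri, future);
    return future;
  }

private:
  Download download_;
  std::mutex mutex_;
  // Entries are never removed here, so a finished future doubles as the
  // record that the URI is already cached.
  std::map<std::string, std::shared_future<std::string>> downloads_;
};
```

Registering the future under the lock before anyone waits on it is what prevents two callers from both seeing "not cached" and downloading twice.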


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. Everything they tested is covered by other tests anyway.

30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they all call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
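The choice among the three actions can be sketched as a small decision function. The enum and function names are hypothetical, and a simple set of completed URIs stands in for the hashmap of cache file objects mentioned above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical names for the three per-URI actions described above.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

// `caching` is the per-URI opt-in flag; `completed` holds the URIs whose
// cache files have already been downloaded successfully.
Action chooseAction(const std::string& uri,
                    bool caching,
                    const std::set<std::string>& completed)
{
  if (!caching) {
    return Action::BYPASS_CACHE;        // Old behavior: fetch directly.
  }
  if (completed.count(uri) > 0) {
    return Action::RETRIEVE_FROM_CACHE; // Reuse the existing cache file.
  }
  return Action::DOWNLOAD_AND_CACHE;    // First fetch fills the cache.
}
```

Because URIs without the flag always map to the bypass action, existing frameworks see unchanged behavior, which is what makes the feature backwards-compatible.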

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.

30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.

30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (async) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can be drawn partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one is issued per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review83944
-----------------------------------------------------------



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment135003>

    Use AWAIT_READY rather than AWAIT_READY_FOR, because the default timeout (15 seconds) works fine, too.


- Bernd Mathiske


On May 13, 2015, 3:07 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated May 13, 2015, 3:07 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:
> 
> 30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. Everything they tested is covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they all call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (async) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can be drawn partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one is issued per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 13, 2015, 3:07 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Changed the way waiting for task completion is handled. Moved such waiting outside of launchTask(). Made helper functions create futures we can wait for with AWAIT_READY_FOR.
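The pattern of launching without blocking and waiting afterwards can be sketched with standard futures. Here `std::future` stands in for libprocess futures, `launchTask` and `awaitReady` are hypothetical helpers, and the 15-second default mirrors the AWAIT_READY default timeout.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>

// Hypothetical helper: starts the "task" and returns a future right away,
// instead of blocking inside the launch function until the task finishes.
std::future<std::string> launchTask(const std::string& taskId)
{
  return std::async(std::launch::async, [taskId] {
    return taskId + ":FINISHED"; // Stand-in for the task's terminal state.
  });
}

// Waits up to `timeout` for the future, like AWAIT_READY_FOR; omitting the
// timeout gives the AWAIT_READY-style 15-second default.
template <typename T>
bool awaitReady(std::future<T>& future,
                std::chrono::seconds timeout = std::chrono::seconds(15))
{
  return future.wait_for(timeout) == std::future_status::ready;
}
```

Moving the wait out of the launch helper lets a test start several tasks first and then decide where, and for how long, to wait on each of them.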


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. Everything they tested is covered by other tests anyway.

30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they all call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.

30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.

30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (async) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can be drawn partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one is issued per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 12, 2015, 3:43 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Minor cleanups, mostly removing & from const assignments.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. Everything they tested is covered by other tests anyway.

30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they all call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.

30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.

30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (async) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can be drawn partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one is issued per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 4, 2015, 4:29 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Addressed most open issues. The rest will be dealt with soon.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.

30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run() method directly, every test now calls through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two variants, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches URI handling to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
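A minimal sketch of how a fetcher might map each URI to one of the three actions above. The enum, struct and function names are hypothetical illustrations, not the definitions in include/mesos/fetcher/fetcher.proto, and the set of cached URIs stands in for the real cache bookkeeping.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical mirror of the three per-item actions described above.
enum class FetchAction {
  BYPASS_CACHE,        // fetch straight into the sandbox
  DOWNLOAD_AND_CACHE,  // fetch into the cache, then copy into the sandbox
  RETRIEVE_FROM_CACHE  // cache hit: copy the cached file into the sandbox
};

// Hypothetical "item": the URI, its action and (if cached) a cache file name.
struct FetchItem {
  std::string uri;
  FetchAction action;
  std::string cacheFilename;  // empty for BYPASS_CACHE
};

// Chooses the action for one URI. `caching` is the per-URI opt-in flag;
// `cachedUris` stands in for the cache's record of resident files.
FetchItem makeItem(const std::string& uri,
                   bool caching,
                   const std::unordered_set<std::string>& cachedUris,
                   const std::string& cacheFilename) {
  if (!caching) {
    return FetchItem{uri, FetchAction::BYPASS_CACHE, ""};
  }
  if (cachedUris.count(uri) > 0) {
    return FetchItem{uri, FetchAction::RETRIEVE_FROM_CACHE, cacheFilename};
  }
  return FetchItem{uri, FetchAction::DOWNLOAD_AND_CACHE, cacheFilename};
}
```

In this sketch the policy decision is made once per item, so the external fetcher program only has to carry out the action it is handed.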

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache: when multiple fetcher runs occur concurrently, download results in the cache are reused.
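The single-download behavior can be illustrated with a C++17 sketch; the class name is hypothetical and std::async merely stands in for launching mesos-fetcher. The first request for a URI starts the download, and concurrent requests for the same URI receive the same std::shared_future instead of starting another one.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: at most one download per URI; all requesters of
// that URI share the same (pending or completed) result.
class SingleFlightCache {
 public:
  explicit SingleFlightCache(
      std::function<std::string(const std::string&)> download)
    : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = downloads_.find(uri);
    if (it != downloads_.end()) {
      return it->second;  // reuse the pending or completed download
    }
    // First requester: start the download and remember its future.
    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    downloads_.emplace(uri, result);
    return result;
  }

 private:
  std::function<std::string(const std::string&)> download_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> downloads_;
};
```

Unlike a real cache, this sketch never forgets a failed download, so a failure would be shared by all later requests for that URI.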

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that behaves like a real "hadoop" command for the limited set of invocations the test makes.

30124: Inserts a fetcher cache zap (complete wipe) upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
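A sketch of turning one line of that output into a byte count. It assumes, without verifying it here, that the size is the first field, either plain bytes or a number with a K/M/G/T suffix (attached or separated by a space) meaning powers of 1024; the exact output format may vary across Hadoop versions. The function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// ASSUMPTION: binary (1024-based) suffixes; "" means plain bytes.
std::optional<uint64_t> unitMultiplier(const std::string& unit) {
  if (unit.empty()) return uint64_t{1};
  if (unit == "K") return uint64_t{1} << 10;
  if (unit == "M") return uint64_t{1} << 20;
  if (unit == "G") return uint64_t{1} << 30;
  if (unit == "T") return uint64_t{1} << 40;
  return std::nullopt;
}

// Parses the leading size field of one `hadoop fs -du -h` output line,
// e.g. "1024 /p", "1.5K /p" or "1.5 K /p". Returns nullopt if unparsable.
std::optional<uint64_t> parseDuSize(const std::string& line) {
  std::istringstream in(line);
  std::string number;
  std::string next;
  if (!(in >> number)) {
    return std::nullopt;
  }
  in >> next;  // either a detached unit such as "K" or the path

  std::string unit;
  if (std::isalpha(static_cast<unsigned char>(number.back()))) {
    unit = number.substr(number.size() - 1);  // attached unit, e.g. "1.5K"
    number.pop_back();
  } else if (!next.empty() && unitMultiplier(next)) {
    unit = next;  // detached unit, e.g. "1.5 K"
  }

  const std::optional<uint64_t> multiplier = unitMultiplier(unit);
  if (!multiplier || number.empty()) {
    return std::nullopt;
  }

  double value = 0;
  try {
    std::size_t used = 0;
    value = std::stod(number, &used);
    if (used != number.size()) {
      return std::nullopt;  // trailing garbage such as "1.2.3"
    }
  } catch (const std::exception&) {
    return std::nullopt;
  }
  if (value < 0) {
    return std::nullopt;
  }
  return static_cast<uint64_t>(value * static_cast<double>(*multiplier));
}
```

Note that human-readable sizes are rounded, so a size derived this way is only an estimate; a space reservation based on it may need some slack.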

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
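As an illustration of the kind of dispatch being moved, here is a sketch that classifies a URI by its scheme prefix. The enum and the scheme lists are assumptions for illustration, not copied from mesos-fetcher.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical categories: how a URI would be fetched (or sized).
enum class UriKind { NETWORK_DOWNLOAD, HADOOP_CLIENT, LOCAL_COPY };

static bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// ASSUMPTION: illustrative scheme lists; the real ones may differ.
UriKind classify(const std::string& uri) {
  for (const char* scheme : {"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(uri, scheme)) {
      return UriKind::HADOOP_CLIENT;  // handled via the "hadoop" command
    }
  }
  for (const char* scheme : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(uri, scheme)) {
      return UriKind::NETWORK_DOWNLOAD;
    }
  }
  return UriKind::LOCAL_COPY;  // anything else is treated as a local path
}
```

Having one such function shared by both callers means that a size query and the later download agree on how a given URI is handled.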

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: at most one such request per URI is in flight at any time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for cache files that are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
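The reservation arithmetic described in 30626 can be sketched as follows: take free space first, then evict unreferenced files until the request fits, and change nothing if it cannot fit. All names are hypothetical, the sketch is synchronous whereas the real eviction is asynchronous, and evicting in insertion order is an assumed policy.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of cache space accounting with reservations,
// eviction and reference counting.
class CacheSpace {
 public:
  explicit CacheSpace(uint64_t capacity) : capacity_(capacity) {}

  // Reserves `size` bytes, partly from free space, partly by evicting
  // unreferenced entries (oldest first). Returns the evicted URIs, or
  // nullopt with all state unchanged if not enough space can be freed.
  std::optional<std::vector<std::string>> reserve(uint64_t size) {
    const uint64_t available = capacity_ - used_ - reserved_;
    uint64_t freed = 0;
    std::vector<std::list<Entry>::iterator> victims;
    for (auto it = entries_.begin();
         it != entries_.end() && available + freed < size; ++it) {
      if (it->refCount == 0) {  // files in use are never evicted
        victims.push_back(it);
        freed += it->size;
      }
    }
    if (available + freed < size) {
      return std::nullopt;
    }
    std::vector<std::string> evicted;
    for (auto it : victims) {
      evicted.push_back(it->uri);
      entries_.erase(it);
    }
    used_ -= freed;
    reserved_ += size;
    return evicted;
  }

  // Undoes a reservation, e.g. after a failed download.
  void unreserve(uint64_t size) { reserved_ -= size; }

  // Converts a reservation into a resident entry after a successful download.
  void commit(const std::string& uri, uint64_t size) {
    reserved_ -= size;
    used_ += size;
    entries_.push_back(Entry{uri, size, 0});
  }

  void acquire(const std::string& uri) { adjust(uri, +1); }
  void release(const std::string& uri) { adjust(uri, -1); }

  uint64_t used() const { return used_; }
  uint64_t reserved() const { return reserved_; }

 private:
  struct Entry {
    std::string uri;
    uint64_t size;
    int refCount;
  };

  void adjust(const std::string& uri, int delta) {
    for (Entry& e : entries_) {
      if (e.uri == uri) {
        e.refCount += delta;
        return;
      }
    }
  }

  uint64_t capacity_;
  uint64_t used_ = 0;
  uint64_t reserved_ = 0;
  std::list<Entry> entries_;
};
```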


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On May 1, 2015, 12:17 p.m., Benjamin Hindman wrote:
> > src/tests/fetcher_cache_tests.cpp, line 1179
> > <https://reviews.apache.org/r/30774/diff/45/?file=947044#file947044line1179>
> >
> >     class Path
> >     {
> >     public:
> >       Try<bool> executable() const;
> >     };
> >     
> >     
> >     Path(runDirectory, commandFilename).executable();
> 
> Bernd Mathiske wrote:
>     Should we wait with this until Path has been fleshed out? This would be a separate review.

I added a TODO and this will lead to a separate JIRA ticket to enhance Path.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review82260
-----------------------------------------------------------


On May 13, 2015, 3:07 p.m., Bernd Mathiske wrote:
> (Updated May 13, 2015, 3:07 p.m.)


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On May 1, 2015, 12:17 p.m., Benjamin Hindman wrote:
> > src/tests/fetcher_cache_tests.cpp, line 247
> > <https://reviews.apache.org/r/30774/diff/45/?file=947044#file947044line247>
> >
> >     Why are we doing 'driver->start();' here?

Good point, it does not belong here. I moved it out of this method and added explicit calls at the call sites.


> On May 1, 2015, 12:17 p.m., Benjamin Hindman wrote:
> > src/tests/fetcher_cache_tests.cpp, line 681
> > <https://reviews.apache.org/r/30774/diff/45/?file=947044#file947044line681>
> >
> >     Capitalization please. Everywhere. ;-)

That would not follow the style guide: http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Type_Names


> On May 1, 2015, 12:17 p.m., Benjamin Hindman wrote:
> > src/tests/fetcher_cache_tests.cpp, line 1179
> > <https://reviews.apache.org/r/30774/diff/45/?file=947044#file947044line1179>
> >
> >     class Path
> >     {
> >     public:
> >       Try<bool> executable() const;
> >     };
> >     
> >     
> >     Path(runDirectory, commandFilename).executable();

Should we wait with this until Path has been fleshed out? This would be a separate review.


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review82260
-----------------------------------------------------------


On April 30, 2015, 7:40 p.m., Bernd Mathiske wrote:
> (Updated April 30, 2015, 7:40 p.m.)


Re: Review Request 30774: Fetcher Cache

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review82260
-----------------------------------------------------------



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132957>

    s/Option<string>(/Some(/



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132958>

    s/c/create/



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132963>

    Try<MesosContainerizer*> create =
      MesosContainerizer::create(flags, true, fetcher);



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132959>

    Newline before comment please.



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132961>

    Why are we doing 'driver->start();' here?



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132964>

    s/maxExpectedStatusUpdates/MAX_EXPECTED_STATUS_UPDATES/



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132966>

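    // A lambda cannot name itself, so use std::function for the recursion
    // (captured by reference: the locals must outlive the future chain).
    std::function<Future<Nothing>(const TaskStatus&)> await =
      [&](const TaskStatus& status) -> Future<Nothing> {
        CHECK_EQ(task.task_id(), status.task_id());
        if (status.state() == TASK_FINISHED) {
          return Nothing();
        }
        return taskStatusQueue.get()
          .then([&](const TaskStatus& status) {
            return await(status);
          });
      };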
    
    Future<Nothing> finished = taskStatusQueue.get()
      .then([=](const TaskStatus& status) { return await(status); });
    
    AWAIT_READY_FOR(finished, Seconds(10));
    
    // ------------
    
    AWAIT_READY_FOR(awaitFinished(taskStatusQueue), Seconds(10));
    
    // ------------
    
    Promise<Nothing> finished;
    EXPECT_CALL(scheduler, statusUpdate(driver, _))
      .WillRepeatedly([=, &finished](
          const SchedulerDriver&, const TaskStatus& status) {
        CHECK_EQ(task.task_id(), status.task_id());
        if (status.state() == TASK_FINISHED) {
          finished.set(Nothing());
        }
      });
    
    AWAIT_READY_FOR(finished.future(), Seconds(10));
    
    // ------------------------
    
    But since we don't have C++11 lambdas as gmock actions yet, instead create your own action:
    
    ACTION_P(FinishedStatusUpdate, promise)
    {
      const TaskStatus& status = arg1;
      // TODO: check
      if (status.state() == TASK_FINISHED) {
        promise->set(Nothing());
      }
    }
    
    And finally, let's pull this out of s/runTask/launchTask/ so it has consistent behavior with launchTasks.



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132968>

    InvokeLambda<N>(args, lambda)
    {
      return lambda(get<0>(args), get<1>(args), get<2>(args), ...);
    }
    
    InvokeVoidLambda<N>(args, lambda)
    {
      lambda(get<0>(args), get<1>(args), get<2>(args), ...);
    }
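For reference, a self-contained C++14 version of the InvokeLambda idea, calling a lambda with the elements of an argument tuple via std::index_sequence (C++17's std::apply does the same thing). Because the return type is deduced with decltype(auto), one template covers both the value-returning and the void case. The function names are hypothetical and not part of gmock.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Expands the tuple elements at indices I... into a call to `f`.
template <typename F, typename Tuple, std::size_t... I>
decltype(auto) invokeWithTupleImpl(F&& f, Tuple&& args,
                                   std::index_sequence<I...>) {
  return std::forward<F>(f)(std::get<I>(std::forward<Tuple>(args))...);
}

// Calls `f` with all elements of `args`; works for void lambdas too.
template <typename F, typename Tuple>
decltype(auto) invokeWithTuple(F&& f, Tuple&& args) {
  constexpr std::size_t N = std::tuple_size<std::decay_t<Tuple>>::value;
  return invokeWithTupleImpl(std::forward<F>(f), std::forward<Tuple>(args),
                             std::make_index_sequence<N>{});
}
```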



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132980>

    Let's rename this to something that is more indicative of what's happening when we reach or block on a FetcherProcess::_fetch.



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132981>

    CHECK_READY
    
    Here and everywhere else please. ;-)



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132982>

    const&



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132986>

    Capitalization please. Everywhere. ;-)



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132987>

    if (strings::contains(event.request->path, COMMAND_NAME)) {
    
    }
    
    Below please too!



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132989>

    Unroll please, the switch is harder to grok below.



src/tests/fetcher_cache_tests.cpp
<https://reviews.apache.org/r/30774/#comment132988>

    class Path
    {
    public:
      Try<bool> executable() const;
    };
    
    Path(runDirectory, commandFilename).executable();
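A hypothetical sketch of what such an executable() check could look like, using POSIX stat(). It returns std::optional<bool> in place of stout's Try<bool> (nullopt when stat() fails) and treats a regular file with any execute bit set as executable; PathSketch is an illustrative name, not stout's Path.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <sys/stat.h>

// Hypothetical sketch, not stout's Path: joins a directory and a file name
// and answers "is this an executable regular file?".
class PathSketch {
 public:
  PathSketch(const std::string& directory, const std::string& filename)
    : value_(directory + "/" + filename) {}

  // nullopt if stat() fails (e.g. the file does not exist); otherwise true
  // iff the path is a regular file with at least one execute bit set.
  std::optional<bool> executable() const {
    struct stat s;
    if (::stat(value_.c_str(), &s) != 0) {
      return std::nullopt;
    }
    return S_ISREG(s.st_mode) &&
           (s.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
  }

  const std::string& value() const { return value_; }

 private:
  std::string value_;
};
```

Checking permission bits says nothing about whether the current user may execute the file; POSIX access() with X_OK would answer that narrower question instead.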



src/tests/fetcher_tests.cpp
<https://reviews.apache.org/r/30774/#comment132990>

    EXPECT_ERROR
    
    (Search for 'isError())' and replace. ;-) )



src/tests/fetcher_tests.cpp
<https://reviews.apache.org/r/30774/#comment132991>

    s/..../bernd/



src/tests/fetcher_tests.cpp
<https://reviews.apache.org/r/30774/#comment132992>

    EXPECT_TRUE(strings::contains(fetch.failure(), "chown"));
    
    (How about we just search and replace .find?)



src/tests/mesos.cpp
<https://reviews.apache.org/r/30774/#comment132993>

    Let's double-check whether this is really necessary.


- Benjamin Hindman


On May 1, 2015, 2:40 a.m., Bernd Mathiske wrote:
> (Updated May 1, 2015, 2:40 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
>   include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
>   src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
>   src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
>   src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
>   src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:
> 
> 30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that other tests do not also cover.
> 
> 30034: Makes the code structure of all fetcher tests the same: tests now call through fetch() instead of calling the fetcher's run method directly. Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches URI handling to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache: when multiple fetcher runs occur concurrently, download results in the cache are reused.
> 
> 30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves like a real "hadoop" command as long as it is used in the limited way the test requires.
> 
> 30124: Zaps (clears) the fetcher cache upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Creates fetcher cache tests: adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: at most one is in flight per URI.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs that are currently in use, i.e. whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated April 30, 2015, 7:40 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Fixed most issues, but not all. Let's talk about the remaining ones before proceeding!


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that other tests do not also cover.

30034: Makes the code structure of all fetcher tests the same: tests now call through fetch() instead of calling the fetcher's run method directly. Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches URI handling to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
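The three actions can be pictured as a per-URI decision. In this minimal sketch the enumerator names, `FetchItem`, `planItem` and the `cached` set are hypothetical; the cache file name that real items may carry is omitted, and the case of a download already in progress for the same URI is ignored.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical names for the three fetch actions described above.
enum class FetchAction
{
  BYPASS_CACHE,         // Fetch directly into the sandbox.
  DOWNLOAD_AND_CACHE,   // Fetch into the cache, then copy to the sandbox.
  RETRIEVE_FROM_CACHE   // Cache file already present; just copy it.
};

// Hypothetical "item": the URI plus the action chosen for it.
struct FetchItem
{
  std::string uri;
  FetchAction action;
};

// `caching` is the per-URI flag; `cached` holds URIs with a cache file.
FetchItem planItem(
    const std::string& uri,
    bool caching,
    const std::unordered_set<std::string>& cached)
{
  if (!caching) {
    return {uri, FetchAction::BYPASS_CACHE};
  }
  if (cached.count(uri) > 0) {
    return {uri, FetchAction::RETRIEVE_FROM_CACHE};
  }
  return {uri, FetchAction::DOWNLOAD_AND_CACHE};
}
```

Because URIs without the flag always take the bypass path, existing frameworks see the old behavior, which is what makes the change backwards-compatible.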

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache: when multiple fetcher runs occur concurrently, download results in the cache are reused.
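The "only one download per URI" behavior can be sketched with shared futures: the first caller for a URI performs the download and every concurrent caller for that URI waits on the same result. This is an assumption-laden illustration, not the patch's code: a mutex stands in for the libprocess actor that serializes access in Mesos, and `SingleFlightCache` and the `download` callable are hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical single-flight cache: one download per URI, shared by all
// callers. `download` returns the path of the resulting cache file.
class SingleFlightCache
{
public:
  using Download = std::function<std::string(const std::string&)>;

  explicit SingleFlightCache(Download download)
    : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri)
  {
    std::promise<std::string> promise;
    std::shared_future<std::string> future;

    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = entries_.find(uri);
      if (it != entries_.end()) {
        return it->second;  // Pending or completed download: reuse it.
      }
      future = promise.get_future().share();
      entries_.emplace(uri, future);
    }

    // Only the first caller for this URI gets here. The download runs
    // outside the lock so that fetches of other URIs are not blocked.
    try {
      promise.set_value(download_(uri));
    } catch (...) {
      // All waiters observe the failure. (A real cache would also drop
      // the entry so that a later attempt can retry.)
      promise.set_exception(std::current_exception());
    }

    return future;
  }

private:
  Download download_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> entries_;
};
```

Concurrent callers block in `get()` on the shared future until the first caller finishes, so the bytes are transferred only once.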

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves like a real "hadoop" command as long as it is used in the limited way the test requires.

30124: Zaps (clears) the fetcher cache upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Creates fetcher cache tests: adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.
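URI type separation can be pictured as a scheme dispatch. This sketch assumes that http/https/ftp/ftps URIs are downloaded directly, hdfs/hftp/s3/s3n URIs go through the Hadoop client, and anything else (including file://) is treated as a local path to copy; `UriKind`, `startsWith` and `classify` are hypothetical names, and the scheme lists should be checked against the actual mesos-fetcher source.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical categories matching "http, hdfs, local copying".
enum class UriKind { NET, HADOOP, LOCAL };

inline bool startsWith(const std::string& s, const std::string& prefix)
{
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Assumed scheme lists; anything unrecognized is copied locally.
inline UriKind classify(const std::string& uri)
{
  for (const char* scheme : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(uri, scheme)) {
      return UriKind::NET;
    }
  }
  for (const char* scheme : {"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(uri, scheme)) {
      return UriKind::HADOOP;
    }
  }
  return UriKind::LOCAL;
}
```

Keeping this dispatch in one place is what allows the size queries (HTTP headers for NET, hdfs::du() for HADOOP, a file stat for LOCAL) to reuse it.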

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: at most one is in flight per URI.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e. whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
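The reservation arithmetic above (take what free space there is, evict unreferenced files for the rest, change nothing if that is not enough) might look roughly like this. The sketch is synchronous and single-threaded for clarity, whereas the patch reserves space while waiting asynchronously for eviction. `CacheSpace`, `CacheFile` and the least-recently-used ordering are assumptions for illustration, not the patch's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>

// Hypothetical bookkeeping for one cache file.
struct CacheFile
{
  std::string uri;
  uint64_t size;
  int references;  // > 0 while a fetch attempt uses the file: not evictable.
};

// Hypothetical cache space accounting. Assumes used + reserved <= limit.
class CacheSpace
{
public:
  explicit CacheSpace(uint64_t limit) : limit_(limit) {}

  // Adds a file as the most recently used one (back of the list).
  void add(const CacheFile& file)
  {
    files_.push_back(file);
    used_ += file.size;
  }

  // Reserves `bytes`, partly from free space and partly by evicting
  // unreferenced files, least recently used first. On failure nothing
  // is evicted or reserved.
  bool reserve(uint64_t bytes)
  {
    uint64_t available = limit_ - used_ - reserved_;
    if (bytes <= available) {
      reserved_ += bytes;
      return true;
    }
    uint64_t needed = bytes - available;

    // First pass: is there enough evictable (unreferenced) space at all?
    uint64_t evictable = 0;
    for (const CacheFile& file : files_) {
      if (file.references == 0) {
        evictable += file.size;
      }
    }
    if (evictable < needed) {
      return false;
    }

    // Second pass: evict from the LRU end until enough is freed.
    for (auto it = files_.begin(); it != files_.end() && needed > 0;) {
      if (it->references == 0) {
        used_ -= it->size;
        needed = it->size >= needed ? 0 : needed - it->size;
        it = files_.erase(it);
      } else {
        ++it;
      }
    }
    reserved_ += bytes;
    return true;
  }

  // Undoes (part of) a reservation, e.g. during cleanup after a failure.
  void release(uint64_t bytes) { reserved_ -= bytes; }

  uint64_t used() const { return used_; }
  uint64_t reserved() const { return reserved_; }
  std::size_t fileCount() const { return files_.size(); }

private:
  uint64_t limit_;
  uint64_t used_ = 0;
  uint64_t reserved_ = 0;
  std::list<CacheFile> files_;  // Front = least recently used.
};
```

Checking evictable space before evicting anything keeps a failed reservation side-effect free, which simplifies the "undo" cleanup the list describes.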


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated April 29, 2015, 1:42 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Rebased.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that other tests do not also cover.

30034: Makes the code structure of all fetcher tests the same: tests now call through fetch() instead of calling the fetcher's run method directly. Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches URI handling to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache: when multiple fetcher runs occur concurrently, download results in the cache are reused.

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves like a real "hadoop" command as long as it is used in the limited way the test requires.

30124: Zaps (clears) the fetcher cache upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Creates fetcher cache tests: adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: at most one is in flight per URI.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e. whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated April 13, 2015, 5:45 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Addressed the latest 2 issues.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537 
  include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
  src/Makefile.am d15a37365bcdd5c3906160b46b389635b38b1673 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp 6893684e6d199a5d69fc8bba8e60c4acaae9c3c9 
  src/slave/containerizer/docker.cpp f9fb07806e3b7d7d2afc1be3b8756eac23b32dcd 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp e4136095fca55637864f495098189ab3ad8d8fe7 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp 35f56252cfda5011d21aa188f33cc3e68a694968 
  src/slave/slave.cpp a0595f93ce4720f5b9926326d01210460ccb0667 
  src/tests/docker_containerizer_tests.cpp c772d4c836de18b0e87636cb42200356d24ec73d 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 42e42ac425a448fcc5e93db1cef1112cbf5e67c4 
  src/tests/mesos.cpp fc534e9febed1e293076e00e0f5c3879a78df90f 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:

30033: Removes the fetcher env tests, since these won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that other tests do not also cover.

30034: Makes the code structure of all fetcher tests the same: tests now call through fetch() instead of calling the fetcher's run method directly. Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches URI handling to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache: when multiple fetcher runs occur concurrently, download results in the cache are reused.

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves like a real "hadoop" command as long as it is used in the limited way the test requires.

30124: Zaps (clears) the fetcher cache upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.

30173: Creates fetcher cache tests: adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which mesos-fetcher and the fetcher process/actor will use in the same way.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: at most one is in flight per URI.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e. whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated April 10, 2015, 4:33 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Rebased.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537 
  include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
  src/Makefile.am fa609da08e23d6595a3f6d2efddd3e333b6c78f1 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp 6893684e6d199a5d69fc8bba8e60c4acaae9c3c9 
  src/slave/containerizer/docker.cpp f9fb07806e3b7d7d2afc1be3b8756eac23b32dcd 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp e4136095fca55637864f495098189ab3ad8d8fe7 
  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp 35f56252cfda5011d21aa188f33cc3e68a694968 
  src/slave/slave.cpp 9fec023b643d410f4d511fa6f80e9835bab95b7e 
  src/tests/docker_containerizer_tests.cpp c772d4c836de18b0e87636cb42200356d24ec73d 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 0e98572a62ae05437bd2bc800c370ad1a0c43751 
  src/tests/mesos.cpp 02cbb4b8cf1206d0f32d160addc91d7e0f1ab28b 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since they will no longer be needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.
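A minimal C++17 sketch of what passing all fetcher parameters as one JSON value could look like. The struct `FetchItem`, the field names `sandbox_directory`, `items`, `uri`, and `caching`, and the helper names are illustrative assumptions, not the actual fetcher.proto schema. The resulting string would then be placed in a single environment variable (for example with POSIX `setenv`) before launching the external fetcher.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-URI parameter; not the real FetcherInfo schema.
struct FetchItem {
  std::string uri;
  bool caching;  // mirrors the boolean "caching" flag on the URI
};

// Escapes the characters that JSON does not allow raw inside a string.
std::string jsonEscape(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          char buf[7];  // "\u" + 4 hex digits + terminating NUL
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Packs all fetch parameters into one JSON object, so that a single
// environment variable can carry them to the external fetcher process.
std::string toJson(const std::string& sandbox,
                   const std::vector<FetchItem>& items) {
  std::string json =
      "{\"sandbox_directory\":\"" + jsonEscape(sandbox) + "\",\"items\":[";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0) json += ",";
    json += "{\"uri\":\"" + jsonEscape(items[i].uri) + "\",\"caching\":" +
            (items[i].caching ? "true" : "false") + "}";
  }
  json += "]}";
  return json;
}
```

For example, `toJson("/sandbox", {{"http://example.com/a.tgz", true}})` produces `{"sandbox_directory":"/sandbox","items":[{"uri":"http://example.com/a.tgz","caching":true}]}`. A single structured value avoids having to keep many separate environment variables in sync, which is what made the old env tests necessary.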

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.
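To illustrate why the slave ID has to reach the code that builds the cache directory name, here is a small C++17 sketch. The layout `<flag root>/<slave ID>` and the `c<N>-<basename>` file naming are hypothetical choices made for illustration only.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical layout: one cache subdirectory per slave ID under the root
// directory given by the slave flag, so that slaves with different IDs on
// the same host do not collide.
fs::path cacheDirectory(const fs::path& flagRoot, const std::string& slaveId) {
  assert(!slaveId.empty());
  return flagRoot / slaveId;
}

// Hypothetical naming scheme: a counter keeps names unique even when two
// URIs share the same basename.
std::string cacheFileName(const std::string& uri, unsigned long id) {
  const std::string::size_type slash = uri.find_last_of('/');
  const std::string base =
      slash == std::string::npos ? uri : uri.substr(slash + 1);
  return "c" + std::to_string(id) + "-" + base;
}
```

With these assumptions, `cacheDirectory("/tmp/fetch", "S1")` yields `/tmp/fetch/S1`, and `cacheFileName("http://host/pkg/app.tar.gz", 7)` yields `c7-app.tar.gz`.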

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
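The three actions and the per-URI planning decision can be sketched as follows. The enum and field names are assumptions modeled on the description above, not the actual fetcher.proto definitions, and the plain hashmap from URI to cache file name stands in for the cache's bookkeeping objects.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical names for the three per-item actions.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

struct Item {
  std::string uri;
  Action action;
  std::string cacheFileName;  // empty when the cache is bypassed
};

// Decides the action for one URI: uncached URIs go straight to the sandbox;
// cached URIs are served from an existing cache entry if there is one, and
// are otherwise downloaded into a new cache file.
Item plan(const std::string& uri, bool caching,
          const std::unordered_map<std::string, std::string>& cacheIndex,
          const std::string& newCacheFileName) {
  if (!caching) {
    return {uri, Action::BYPASS_CACHE, ""};
  }
  auto it = cacheIndex.find(uri);
  if (it != cacheIndex.end()) {
    return {uri, Action::RETRIEVE_FROM_CACHE, it->second};
  }
  return {uri, Action::DOWNLOAD_AND_CACHE, newCacheFileName};
}
```

Keeping the decision in the slave-side process means the external fetcher only has to execute a precomputed action per item.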

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache, and reuse of download results in the cache when multiple fetcher runs occur concurrently.
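The reuse of a single download by concurrent fetch attempts can be illustrated with a "single-flight" pattern in standard C++. This is a sketch under simplifying assumptions: the patch's fetcher process/actor keeps its own bookkeeping, whereas here a mutex-protected map of `std::shared_future` stands in for it, and the `Downloader` callback is a hypothetical placeholder for the actual download.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Single-flight sketch: the first caller for a URI performs the download;
// callers that ask for the same URI while it is running (or afterwards)
// receive the same shared_future instead of downloading again.
class DownloadDeduper {
 public:
  using Downloader = std::function<std::string(const std::string&)>;

  explicit DownloadDeduper(Downloader download)
      : download_(std::move(download)) {}

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::promise<std::string> promise;
    std::shared_future<std::string> result;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = downloads_.find(uri);
      if (it != downloads_.end()) {
        return it->second;  // reuse the download in progress or finished
      }
      result = promise.get_future().share();
      downloads_.emplace(uri, result);
    }
    // Download outside the lock so that requests for other URIs proceed.
    try {
      promise.set_value(download_(uri));
    } catch (...) {
      promise.set_exception(std::current_exception());
    }
    return result;
  }

 private:
  Downloader download_;
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_future<std::string>> downloads_;
};
```

Failures are propagated to every waiter through `set_exception`, so all concurrent fetch attempts for a URI see the same outcome.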

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that acts just like a real "hadoop" command when used in the intended, limited way.

30124: Zaps the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in an acceptable, if very simple, way.
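A minimal sketch of the "zap" approach with `std::filesystem`, assuming the whole cache lives under one directory: on startup, recovery, or shutdown the directory is wiped and recreated empty, so no on-disk cache state has to be reconciled with in-memory bookkeeping. The function name `zapCache` is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Deletes everything in the cache directory and recreates it empty.
// Returns false if the directory could not be reset.
bool zapCache(const fs::path& cacheDir) {
  std::error_code ec;
  fs::remove_all(cacheDir, ec);  // a missing directory is not an error
  if (ec) {
    return false;
  }
  fs::create_directories(cacheDir, ec);
  return !ec;
}
```

The trade-off is that previously cached files are downloaded again after a restart, in exchange for not having to persist and validate cache metadata.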

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
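Since hdfs::du() returns the size as a string, the caller still has to turn it into a byte count. The sketch below does this under stated assumptions: the size is the first whitespace-separated field, an optional unit letter (K, M, G, T) follows as a separate token, and units are powers of 1024. The actual `hadoop fs -du -h` output format may differ between Hadoop versions, so verify it before relying on this; `parseHumanSize` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parses the leading size of one du-style output line such as
// "1.5 K  /path" or "300  /path". Returns std::nullopt if no
// non-negative number leads the line.
std::optional<std::uint64_t> parseHumanSize(const std::string& line) {
  std::istringstream in(line);
  double value = 0;
  if (!(in >> value) || value < 0) {
    return std::nullopt;
  }
  std::string unit;
  in >> unit;  // may actually be the path when no unit is printed
  double factor = 1;
  if (unit == "K") factor = 1024.0;
  else if (unit == "M") factor = 1024.0 * 1024;
  else if (unit == "G") factor = 1024.0 * 1024 * 1024;
  else if (unit == "T") factor = 1024.0 * 1024 * 1024 * 1024;
  return static_cast<std::uint64_t>(value * factor);
}
```

Note that human-readable output is rounded, so a size obtained this way is only approximate; a reservation based on it may need some slack.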

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
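A shared classifier of this kind might look like the sketch below, which sorts URIs into Hadoop-client schemes, network downloads, and local copies, and rejects unknown schemes. The exact scheme lists are assumptions for illustration, not taken from the patch.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical URI categories matching the distinction described above.
enum class UriType { HADOOP, NETWORK, LOCAL, INVALID };

static bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Classifies a URI once, so that both the size query and the actual
// download can dispatch on the same result.
UriType classify(const std::string& uri) {
  if (uri.empty()) {
    return UriType::INVALID;
  }
  for (const char* scheme : {"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(uri, scheme)) return UriType::HADOOP;
  }
  for (const char* scheme : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(uri, scheme)) return UriType::NETWORK;
  }
  if (uri.find("://") != std::string::npos && !startsWith(uri, "file://")) {
    return UriType::INVALID;  // unknown scheme
  }
  return UriType::LOCAL;
}
```

Having one function decide the type avoids the size query and mesos-fetcher disagreeing about how a given URI must be handled.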

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. The necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one happens per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e., whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
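The space bookkeeping described in the list above can be sketched as a small single-threaded C++ class. It is a simplification under stated assumptions, not the patch's algorithm: sizes are known before reserving, eviction candidates are taken oldest-first, eviction only updates bookkeeping rather than deleting files, and pinning stands in for the reference counting that disables eviction of in-use files. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Single-threaded sketch of fetcher cache space bookkeeping (hypothetical).
class CacheSpace {
 public:
  explicit CacheSpace(std::uint64_t capacity) : capacity_(capacity) {}

  // Reserves `size` bytes for a new entry `key`: free space is used first and
  // the remainder comes from evicting unpinned entries, oldest first. If not
  // enough space can be obtained, nothing changes and std::nullopt is
  // returned. On success, returns the keys evicted to make room.
  std::optional<std::vector<std::string>> reserve(const std::string& key,
                                                  std::uint64_t size) {
    if (entries_.count(key) > 0) return std::nullopt;  // already cached
    std::uint64_t available = capacity_ - used_;
    std::vector<std::string> victims;
    for (const std::string& k : order_) {
      if (available >= size) break;
      const Entry& e = entries_.at(k);
      if (e.refs == 0) {  // entries in use are never evicted
        victims.push_back(k);
        available += e.size;
      }
    }
    if (available < size) return std::nullopt;  // undo: nothing committed
    for (const std::string& k : victims) remove(k);
    entries_.emplace(key, Entry{size, 1});  // pinned by the reserving fetch
    order_.push_back(key);
    used_ += size;
    return victims;
  }

  // Undoes a reservation, e.g. when the download into the cache fails.
  void cancel(const std::string& key) { remove(key); }

  // Reference counting: each concurrent user of a cache file pins it.
  void pin(const std::string& key) { ++entries_.at(key).refs; }
  void unpin(const std::string& key) {
    Entry& e = entries_.at(key);
    assert(e.refs > 0);
    --e.refs;
  }

  std::uint64_t used() const { return used_; }

 private:
  struct Entry {
    std::uint64_t size;
    int refs;
  };

  void remove(const std::string& key) {
    used_ -= entries_.at(key).size;
    entries_.erase(key);
    order_.remove(key);
  }

  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::unordered_map<std::string, Entry> entries_;
  std::list<std::string> order_;  // insertion order, oldest first
};
```

Computing the full list of victims before evicting anything is what makes "undoing" a failed reservation trivial in this sketch; the real patch has to undo asynchronous work, which is why it keeps explicit cleanup lists.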


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 24, 2015, 6:57 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

New structure for the core algorithm that attacks invariants more locally. Thanks to Ben H for advising and trailblazing how to do this!

TODO: remove contentionBarrier() and mock _fetch() instead; rebase.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
  include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45 
  src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
  src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since they will no longer be needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache, and reuse of download results in the cache when multiple fetcher runs occur concurrently.

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that acts just like a real "hadoop" command when used in the intended, limited way.

30124: Zaps the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in an acceptable, if very simple, way.

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. The necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one happens per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e., whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 21, 2015, 11:54 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Cleanups. Thanks, Till!


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
  src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
  src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since they will no longer be needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache, and reuse of download results in the cache when multiple fetcher runs occur concurrently.

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that acts just like a real "hadoop" command when used in the intended, limited way.

30124: Zaps the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in an acceptable, if very simple, way.

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. The necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one happens per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e., whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 19, 2015, 10:48 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Fixed 3 more review issues.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
  src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
  src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. Their descriptions follow, in dependency order:

30033: Removes the fetcher env tests, since they will no longer be needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache, and reuse of download results in the cache when multiple fetcher runs occur concurrently.

30614: Ensures that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that acts just like a real "hadoop" command when used in the intended, limited way.

30124: Zaps the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in an acceptable, if very simple, way.

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactors URI type separation in mesos-fetcher. Moves the URI type separation code (which distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it will be reused by download size queries once fetcher cache management is introduced. Also factors out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. The necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one happens per URI in play.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e., whose related cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 18, 2015, 11:48 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 726
> > <https://reviews.apache.org/r/30774/diff/37/?file=897704#file897704line726>
> >
> >     I'm not sure I understand. The error is never logged, and in the end we simply return 0 if os::find returns an error. To me that looks like we're ignoring the case where the Try has an error, right?

No problem. I'll rewrite it and add a comment: when there is an error, the cache directory does not exist, which means the number of files in the cache is zero.
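The behavior described in this reply, where a missing cache directory is treated as an empty cache rather than as an error, can be sketched with std::filesystem. This is an illustration only: the code under review uses os::find from stout, and `countCacheFiles` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Counts the regular files directly inside the cache directory.
// A missing cache directory means nothing has been cached yet, so the
// count is 0 by definition rather than an error to report.
std::size_t countCacheFiles(const fs::path& cacheDir) {
  std::error_code ec;
  if (!fs::is_directory(cacheDir, ec)) {
    return 0;  // no cache directory, hence no cached files
  }
  std::size_t count = 0;
  // In this sketch an error while iterating simply ends the count early.
  for (fs::directory_iterator it(cacheDir, ec), end; !ec && it != end;
       it.increment(ec)) {
    if (it->is_regular_file(ec)) {
      ++count;
    }
  }
  return count;
}
```

Making the "directory absent" case an explicit early return documents the intent that the reviewer found unclear, instead of silently folding every error into a zero result.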


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77025
-----------------------------------------------------------


On March 18, 2015, 11:43 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 18, 2015, 11:43 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implements almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:
> 
> 30033: Removes the fetcher env tests, since they are no longer needed once the fetcher takes its parameters as JSON in a single env var. They never tested anything that is not also covered by other tests.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads into the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can be satisfied partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs that are currently in use, i.e., whose cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Timothy Chen <tn...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review77025
-----------------------------------------------------------



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment124810>

    I'm not sure I understand: the error is never logged, and in the end we simply return 0 if os::find returns an error. To me that looks like we're ignoring the error in the Try, right?
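
The comment concerns a code path that returns 0 when os::find fails. As a hedged illustration of the alternative (propagating the error instead of swallowing it), here is a sketch that sums file sizes below a directory. It uses std::filesystem in place of stout's os::find, and both the Result alias (standing in for stout's Try) and totalFileSize() are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>
#include <variant>

// Hypothetical stand-in for stout's Try<T>: holds either a value or an error message.
template <typename T>
using Result = std::variant<T, std::string>;

// Sums the sizes of all regular files below `dir`. Filesystem errors are
// returned to the caller instead of being turned into a size of 0.
Result<std::uintmax_t> totalFileSize(const std::filesystem::path& dir)
{
  namespace fs = std::filesystem;

  std::error_code ec;
  std::uintmax_t total = 0;

  // The error_code overloads report failures through `ec` instead of throwing.
  for (fs::recursive_directory_iterator it(dir, ec), end;
       !ec && it != end;
       it.increment(ec)) {
    const bool regular = it->is_regular_file(ec);
    if (ec) {
      break;
    }
    if (regular) {
      const std::uintmax_t size = it->file_size(ec);
      if (ec) {
        break;
      }
      total += size;
    }
  }

  if (ec) {
    // Propagate the failure; a silent 0 would look like an empty directory.
    return "Failed to compute size of '" + dir.string() + "': " + ec.message();
  }

  return total;
}
```

The caller can then decide whether to log, retry, or fail the fetch, rather than acting on a size that is silently wrong.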


- Timothy Chen


On March 19, 2015, 6:43 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 19, 2015, 6:43 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is set by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing covers fetching from (mock) HDFS and the new features.
> 
> 
> Diffs
> -----
> 
>   docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
>   src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
>   src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
>   src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:
> 
> 30033: Removes the fetcher env tests, since they are no longer needed once the fetcher takes its parameters as JSON in a single env var. They never tested anything that is not also covered by other tests.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos-fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads into the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can be satisfied partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
> - Eviction is disabled for URIs that are currently in use, i.e., whose cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 18, 2015, 11:43 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Addressed most of Tim's review and left comments for the rest. Tim, Ben, many thanks for the reviews! Please revisit your issues and update or close the open ones, given my responses.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is set by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing covers fetching from (mock) HDFS and the new features.


Diffs (updated)
-----

  docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
  src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
  src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:

30033: Removes the fetcher env tests, since they are no longer needed once the fetcher takes its parameters as JSON in a single env var. They never tested anything that is not also covered by other tests.

30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
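
The three actions and the per-URI "items" can be pictured with a short sketch. FetchAction, FetchItem, and planFetch() are hypothetical names chosen for illustration, not the definitions in include/mesos/fetcher/fetcher.proto; the cache index is assumed to map a URI to its cache file name.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical mirror of the three fetch actions described above.
enum class FetchAction {
  BYPASS_CACHE,         // Fetch directly into the sandbox, bypassing the cache.
  DOWNLOAD_AND_CACHE,   // Fetch through the cache: download into it first.
  RETRIEVE_FROM_CACHE,  // Cache hit: reuse the existing cache file.
};

// One "item" per URI: the URI, its action, and possibly a cache file name.
struct FetchItem {
  std::string uri;
  FetchAction action;
  std::optional<std::string> cacheFilename;
};

// Chooses an action for `uri`. `cache` maps already-cached URIs to cache file
// names; `newCacheFilename` is used when the URI must be downloaded into the cache.
FetchItem planFetch(
    const std::string& uri,
    bool caching,
    const std::map<std::string, std::string>& cache,
    const std::string& newCacheFilename)
{
  if (!caching) {
    return FetchItem{uri, FetchAction::BYPASS_CACHE, std::nullopt};
  }

  const auto it = cache.find(uri);
  if (it != cache.end()) {
    return FetchItem{uri, FetchAction::RETRIEVE_FROM_CACHE, it->second};
  }

  return FetchItem{uri, FetchAction::DOWNLOAD_AND_CACHE, newCacheFilename};
}
```

Only URIs whose caching flag is set ever touch the cache, which is what keeps the change backwards-compatible for existing frameworks.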

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
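
One way to get "only one download per URI" under concurrency is to hand every caller the same shared future. The sketch below assumes a synchronous downloader callback and uses std::shared_future with a mutex; DownloadCoalescer and its members are hypothetical, whereas the patch itself works with futures inside the fetcher process/actor.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: concurrent fetches of the same URI share one download.
class DownloadCoalescer
{
public:
  // Downloads `uri` into the cache and returns the cache file path.
  using Downloader = std::function<std::string(const std::string&)>;

  explicit DownloadCoalescer(Downloader download)
    : download_(std::move(download)) {}

  // The first caller for a URI performs the download; every other caller,
  // concurrent or later, receives the same shared future.
  std::shared_future<std::string> fetch(const std::string& uri)
  {
    std::promise<std::string> promise;
    std::shared_future<std::string> future;

    {
      std::lock_guard<std::mutex> lock(mutex_);
      const auto it = downloads_.find(uri);
      if (it != downloads_.end()) {
        return it->second;
      }
      future = promise.get_future().share();
      downloads_.emplace(uri, future);
    }

    // Download outside the lock so fetches of other URIs are not blocked.
    try {
      promise.set_value(download_(uri));
    } catch (...) {
      {
        std::lock_guard<std::mutex> lock(mutex_);
        downloads_.erase(uri);  // Allow a later attempt to retry.
      }
      promise.set_exception(std::current_exception());
    }

    return future;
  }

private:
  Downloader download_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> downloads_;
};
```

Callers that joined an in-flight download see its failure as an exception from get(); the map entry is dropped on failure so that a later fetch can try again, while successful entries stay and are reused.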

30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.

30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
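
Zapping the cache means no cache bookkeeping has to be reconstructed during recovery. A minimal sketch, assuming the cache is a plain directory whose contents can all be discarded; zapCacheDirectory() is a hypothetical helper, not the function in the patch.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

// Hypothetical helper: discards every cache file and leaves an empty cache
// directory behind. Returns false and sets `ec` on failure.
bool zapCacheDirectory(const std::filesystem::path& cacheDir, std::error_code& ec)
{
  std::filesystem::remove_all(cacheDir, ec);  // A missing directory is not an error.
  if (ec) {
    return false;
  }
  std::filesystem::create_directories(cacheDir, ec);
  return !ec;
}
```

The trade-off is that every cached file must be downloaded again after a slave restart.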

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
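
Turning the output of "hadoop fs -du -h" into a byte count requires parsing a human-readable size. The sketch assumes each line starts with a number, optionally followed by a one-letter binary unit (K, M, G, T, P as powers of 1024), and then the path; the exact column layout varies between Hadoop versions, so treat this format as an assumption. parseHumanSize() is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parses the leading size field of one line of human-readable du output,
// e.g. "1.5 K  /path/file" or "123  /path/file". Assumes one-letter binary
// unit suffixes (K = 1024 bytes, M = 1024 K, ...). Hypothetical helper.
std::optional<std::uint64_t> parseHumanSize(const std::string& line)
{
  std::istringstream in(line);

  double value = 0;
  if (!(in >> value) || value < 0) {
    return std::nullopt;  // No leading number: not a size line.
  }

  // The next token is either a unit suffix or already the path.
  std::string token;
  in >> token;

  static const std::string units = "KMGTP";
  double multiplier = 1;
  if (token.size() == 1) {
    const std::string::size_type index = units.find(token[0]);
    if (index != std::string::npos) {
      multiplier = std::pow(1024.0, static_cast<double>(index + 1));
    }
  }

  return static_cast<std::uint64_t>(std::llround(value * multiplier));
}
```

Because -h output is rounded, a size parsed this way is approximate; when reserving cache space, rounding up may be the safer choice.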

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
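
The URI type separation can be sketched as a scheme-prefix check. The scheme lists below are assumptions for illustration (network schemes fetched over HTTP/FTP, Hadoop-style schemes handed to the hadoop client, everything else treated as a local copy); UriKind and classifyUri() are hypothetical names, and the lists in mesos-fetcher may differ.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical categories; the scheme lists below are illustrative assumptions.
enum class UriKind {
  NET,     // Downloaded over the network (HTTP/FTP).
  HADOOP,  // Handed to the hadoop client.
  LOCAL,   // Copied from the local filesystem.
};

UriKind classifyUri(const std::string& uri)
{
  const auto startsWith = [&uri](const std::string& prefix) {
    return uri.compare(0, prefix.size(), prefix) == 0;
  };

  for (const char* scheme : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(scheme)) {
      return UriKind::NET;
    }
  }

  for (const char* scheme : {"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(scheme)) {
      return UriKind::HADOOP;
    }
  }

  return UriKind::LOCAL;  // Plain paths and anything unrecognized.
}
```

Keeping this classification in one place lets both the download itself and the size query that precedes it dispatch on the same URI type.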

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads into the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can be satisfied partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e., whose cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
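
The reservation arithmetic in the list above (take free space first, evict only for the shortfall, never evict entries that are in use, undo on failure) can be sketched as follows. Assumptions: sizes are known up front, eviction is synchronous and oldest-first (the patch description does not state an order), and a new entry starts pinned by the fetch that created it. CacheSpace and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical bookkeeping for a size-limited cache: reservations, oldest-first
// eviction, and reference counts that protect in-use entries from eviction.
class CacheSpace
{
public:
  explicit CacheSpace(std::uint64_t capacity) : capacity_(capacity) {}

  // Reserves `bytes`, using free space first and evicting unpinned entries
  // (oldest first) only for the shortfall. Returns the evicted keys, or
  // std::nullopt, with all state unchanged, if not enough can be freed.
  std::optional<std::vector<std::string>> reserve(std::uint64_t bytes)
  {
    const std::uint64_t available = capacity_ - used_;
    std::uint64_t reclaimable = 0;
    std::vector<std::string> victims;

    for (const std::string& key : order_) {
      if (available + reclaimable >= bytes) {
        break;
      }
      const Entry& entry = entries_.at(key);
      if (entry.refs == 0) {  // Pinned entries are never evicted.
        victims.push_back(key);
        reclaimable += entry.size;
      }
    }

    if (available + reclaimable < bytes) {
      return std::nullopt;
    }

    for (const std::string& key : victims) {
      erase(key);
    }
    used_ += bytes;  // Reserved space counts as used until released.
    return victims;
  }

  // Undoes a reservation, e.g. after a failed download.
  void release(std::uint64_t bytes) { used_ -= bytes; }

  // Turns a reservation into an entry, pinned once by the creating fetch.
  // Assumes `key` is not already in the cache.
  void commit(const std::string& key, std::uint64_t bytes)
  {
    entries_[key] = Entry{bytes, 1};
    order_.push_back(key);
  }

  void pin(const std::string& key) { ++entries_.at(key).refs; }
  void unpin(const std::string& key) { --entries_.at(key).refs; }

  std::uint64_t used() const { return used_; }

private:
  struct Entry
  {
    std::uint64_t size;
    int refs;
  };

  void erase(const std::string& key)
  {
    used_ -= entries_.at(key).size;
    entries_.erase(key);
    order_.remove(key);
  }

  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::map<std::string, Entry> entries_;
  std::list<std::string> order_;  // Insertion order, oldest first.
};
```

In this model a fetch would call reserve() before running mesos-fetcher, commit() after a successful download, release() if the download fails, and unpin() once the file is no longer in use.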


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 17, 2015, 6:59 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Rebased to current master.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is set by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing covers fetching from (mock) HDFS and the new features.


Diffs (updated)
-----

  docs/configuration.md 7119b1421ac1506fa118e9f91d07e027dec3d92e 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto ec8efaec13f54a56d82411f6cdbdb8ad8b103748 
  src/Makefile.am 7a06c7028eca8164b1f5fdea6a7ecd37ee6826bb 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp dbaf5f532d0bc65a6d16856b8ffcc2c06a98f1fa 
  src/slave/slave.cpp 0f99e4efb8fa2b96f120a3e49191158ca0364c06 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions. In dependency order:

30033: Removes the fetcher env tests, since they are no longer needed once the fetcher takes its parameters as JSON in a single env var. They never tested anything that is not also covered by other tests.

30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.

30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.

30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads into the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can be satisfied partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs that are currently in use, i.e., whose cache files are in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 17, 2015, 4:52 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Added more rationale for the code structure in fetch() to docs/fetcher-cache-internals.md. Refactored fetch() somewhat: replaced cleanupNewCacheEntries() with ___fetch(), and inlined what was in makeCacheItems() and applyFallbacks(). There should now be a more straightforward "flow" to the whole process. However, I did not find a suitable substitute for FetcherItem as the phase-1 future parameter, so I am sticking with that.

(I know there is a slight regression with respect to os::stat::size, so this patch is not rebased; I will fix that in the next patch. Apart from this, the patch is fully functional and runs all tests on an older version of master.)


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Files downloaded from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is set by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing covers fetching from (mock) HDFS and the new features.


Diffs (updated)
-----

  docs/configuration.md fc3afec248b534b1d5eb625eb66de5f90cd8cd33 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See the descriptions of those, listed here in dependency order:

30033: Removes the fetcher env tests, since these will no longer be needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that later triggers fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and potentially a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
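
A minimal sketch of how a URI might be turned into an item carrying one of the three actions, assuming a simple in-memory index from URI to cache file name. FetchAction, FetchItem, CacheIndex and the "c<N>" naming scheme are hypothetical illustrations, not the names used in the patch, and tracking whether a download has actually completed is left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical names for the three actions described above.
enum class FetchAction {
  BYPASS_CACHE,        // Fetch directly into the sandbox; the cache is not touched.
  DOWNLOAD_AND_CACHE,  // Download into the cache, then copy into the sandbox.
  RETRIEVE_FROM_CACHE  // The cache already has the file; just copy it out.
};

// Hypothetical "item": the URI, the chosen action, and possibly a cache file name.
struct FetchItem {
  std::string uri;
  FetchAction action;
  std::string cacheFilename;  // Empty for BYPASS_CACHE.
};

// Hypothetical bookkeeping that maps URIs to the names of their cache files.
// Completion tracking and eviction are deliberately omitted.
class CacheIndex {
 public:
  FetchItem plan(const std::string& uri, bool caching) {
    if (!caching) {
      // Without the per-URI "caching" flag, behave as before: no cache use.
      return FetchItem{uri, FetchAction::BYPASS_CACHE, ""};
    }

    auto it = files_.find(uri);
    if (it != files_.end()) {
      return FetchItem{uri, FetchAction::RETRIEVE_FROM_CACHE, it->second};
    }

    // First request for this URI: allocate a fresh cache file name.
    std::string filename = "c" + std::to_string(nextId_++);
    files_.emplace(uri, filename);
    return FetchItem{uri, FetchAction::DOWNLOAD_AND_CACHE, filename};
  }

 private:
  std::unordered_map<std::string, std::string> files_;
  unsigned long nextId_ = 0;
};
```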

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
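
One way to get "only one download per URI" is to coalesce requests: the first request for a URI starts the download, and later requests for the same URI only queue a callback. The sketch below assumes an actor-style setting (as with a libprocess process) in which all calls run on one thread and a download reports completion through a callback. DownloadCoalescer and its callback types are hypothetical, and failure handling is omitted.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: coalesces concurrent download requests per URI.
// Assumes single-threaded, actor-style execution, so no locking is needed.
class DownloadCoalescer {
 public:
  using Callback = std::function<void(const std::string& path)>;
  // Starts an asynchronous download and later invokes `done` with the path.
  using Starter = std::function<void(const std::string& uri, Callback done)>;

  explicit DownloadCoalescer(Starter start) : start_(std::move(start)) {}

  void fetch(const std::string& uri, Callback onReady) {
    Entry& entry = entries_[uri];

    if (entry.ready) {
      onReady(entry.path);  // Already in the cache: reuse it.
      return;
    }

    entry.waiters.push_back(std::move(onReady));
    if (entry.waiters.size() > 1) {
      return;  // A download for this URI is already in flight.
    }

    // First request for this URI: start the one and only download.
    start_(uri, [this, uri](const std::string& path) {
      Entry& finished = entries_[uri];
      finished.ready = true;
      finished.path = path;

      // Move the waiters out before notifying, in case a callback
      // calls fetch() again for the same URI.
      std::vector<Callback> waiters;
      waiters.swap(finished.waiters);
      for (const Callback& waiter : waiters) {
        waiter(path);
      }
    });
  }

 private:
  struct Entry {
    bool ready = false;
    std::string path;
    std::vector<Callback> waiters;
  };

  Starter start_;
  std::unordered_map<std::string, Entry> entries_;
};
```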

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves like a real "hadoop" command as long as it is used in the limited ways the test requires.

30124: Zaps (clears) the fetcher cache upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du(), which calls "hadoop fs -du -h" and returns a string containing the file size for the URI passed as an argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
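
A sketch of turning one line of "hadoop fs -du -h" output into a byte count. It assumes the line starts with either "<number> <path>" or "<number> <unit> <path>", with binary (1024-based) units K, M, G, T, P, E; the exact output format varies between Hadoop versions, and parseHumanReadableSize is a hypothetical helper, not the patch's code. Because -h rounds the number, the result is only approximate, which matters when reserving cache space.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: parses the size at the start of one line of
// "hadoop fs -du -h" output. Assumes "<number> [<unit>] <path>" with
// 1024-based units; the result is approximate because -h rounds.
bool parseHumanReadableSize(const std::string& line, std::uint64_t* bytes) {
  std::istringstream in(line);

  double value = 0;
  if (!(in >> value) || value < 0) {
    return false;  // E.g. an error message instead of a size.
  }

  // The next token is either a unit letter or already the path.
  std::string token;
  in >> token;

  static const std::string units = "KMGTPE";
  double multiplier = 1.0;
  if (token.size() == 1) {
    const std::string::size_type index = units.find(token[0]);
    if (index != std::string::npos) {
      for (std::string::size_type i = 0; i <= index; ++i) {
        multiplier *= 1024.0;
      }
    }
  }

  // Round to the nearest whole byte.
  *bytes = static_cast<std::uint64_t>(value * multiplier + 0.5);
  return true;
}
```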

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
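
A sketch of the kind of URI type separation that can be shared by mesos-fetcher and the download size queries. The scheme lists below are illustrative assumptions, not the exact lists in the patch, and classifyUri and UriType are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical categories mirroring "http, hdfs, local copying, etc.".
enum class UriType { LOCAL, NET, HADOOP, UNSUPPORTED };

// Classifies a URI by its scheme. The scheme lists are assumptions.
UriType classifyUri(const std::string& uri) {
  const std::string::size_type separator = uri.find("://");
  if (separator == std::string::npos) {
    return UriType::LOCAL;  // No scheme: treat as a local path.
  }

  // Schemes are case-insensitive, so compare in lower case.
  std::string scheme = uri.substr(0, separator);
  for (char& c : scheme) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }

  if (scheme == "file") {
    return UriType::LOCAL;
  }
  if (scheme == "http" || scheme == "https" || scheme == "ftp" || scheme == "ftps") {
    return UriType::NET;
  }
  if (scheme == "hdfs" || scheme == "hftp" || scheme == "s3" || scheme == "s3n") {
    return UriType::HADOOP;
  }
  return UriType::UNSUPPORTED;
}
```

Keeping a single function like this means the size query and the actual fetch dispatch on the same classification, so they cannot disagree about how a given URI is handled.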

30626: Fetcher cache eviction. Eviction happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one such request per URI is in flight at any time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
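
A simplified sketch of the reservation arithmetic and reference counting described above, using a hypothetical class CacheSpace. It assumes eviction is synchronous and only updates bookkeeping (the real eviction deletes files asynchronously and can fail), and the oldest-first eviction order is also an assumption. A reservation first uses free space and evicts unpinned files only for the remainder; if not enough unpinned space exists, nothing changes.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

// Hypothetical sketch of fetcher cache space accounting. Eviction here is
// synchronous and only removes bookkeeping entries.
class CacheSpace {
 public:
  explicit CacheSpace(std::uint64_t capacity) : capacity_(capacity) {}

  // Reserves `size` bytes: free space first, then evicts unpinned files
  // oldest-first. Returns false and changes nothing if that is not enough.
  bool reserve(std::uint64_t size, std::vector<std::string>* evicted) {
    const std::uint64_t available = capacity_ - used_ - reserved_;
    const std::uint64_t needed = size > available ? size - available : 0;

    std::vector<std::list<File>::iterator> victims;
    std::uint64_t reclaimed = 0;
    for (auto it = files_.begin(); it != files_.end() && reclaimed < needed; ++it) {
      if (it->refs == 0) {  // Pinned (in-use) files are never evicted.
        victims.push_back(it);
        reclaimed += it->size;
      }
    }
    if (reclaimed < needed) {
      return false;
    }

    for (auto it : victims) {
      used_ -= it->size;
      evicted->push_back(it->key);
      files_.erase(it);
    }
    reserved_ += size;
    return true;
  }

  // Turns part of a reservation into a resident cache file (newest at the back).
  void commit(const std::string& key, std::uint64_t size) {
    assert(size <= reserved_);
    reserved_ -= size;
    used_ += size;
    files_.push_back(File{key, size, 0});
  }

  // Undoes a reservation, e.g. after a failed download.
  void release(std::uint64_t size) {
    assert(size <= reserved_);
    reserved_ -= size;
  }

  // Reference counting: a file with refs > 0 is in use and not evictable.
  bool pin(const std::string& key) {
    File* file = find(key);
    if (file == nullptr) return false;
    ++file->refs;
    return true;
  }

  bool unpin(const std::string& key) {
    File* file = find(key);
    if (file == nullptr || file->refs == 0) return false;
    --file->refs;
    return true;
  }

  std::uint64_t used() const { return used_; }
  std::uint64_t reserved() const { return reserved_; }

 private:
  struct File {
    std::string key;
    std::uint64_t size;
    int refs;
  };

  File* find(const std::string& key) {
    for (File& file : files_) {
      if (file.key == key) return &file;
    }
    return nullptr;
  }

  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::uint64_t reserved_ = 0;
  std::list<File> files_;  // Oldest first.
};
```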


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 11, 2015, 10:50 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Rebased.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is given by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md fc3afec248b534b1d5eb625eb66de5f90cd8cd33 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am 3059818231c46484039d179cd6916932eff6cd68 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp 71ae84bbfcef208cc2ee603f3c8a79225e48a7d5 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 11, 2015, 5:47 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Addressed almost all issues. Simplified fetcher cache test source code.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is given by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md fc3afec248b534b1d5eb625eb66de5f90cd8cd33 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 7, 2015, 7:21 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Addressed Tim's latest review.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is given by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/launcher/fetcher.cpp, line 178
> > <https://reviews.apache.org/r/30774/diff/32/?file=887354#file887354line178>
> >
> >     You log the extraction command but in this case don't log the copy command.
> >     
> >     I think to be consistent, let's not log the command, and like you do here only log when the command fails.
> >     
> >     What do you think?

Logging the command in both cases now.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.hpp, line 195
> > <https://reviews.apache.org/r/30774/diff/32/?file=887358#file887358line195>
> >
> >     Why not just store the Path and return that?

"directory" is a temporary artefact that will disappear once we refactor so that flags get injected into the fetcher. I added a comment saying that.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 175
> > <https://reviews.apache.org/r/30774/diff/32/?file=887359#file887359line175>
> >
> >     Let's use strings::contains instead of find to be consistent here.

Also fixed all other occurrences.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/slave.cpp, line 3710
> > <https://reviews.apache.org/r/30774/diff/32/?file=887363#file887363line3710>
> >
> >     Why is this just a Failure but the other recover is a LOG(FATAL)? Shouldn't we exit here too, if being unable to recover the cache is a critical event?

The method we are in returns a future, so we can return a Failure here, which eventually leads to exiting. At the other site, the enclosing method returns void, so there is no Failure to return. Any suggestions for that?


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75491
-----------------------------------------------------------


On March 6, 2015, 5:46 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 6, 2015, 5:46 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. Caching only happens when requested by an extra flag in the URI, so the change is backwards-compatible. The cache size limit is given by another new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, that implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, in dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher uses JSON in a single env var as a parameter. They never tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Tests now call through fetch() instead of calling the fetcher's run method directly. Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one such request per URI is in flight at any time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
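The reservation arithmetic from the 30626 description (reservations that come partly from available space and partly from evictions) can be sketched as follows. This is a simplified standard C++ illustration, not the fetcher's actual bookkeeping. CacheSpace, ReservationPlan and planReservation are hypothetical names, and evictableBytes is assumed to be the total size of cache entries that no fetch run currently references.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical view of a size-limited cache (the limit comes from a slave flag).
struct CacheSpace {
  uint64_t total;      // configured cache size limit in bytes
  uint64_t committed;  // bytes used by resident files plus outstanding reservations
};

// How a reservation of `size` bytes is covered.
struct ReservationPlan {
  uint64_t fromFreeSpace;  // taken from space that is already available
  uint64_t fromEviction;   // must be freed by evicting unreferenced entries
};

// Returns std::nullopt if even evicting every evictable byte would not make
// enough room; the caller decides how to handle that case.
std::optional<ReservationPlan> planReservation(
    const CacheSpace& space, uint64_t evictableBytes, uint64_t size) {
  const uint64_t available =
      space.committed < space.total ? space.total - space.committed : 0;
  if (size <= available) {
    return ReservationPlan{size, 0};
  }
  const uint64_t missing = size - available;
  if (missing > evictableBytes) {
    return std::nullopt;
  }
  return ReservationPlan{available, missing};
}
```

With a 100-byte limit and 70 bytes committed, a 60-byte request takes 30 bytes from free space and needs 30 bytes of evictions; if fewer than 30 evictable bytes exist, no plan is returned and nothing should be reserved.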


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > include/mesos/fetcher/fetcher.proto, line 58
> > <https://reviews.apache.org/r/30774/diff/32/?file=887350#file887350line58>
> >
> >     It's harder to make an optional field required, but it's much easier the other way around.
> >     
> >     If we always want it to be required, I think we should make the sandbox a required field.
> 
> Bernd Mathiske wrote:
>     There was some discussion about whether this field should be required or not. The general idea here is that a task might be able to run without fetching anything into its sandbox. In that case, the framework may get away without naming the sandbox. But since a task always has a sandbox, we could also make the field required. I am impartial here, but your argument that changing required to optional is easier carries weight.

I have heard good arguments both ways. Here is how I see it. 

For the recipient of a message, "optional" is the preferred choice. Then any legacy recipient's code is prepared for the field's absence and remains robust if the field later changes to "required". The reverse does not hold.

But for the sender, "required" is the better choice, since it makes sender code more robust. If legacy senders still provide the field after it has become optional, that is fine. Again, the reverse does not hold.

So which side are we on in this case? Insofar as this is an internal protocol, we are on neither side in particular and can change it in arbitrary ways.

It becomes an external protocol if something other than a Mesos slave uses mesos-fetcher (maybe a special external containerizer). Then we provide the message recipient, so we have to take that side. Therefore I am voting for "optional".
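Voting for "optional" means the recipient (mesos-fetcher) has to check presence itself. Here is a minimal sketch of such a recipient-side check, with std::optional standing in for a protobuf optional field. FetcherInfoView, sandboxDirectory, itemCount and validate are hypothetical names, not the generated protobuf API. The check accepts a missing sandbox only when there is nothing to fetch, which matches the case discussed in this thread.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a received fetcher info message.
struct FetcherInfoView {
  std::optional<std::string> sandboxDirectory;  // optional on the wire
  int itemCount = 0;                            // number of URIs to fetch
};

// Recipient-side validation. A missing sandbox is only acceptable when
// there is nothing to fetch into it. Returns an error message, if any.
std::optional<std::string> validate(const FetcherInfoView& info) {
  if (info.itemCount > 0 && !info.sandboxDirectory.has_value()) {
    return std::string("Missing sandbox directory for fetch items");
  }
  if (info.sandboxDirectory.has_value() && info.sandboxDirectory->empty()) {
    return std::string("Sandbox directory must not be empty");
  }
  return std::nullopt;
}
```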


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75491
-----------------------------------------------------------


On March 7, 2015, 7:21 a.m., Bernd Mathiske wrote:
>
> (Updated March 7, 2015, 7:21 a.m.)
>


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.

> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > include/mesos/fetcher/fetcher.proto, line 58
> > <https://reviews.apache.org/r/30774/diff/32/?file=887350#file887350line58>
> >
> >     It's harder to make an optional field required, but it's much easier the other way around.
> >     
> >     If we always want it to be required, I think we should make the sandbox a required field.

There was some discussion about whether this field should be required or not. The general idea here is that a task might be able to run without fetching anything into its sandbox. In that case, the framework may get away without naming the sandbox. But since a task always has a sandbox, we could also make the field required. I am impartial here, but your argument that changing required to optional is easier carries weight.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/hdfs/hdfs.hpp, line 82
> > <https://reviews.apache.org/r/30774/diff/32/?file=887353#file887353line82>
> >
> >     IMO this should be in another patch and we can get this committed right away.

It WAS in another patch: 30616. BenH advised putting all fetcher cache related patches that are not for stout or libprocess together into one patch.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.hpp, line 76
> > <https://reviews.apache.org/r/30774/diff/32/?file=887358#file887358line76>
> >
> >     Are these static methods going to be used somewhere else? Quite a lot of static methods now in the header and perhaps we just need to put the implementation in the cpp file, BenH also mentioned this last time in the review meeting.

Yes, these are all used in both launcher/fetcher.cpp and containerizer/fetcher.cpp. They are factored out for consistency and easier maintenance. In previous fetcher implementations they were not, and existed in duplicate.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.hpp, line 191
> > <https://reviews.apache.org/r/30774/diff/32/?file=887358#file887358line191>
> >
> >     I think there is a bit of complication with this interface, where a user has to call two things at the end of downloading from an Entry:
> >     
> >     - unreference
> >     - complete/fail
> >     
> >     And I don't see how one ever wants to use them without each other.
> >     
> >     Why not hide unreference altogether, and decrement the reference count in complete or fail?
> >     
> >     This way it's a lot less error prone, and harder to make mistakes with future changes.

unreference is not only used for new cache entries, which find closure in completion or failure. It is primarily used for pre-existing entries that are being downloaded by concurrent fetch runs. In both cases, we need to call unreference at the very end of fetching in our current run. If we hid unreference in only one of the two cases, we would have a bug.

How can anything be less error-prone than calling unreference on everything that got referenced? By hard-coupling "reference" with storing the unreference action in a collection. I had a version exactly like this but it was turned down by a reviewer, because it introduced an extra class.
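The alternative mentioned above, hard-coupling each reference with recording its unreference in a collection, might look roughly like the following. This is a hypothetical standard C++ sketch (CacheEntry and FetchRunCleanup are made-up names). It is neither the version that was turned down nor the code under review. It only illustrates the idea that a single cleanup step at the end of a fetch run undoes every reference the run took.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical cache entry. While its reference count is positive, some
// fetch run is using it and it must not be evicted.
struct CacheEntry {
  std::string uri;
  int references = 0;

  bool evictable() const { return references == 0; }
};

// Per-fetch-run cleanup list. reference() bumps the count and, in the same
// step, records the matching unreference, so no code path can forget it.
class FetchRunCleanup {
 public:
  FetchRunCleanup() = default;
  FetchRunCleanup(const FetchRunCleanup&) = delete;             // copying would
  FetchRunCleanup& operator=(const FetchRunCleanup&) = delete;  // double-undo

  void reference(const std::shared_ptr<CacheEntry>& entry) {
    ++entry->references;
    undo_.push_back([entry]() { --entry->references; });
  }

  // Runs all recorded undo actions (newest first) at the very end of the
  // fetch run, regardless of whether the run succeeded or failed.
  void finish() {
    for (auto it = undo_.rbegin(); it != undo_.rend(); ++it) {
      (*it)();
    }
    undo_.clear();
  }

  ~FetchRunCleanup() { finish(); }

 private:
  std::vector<std::function<void()>> undo_;
};
```

Two concurrent runs referencing the same entry keep it non-evictable until both have finished; calling finish() explicitly and then letting the destructor run is harmless because the list is cleared.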


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 749
> > <https://reviews.apache.org/r/30774/diff/32/?file=887359#file887359line749>
> >
> >     What's the point of this empty branch?

I like to put comments about the code path (case) not taken into an empty branch, instead of placing them somewhere less directly related. IMHO this is much clearer.
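Here is a small illustration of that style. bytesToReserve is a hypothetical helper, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many cache bytes a fetch item needs reserved.
uint64_t bytesToReserve(bool alreadyCached, uint64_t downloadSize) {
  uint64_t bytes = 0;
  if (alreadyCached) {
    // Nothing to reserve: the file is already resident in the cache and
    // will be retrieved from there, so no new space is consumed.
  } else {
    bytes = downloadSize;
  }
  return bytes;
}
```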


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/slave/containerizer/fetcher.cpp, line 1087
> > <https://reviews.apache.org/r/30774/diff/32/?file=887359#file887359line1087>
> >
> >     Isn't it very easy to make mistakes with setSpace, especially when it's not even expected to be called more than once?
> >     
> >     I thought you said you're going to change it so it's only set once during initialization?

BenH and I decided to postpone this refactoring. It would require a lot of additional code changes without any change in current semantics or the general architecture.


> On March 6, 2015, 2:15 p.m., Timothy Chen wrote:
> > src/tests/mesos.hpp, line 703
> > <https://reviews.apache.org/r/30774/diff/32/?file=887367#file887367line703>
> >
> >     We want a consistent naming style across all the tests and files, and in the Mesos tests the "unmocked" methods usually just have a "_" prefix, so run -> _run

If you use "_run", how do you distinguish it from a continuation of the same name? We cannot possibly use this naming scheme. Please either adopt mine or come up with a better one. I think "unmocked" makes it very clear what is going on, without making first-time readers guess or requiring extra comments. So I'd prefer to leave it as is.

(I had mocked a method _fetch in an earlier patch and there is also a continuation __fetch...)


- Bernd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75491
-----------------------------------------------------------


On March 6, 2015, 5:46 a.m., Bernd Mathiske wrote:
>
> (Updated March 6, 2015, 5:46 a.m.)
>


Re: Review Request 30774: Fetcher Cache

Posted by Timothy Chen <tn...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review75491
-----------------------------------------------------------



include/mesos/fetcher/fetcher.proto
<https://reviews.apache.org/r/30774/#comment122581>

    It's harder to make an optional field required, but it's much easier the other way around.
    
    If we always want it to be required, I think we should make the sandbox a required field.



src/hdfs/hdfs.hpp
<https://reviews.apache.org/r/30774/#comment122582>

    IMO this should be in another patch and we can get this committed right away.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122584>

    Why not log the error if it's there? available.error()



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122585>

    You log the extraction command but in this case don't log the copy command.
    
    I think to be consistent, let's not log the command, and like you do here only log when the command fails.
    
    What do you think?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122591>

    Failed to chmod executable



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122597>

    CHECK_EQ(item.action(), FetcherInfo::Item::DOWNLOAD_AND_CACHE);
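CHECK_EQ aborts when its two arguments differ, so it turns an assumed invariant into an explicit, always-checked one. Below is a standard C++ sketch of the same idea, using assert in place of glog's CHECK_EQ. The enum values other than DOWNLOAD_AND_CACHE are assumptions modeled on the three actions listed in the 30037 description, and Item and cacheWriteTarget are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed action set, mirroring "fetch directly bypassing the cache, fetch
// through the cache, retrieve from the cache"; only DOWNLOAD_AND_CACHE is
// named in the review itself.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

struct Item {
  Action action;
  std::string cacheFilename;  // only meaningful when caching
};

// Hypothetical helper that should only ever be reached for items being
// cached. The assertion documents that invariant. Note that glog's CHECK_EQ
// stays active in release builds, whereas assert is compiled out under NDEBUG.
std::string cacheWriteTarget(const Item& item, const std::string& cacheDir) {
  assert(item.action == Action::DOWNLOAD_AND_CACHE);
  return cacheDir + "/" + item.cacheFilename;
}
```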



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment122598>

    Are these static methods going to be used somewhere else? Quite a lot of static methods now in the header and perhaps we just need to put the implementation in the cpp file, BenH also mentioned this last time in the review meeting.



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment122601>

    I think there is a bit of complication with this interface, where a user has to call two things at the end of downloading from an Entry:
    
    - unreference
    - complete/fail
    
    And I don't see how one ever wants to use them without each other.
    
    Why not hide unreference altogether, and decrement the reference count in complete or fail?
    
    This way it's a lot less error prone, and harder to make mistakes with future changes.



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment122732>

    Why not just store the Path and return that?



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment122731>

    Where is this being used?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122588>

    Let's use strings::contains instead of find to be consistent here.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122736>

    What's the point of this empty branch?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122738>

    Isn't it very easy to make mistakes with setSpace, especially when it's not even expected to be called more than once?
    
    I thought you said you're going to change it so it's only set once during initialization?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122733>

    We should be more explicit here that we're claiming more space than we have set.
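A sketch of what the two comments above could look like, assuming a hypothetical CacheSpace class (not the Mesos code): the total space is a const member set only in the constructor, so there is no setSpace() to call twice, and an over-large claim fails with a message that says explicitly how much space is configured and available. Returning an error string is a simplification standing in for stout's Try/Option types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical cache space accounting; not the Mesos Cache class.
// The total is fixed at construction, so there is no setSpace() to
// call a second time by mistake.
class CacheSpace
{
public:
  explicit CacheSpace(uint64_t total) : total_(total) {}

  uint64_t total() const { return total_; }
  uint64_t used() const { return used_; }
  uint64_t available() const { return total_ - used_; }

  // Returns an empty string on success. Otherwise returns an error that
  // says explicitly that the claim exceeds the configured space, and
  // leaves the accounting unchanged.
  std::string claim(uint64_t bytes)
  {
    if (bytes > available()) {
      return "Cannot claim " + std::to_string(bytes) + " bytes: only " +
             std::to_string(available()) + " of " +
             std::to_string(total_) + " bytes of cache space available";
    }
    used_ += bytes;
    return "";
  }

  // Gives back space previously claimed.
  void release(uint64_t bytes)
  {
    assert(bytes <= used_);
    used_ -= bytes;
  }

private:
  const uint64_t total_;
  uint64_t used_ = 0;
};
```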



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122734>

    Add some message?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment122735>

    Add some message?



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment122730>

    Why is this just a Failure when the other recover is a LOG(FATAL)? If failing to recover the cache is a critical event, shouldn't we exit here too?



src/tests/mesos.hpp
<https://reviews.apache.org/r/30774/#comment122578>

    We want a consistent naming style across all the tests and files. In Mesos tests, the "unmocked" methods usually just have a "_" prefix, so run -> _run.


- Timothy Chen


On March 6, 2015, 1:46 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 6, 2015, 1:46 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size limit also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but is now augmented with more elaborate parameters packed into a JSON object to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher-cache-internals.md PRE-CREATION 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
>   src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
>   src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests, since they won't be needed any more once the fetcher takes its parameters as JSON in a single env var. They never tested anything that is not also covered by other tests.
> 
> 30034: Makes the code structure of all fetcher tests the same: instead of calling the fetcher's run method directly, they now call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf. And then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with URIs to "items", which contain the URI, the action, and potentially a cache file name. Refactors fetch() and run(), so there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Download results in the cache are reused when multiple fetcher runs occur concurrently.
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command if used in the right limited way.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one happens per URI in play.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction is disabled for cache files whose URIs are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>
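For illustration, the per-URI choice among the three actions described under 30037 in the longer description above can be sketched in standalone C++. DOWNLOAD_AND_CACHE is the name used in the review comments; BYPASS_CACHE, RETRIEVE_FROM_CACHE, the Item struct, and the plan() helper are hypothetical, and keying the cache lookup on the URI alone is a simplifying assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical mirror of the three fetch actions in 30037.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

struct Item
{
  std::string uri;
  Action action;
  std::string cacheFilename;  // Empty when the cache is bypassed.
};

// Chooses the action for one URI, given whether the URI asked for
// caching and which URIs already have a file in the cache.
Item plan(const std::string& uri,
          bool caching,
          const std::set<std::string>& cached,
          const std::string& cacheFilename)
{
  if (!caching) {
    return Item{uri, Action::BYPASS_CACHE, ""};
  }

  assert(!cacheFilename.empty());  // A cached item needs a cache file.

  if (cached.count(uri) > 0) {
    return Item{uri, Action::RETRIEVE_FROM_CACHE, cacheFilename};
  }
  return Item{uri, Action::DOWNLOAD_AND_CACHE, cacheFilename};
}
```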

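The deduplication described under 30006 (concurrent attempts to cache the same URI lead to only one download) can be sketched as follows. This is not the Mesos code, which is built on libprocess actors and futures; here std::shared_future stands in for them, and DownloadDeduplicator and its download callback are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: the first request for a URI starts a download;
// concurrent requests for the same URI share its result.
class DownloadDeduplicator
{
public:
  // Downloads a URI and returns the path of the cache file it produced.
  using Download = std::function<std::string(const std::string&)>;

  explicit DownloadDeduplicator(Download download)
    : download_(std::move(download))
  {
    assert(download_);  // A download function is required.
  }

  std::shared_future<std::string> fetch(const std::string& uri)
  {
    std::lock_guard<std::mutex> lock(mutex_);

    auto it = downloads_.find(uri);
    if (it != downloads_.end()) {
      return it->second;  // Pending or finished: reuse it.
    }

    ++started_;
    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    downloads_.emplace(uri, result);
    return result;
  }

  // Number of downloads actually started.
  int started()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return started_;
  }

private:
  Download download_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> downloads_;
  int started_ = 0;
};
```

One simplification: a failed download stays in the map, so later callers see the same failure; a real implementation would drop the entry so that a later attempt can retry.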

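The eviction provisions listed under 30626 can be approximated by a synchronous sketch. Assumptions: entries are evicted oldest-first (the description does not say which entries go first), and instead of reserving space and undoing the reservation after an asynchronous failure, victims are selected in a first pass and nothing changes unless enough space can be freed. CacheEntry, EvictingCache, and reserve() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <vector>

struct CacheEntry
{
  std::string key;
  uint64_t size;
  int references;  // Nonzero while some fetch attempt uses this file.
};

class EvictingCache
{
public:
  explicit EvictingCache(uint64_t space) : space_(space) {}

  // Entries at the front of the list are the oldest.
  void add(const CacheEntry& entry)
  {
    entries_.push_back(entry);
    used_ += entry.size;
  }

  uint64_t available() const { return space_ - used_; }

  // Reserves `bytes` for an upcoming download, partly from free space and
  // partly by evicting the oldest unreferenced entries. If not enough
  // space can be freed, returns false and changes nothing.
  bool reserve(uint64_t bytes, std::vector<std::string>* evicted)
  {
    assert(evicted != nullptr);

    // First pass: pick victims without modifying anything.
    uint64_t freed = 0;
    std::vector<std::list<CacheEntry>::iterator> victims;
    for (auto it = entries_.begin();
         it != entries_.end() && available() + freed < bytes;
         ++it) {
      if (it->references == 0) {
        victims.push_back(it);
        freed += it->size;
      }
    }

    if (available() + freed < bytes) {
      return false;
    }

    // Second pass: commit the evictions, then the reservation.
    for (auto victim : victims) {
      evicted->push_back(victim->key);
      used_ -= victim->size;
      entries_.erase(victim);
    }
    used_ += bytes;
    return true;
  }

private:
  const uint64_t space_;
  uint64_t used_ = 0;
  std::list<CacheEntry> entries_;
};
```

Entries with a nonzero reference count are skipped, which corresponds to disabling eviction for cache files whose URIs are in use.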
Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 6, 2015, 5:46 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Added outlook on future features and made other small improvements to documentation. Based on feedback from Robert Lacroix and Jay Buffington.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 5, 2015, 3:15 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Removed Fetcher::clearCache().


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 4, 2015, 4:54 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Refactored fetch() and its continuations. Now separating cached and uncached URIs at the beginning. Simplified a lot of control flow after removing async file operations. All tests run.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 3, 2015, 11:40 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Removed a major piece of both functionality and complexity: no more async() for file operations. This is deemed good enough for the MVP, and it does not make the legacy fetcher behavior any worse.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See their descriptions, in dependency order:

30033: Removes the fetcher env tests, since these will no longer be needed once the fetcher takes its parameters as JSON in a single environment variable. They never tested anything that is not also covered by other tests.
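Passing all parameters as one JSON value in one environment variable reduces the fetcher's interface to a single string. A hand-rolled sketch of building such a string follows. The field names (sandbox_directory, items, uri, action) and the Item struct are illustrative assumptions. A real implementation would more likely convert the fetcher info protobuf with an existing protobuf-to-JSON utility than concatenate strings.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical item shape; the real fetcher info protobuf has its own fields.
struct Item {
  std::string uri;
  std::string action;
};

// Escapes the characters JSON requires to be escaped inside strings.
std::string jsonEscape(const std::string& s) {
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      default:
        if (static_cast<unsigned char>(c) < 0x20) {
          char buf[7]; // "\u" + 4 hex digits + NUL.
          std::snprintf(buf, sizeof(buf), "\\u%04x",
                        static_cast<unsigned>(static_cast<unsigned char>(c)));
          out += buf;
        } else {
          out += c;
        }
    }
  }
  return out;
}

// Packs all fetch parameters into one JSON object string.
std::string toJson(const std::string& sandbox, const std::vector<Item>& items) {
  std::string json =
    "{\"sandbox_directory\":\"" + jsonEscape(sandbox) + "\",\"items\":[";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0) json += ",";
    json += "{\"uri\":\"" + jsonEscape(items[i].uri) +
            "\",\"action\":\"" + jsonEscape(items[i].action) + "\"}";
  }
  return json + "]}";
}
```

The resulting string would then be placed in a single environment variable for the external fetcher process to parse.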

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run() method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", each of which contains the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
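The choice between the three actions can be sketched as a small function that also assigns cache file names. This is an illustration under assumptions: the enum values, the Item layout, and the "c<N>" naming scheme are hypothetical, and the map from URIs to cache file names stands in for the patch's hashmap of cache file objects.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical names for the three actions (bypass, through cache, retrieve).
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

struct Item {
  std::string uri;
  Action action;
  std::optional<std::string> cacheFilename; // Only set for cache actions.
};

// 'caching' is the per-URI flag; 'cacheFiles' maps URIs to cache file names.
Item makeItem(const std::string& uri,
              bool caching,
              std::unordered_map<std::string, std::string>& cacheFiles,
              int& nextId) {
  if (!caching) {
    return {uri, Action::BYPASS_CACHE, std::nullopt};
  }

  auto it = cacheFiles.find(uri);
  if (it != cacheFiles.end()) {
    return {uri, Action::RETRIEVE_FROM_CACHE, it->second};
  }

  // Registered right away so that a second request for the same URI does
  // not start another download. A real cache must also make that request
  // wait until the download has actually finished.
  const std::string name = "c" + std::to_string(nextId++);
  cacheFiles.emplace(uri, name);
  return {uri, Action::DOWNLOAD_AND_CACHE, name};
}
```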

30039: Enables fetcher cache actions in the mesos-fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
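Download sharing can be illustrated with a map from URI to a shared future: the first request starts the download, and every later request for the same URI waits on the same result. Mesos itself is built on libprocess actors and futures; this sketch swaps in std::shared_future and a mutex, and DownloadDeduplicator is a hypothetical name.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical helper: the first caller for a URI runs the download; any
// later caller for the same URI shares that caller's result.
class DownloadDeduplicator {
public:
  using Download = std::function<std::string(const std::string&)>;

  std::shared_future<std::string> fetch(const std::string& uri,
                                        const Download& download) {
    std::promise<std::string> promise;
    std::shared_future<std::string> result = promise.get_future().share();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = downloads_.find(uri);
      if (it != downloads_.end()) {
        return it->second; // Already started elsewhere: wait on that result.
      }
      downloads_.emplace(uri, result);
    }

    // Only the first caller gets here; the download runs outside the lock.
    // A failure is recorded as well; a real cache would drop the entry so
    // that a later attempt can retry.
    try {
      promise.set_value(download(uri));
    } catch (...) {
      promise.set_exception(std::current_exception());
    }
    return result;
  }

private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_future<std::string>> downloads_;
};
```

In this sketch the second fetch() for a URI returns the first call's future, so the download callback runs only once.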

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client written in bash that behaves just like a real "hadoop" command when used in the intended, limited way.

30124: Inserts a fetcher cache zap (clearing the cache) upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
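A cache zap can be as simple as deleting the cache directory and recreating it empty, which removes any need to reconstruct bookkeeping state after a restart. The sketch below assumes C++17 std::filesystem and a hypothetical zapCache() helper. It reports failure through its return value instead of throwing.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: wipes and recreates the cache directory so that no
// cache state from a previous slave run has to be recovered.
bool zapCache(const fs::path& cacheDir) {
  std::error_code ec;
  fs::remove_all(cacheDir, ec); // A missing directory is not an error.
  if (ec) {
    return false;
  }
  fs::create_directories(cacheDir, ec);
  return !ec;
}
```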

30173: Creates fetcher cache tests. Adds a new test source file containing a test fixture and tests that check whether the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads to the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: at most one such request per URI is in progress at any time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things like space reservations and eviction disabling.
- Eviction is disabled for URIs whose cache files are currently in use. We use reference counting for this, since there may be concurrent fetch attempts using the same cache files.
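According to the list above, the patch undoes reservations and eviction disabling by processing lists at the end of each fetch attempt. In synchronous C++, the same "undo unless committed" rule is commonly expressed with RAII. The guard below is a swapped-in technique for illustration only; ReservationGuard and its release callback are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical scope guard: gives a space reservation back unless the fetch
// attempt commits it (for example, after the download succeeded).
class ReservationGuard {
public:
  ReservationGuard(uint64_t bytes, std::function<void(uint64_t)> release)
    : bytes_(bytes), release_(std::move(release)) {}

  ReservationGuard(const ReservationGuard&) = delete;
  ReservationGuard& operator=(const ReservationGuard&) = delete;

  ~ReservationGuard() {
    if (!committed_) {
      release_(bytes_); // Error path: undo the reservation.
    }
  }

  void commit() { committed_ = true; }

private:
  uint64_t bytes_;
  std::function<void(uint64_t)> release_;
  bool committed_ = false;
};
```

The guard guarantees the undo runs on every exit path, including early returns and exceptions, which is the property the end-of-attempt cleanup lists are there to provide.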


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 3, 2015, 9:53 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Removed binary files.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 3, 2015, 9:30 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Applied the patches suggested by Benh. Thanks, Ben!


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  docs/images/fetch_cache.jpg PRE-CREATION 
  docs/images/fetch_components.jpg PRE-CREATION 
  docs/images/fetch_flow.jpg PRE-CREATION 
  docs/images/fetch_force1.jpg PRE-CREATION 
  docs/images/fetch_force2.jpg PRE-CREATION 
  docs/images/fetch_state.jpg PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 3, 2015, 5:01 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

More fetcher cache internals documentation with diagrams.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  docs/images/fetch_cache.jpg PRE-CREATION 
  docs/images/fetch_components.jpg PRE-CREATION 
  docs/images/fetch_flow.jpg PRE-CREATION 
  docs/images/fetch_force1.jpg PRE-CREATION 
  docs/images/fetch_force2.jpg PRE-CREATION 
  docs/images/fetch_state.jpg PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check



Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 2, 2015, 10:27 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Fetcher cache internals doc with overview diagram, control flow diagram, cache entry state diagram.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Implements almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache has a size limit, also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now takes more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  docs/images/fetch_components.jpg PRE-CREATION 
  docs/images/fetch_flow.jpg PRE-CREATION 
  docs/images/fetch_state.jpg PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See the descriptions of those. In dependency order:

30033: Removes the fetcher env tests, since these are no longer needed once the fetcher takes JSON in a single env var as its parameter. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
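A rough C++ sketch of how an item's action might be chosen. It assumes three inputs are known: the URI's "caching" flag, the set of URIs already resident in the cache, and the result of the upfront size query (introduced later, in 30626). The enum, struct, and function names are illustrative only. Falling back to BYPASS_CACHE when the size query fails mirrors the later change "Added a test for bypassing the cache when upfront size fetching fails."

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// Illustrative names for the three fetch actions described above.
enum class Action { BYPASS_CACHE, DOWNLOAD_AND_CACHE, RETRIEVE_FROM_CACHE };

// Hypothetical "item": the URI, the chosen action and, for cache actions,
// the name of the cache file.
struct Item {
  std::string uri;
  Action action;
  std::optional<std::string> cacheFilename;
};

// `cached` holds URIs already resident in the cache; `size` is the result
// of the upfront size query (empty if the query failed).
Item decide(const std::string& uri,
            bool caching,
            const std::set<std::string>& cached,
            std::optional<std::size_t> size,
            const std::string& filename) {
  if (!caching) {
    return {uri, Action::BYPASS_CACHE, std::nullopt};
  }
  if (cached.count(uri) > 0) {
    return {uri, Action::RETRIEVE_FROM_CACHE, filename};
  }
  if (!size.has_value()) {
    // Without a known size no space can be reserved: fetch directly instead.
    return {uri, Action::BYPASS_CACHE, std::nullopt};
  }
  return {uri, Action::DOWNLOAD_AND_CACHE, filename};
}
```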

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
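A minimal sketch of "only one download per URI," assuming a download function that returns file contents as a string. Here `std::shared_future` stands in for the libprocess futures Mesos actually uses, and `Downloader` is a hypothetical name. The first request for a URI starts the download. Later requests, whether concurrent or subsequent, receive the same shared result.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>

class Downloader {
 public:
  using DownloadFn = std::function<std::string(const std::string&)>;

  explicit Downloader(DownloadFn download) : download_(std::move(download)) {
    assert(download_ && "a download function is required");
  }

  std::shared_future<std::string> fetch(const std::string& uri) {
    std::lock_guard<std::mutex> lock(mutex_);

    auto it = inFlight_.find(uri);
    if (it != inFlight_.end()) {
      return it->second;  // pending or finished: reuse it, no second download
    }

    // First request for this URI: start the download on another thread.
    std::shared_future<std::string> result =
      std::async(std::launch::async, download_, uri).share();
    inFlight_.emplace(uri, result);
    return result;
  }

 private:
  DownloadFn download_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> inFlight_;
};
```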

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves just like a real "hadoop" command as long as it is only used in the intended, limited way.

30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
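A hedged sketch of what such a zap could amount to: delete the cache directory and recreate it empty, so that no stale cache files survive a restart. It uses `std::filesystem`. `zapCache` is a hypothetical helper, and the directory layout is an assumption.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: wipe the cache directory and recreate it empty.
void zapCache(const fs::path& cacheDirectory) {
  assert(!cacheDirectory.empty());

  std::error_code error;
  fs::remove_all(cacheDirectory, error);  // a missing directory is not an error
  if (error) {
    throw fs::filesystem_error("cannot zap fetcher cache", cacheDirectory, error);
  }
  fs::create_directories(cacheDirectory);
}
```

Nothing is recovered from a previous run. Every cached URI simply has to be fetched again, which is what makes this the simplest form of recovery.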

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
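A hedged sketch of turning one line of `hadoop fs -du -h` output into a byte count. It assumes the size is the first column, optionally followed by a single-letter, 1024-based suffix (K, M, G, ...). The exact output layout varies across Hadoop versions, so treat that format as an assumption. `parseHumanSize` is a hypothetical name, not the signature of hdfs::du().

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parses the leading size field of a line such as "1.5 K  /user/x/file"
// or "123  /user/x/file". Returns std::nullopt if no number is found.
std::optional<std::uint64_t> parseHumanSize(const std::string& line) {
  std::istringstream in(line);
  double value = 0;
  if (!(in >> value) || value < 0) {
    return std::nullopt;
  }

  std::string unit;
  in >> unit;  // either a suffix like "K" or already the next column

  const std::string suffixes = "KMGTPE";
  double multiplier = 1;
  if (unit.size() == 1 && suffixes.find(unit[0]) != std::string::npos) {
    multiplier = std::pow(1024.0, 1 + suffixes.find(unit[0]));
  }
  return static_cast<std::uint64_t>(std::llround(value * multiplier));
}
```

Because `-h` output is rounded (e.g. "1.5 K"), the result is only approximate. A cache reservation based on it may need some slack.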

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
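An illustrative C++ sketch of URI type separation by scheme prefix. The category names and scheme lists are assumptions for illustration, not a copy of the ones in mesos-fetcher.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative URI categories; each needs its own way of fetching and of
// querying the download size up front.
enum class UriKind { HADOOP, NETWORK, LOCAL };

UriKind classify(const std::string& uri) {
  auto startsWith = [&uri](const std::string& prefix) {
    return uri.compare(0, prefix.size(), prefix) == 0;
  };

  for (const std::string& scheme :
       std::vector<std::string>{"hdfs://", "hftp://", "s3://", "s3n://"}) {
    if (startsWith(scheme)) {
      return UriKind::HADOOP;   // fetched (and sized) via the hadoop client
    }
  }

  for (const std::string& scheme :
       std::vector<std::string>{"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(scheme)) {
      return UriKind::NETWORK;  // fetched via a network download
    }
  }

  return UriKind::LOCAL;        // treated as a local path to copy
}
```

Sharing one classifier between mesos-fetcher and the fetcher actor lets the size query and the later download treat a given URI the same way.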

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things such as space reservations and eviction disabling.
- Eviction is disabled for cache files that are currently in use. We use reference counting for this, since concurrent fetch attempts may use the same cache files.
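A minimal C++ sketch of the reservation arithmetic described above. The request is satisfied first from free space, and the rest comes from evicting unpinned entries in least-recently-used order. If even that is not enough, nothing changes. `SpaceManager`, `Entry`, and the LRU policy are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache entry: references > 0 means pinned by a fetch attempt.
struct Entry {
  std::string uri;
  std::uint64_t size;
  int references;
};

class SpaceManager {
 public:
  explicit SpaceManager(std::uint64_t capacity) : capacity_(capacity) {}

  // Adds a resident file; entries are kept in least-recently-used order.
  void insert(Entry entry) {
    assert(used_ + entry.size <= capacity_);
    used_ += entry.size;
    lru_.push_back(std::move(entry));
  }

  // Reserves `bytes`: free space first, then unpinned entries, oldest first.
  // Returns the evicted URIs, or std::nullopt (with no changes made).
  std::optional<std::vector<std::string>> reserve(std::uint64_t bytes) {
    const std::uint64_t available = capacity_ - used_;
    std::uint64_t freed = 0;
    std::vector<std::list<Entry>::iterator> victims;

    for (auto it = lru_.begin(); it != lru_.end() && available + freed < bytes; ++it) {
      if (it->references == 0) {
        victims.push_back(it);
        freed += it->size;
      }
    }

    if (available + freed < bytes) {
      return std::nullopt;
    }

    std::vector<std::string> evicted;
    for (auto it : victims) {
      evicted.push_back(it->uri);
      used_ -= it->size;
      lru_.erase(it);
    }
    used_ += bytes;  // the reservation counts as used until released
    return evicted;
  }

  // Undoes a reservation, e.g. during cleanup after a failed download.
  void release(std::uint64_t bytes) {
    assert(bytes <= used_);
    used_ -= bytes;
  }

  std::uint64_t used() const { return used_; }

 private:
  std::uint64_t capacity_;
  std::uint64_t used_ = 0;
  std::list<Entry> lru_;
};
```

Unlike this sketch, the patch performs eviction asynchronously, and eviction can fail. There, the reservation is held while waiting and is undone on failure.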


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 1, 2015, 12:27 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Added a test for bypassing the cache when upfront size fetching fails.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache size limit is also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See the descriptions of those. In dependency order:

30033: Removes the fetcher env tests, since these are no longer needed once the fetcher takes JSON in a single env var as its parameter. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves just like a real "hadoop" command as long as it is only used in the intended, limited way.

30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things such as space reservations and eviction disabling.
- Eviction is disabled for cache files that are currently in use. We use reference counting for this, since concurrent fetch attempts may use the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Bernd Mathiske <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 1, 2015, 7:42 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Rebased onto the current master. Nothing changed in the outcome; only the diff files were adjusted.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache size limit is also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am d299f07d865080676ca8a550cf6005c6ab32839f 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp a06d68032f26ccb3f786b6ea7c3a6c3c52449bd2 
  src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp e91e5e484eea4587ac8f2eb9cefeab4acc9f4615 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See the descriptions of those. In dependency order:

30033: Removes the fetcher env tests, since these are no longer needed once the fetcher takes JSON in a single env var as its parameter. They never tested anything that is not also covered by other tests.

30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.

30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves just like a real "hadoop" command as long as it is only used in the intended, limited way.

30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
- mesos-fetcher does not run until evictions have succeeded.
- Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
- Reservations can come partly from available space and partly from evictions. All math included :-)
- To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things such as space reservations and eviction disabling.
- Eviction is disabled for cache files that are currently in use. We use reference counting for this, since concurrent fetch attempts may use the same cache files.


Thanks,

Bernd Mathiske


Re: Review Request 30774: Fetcher Cache

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review74695
-----------------------------------------------------------


Bad patch!

Reviews applied: [30606, 30609, 30774]

Failed command: ./support/apply-review.sh -n -r 30774

Error:
 2015-03-01 10:54:39 URL:https://reviews.apache.org/r/30774/diff/raw/ [169279/169279] -> "30774.patch" [1]
error: patch failed: src/tests/fetcher_tests.cpp:48
error: src/tests/fetcher_tests.cpp: patch does not apply
error: patch failed: src/tests/mesos.cpp:47
error: src/tests/mesos.cpp: patch does not apply
Failed to apply patch

- Mesos ReviewBot


On March 1, 2015, 10:35 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated March 1, 2015, 10:35 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs can now be cached in a cache directory designated by a slave flag. This only happens when requested by an extra flag in the URI, and is thus backwards-compatible. The cache size limit is also given by a new slave flag. Cache-resident files are evicted as necessary to make space for newly fetched ones. Concurrent attempts to cache the same URI lead to only one download. The fetcher program remains external for safety reasons, but now receives more elaborate parameters, packed into a JSON object, to implement specific fetch actions for all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
>   include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
>   src/Makefile.am 17d0d7aa7361c3a373f6863d36b0a4767f5c05c4 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
>   src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
>   src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
>   src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
>   src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
>   src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
>   src/slave/containerizer/mesos/containerizer.hpp ae61a0fcd19f2ba808624312401f020121baf5d4 
>   src/slave/containerizer/mesos/containerizer.cpp ec4626f903d44c0911093ff763ef16ad27c418a9 
>   src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
>   src/slave/slave.cpp 9f31fa46304398e8f87b41b55d8f4cfd4aba10b9 
>   src/tests/docker_containerizer_tests.cpp 06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp 0ae0a10b41a2c0f7459c771b31c76bbc0c02df4f 
>   src/tests/mesos.hpp f7a0d057edea1a7ec7ae3bb9bc729230bf7dd46d 
>   src/tests/mesos.cpp 23f790cbb289f6483dcdfa6ecccd462360ce02f1 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See the descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests, since these are no longer needed once the fetcher takes JSON in a single env var as its parameter. They never tested anything that is not also covered by other tests.
> 
> 30034: Gives all fetcher tests the same code structure: instead of calling the fetcher's run method directly, they call through fetch(). Also removes all uses of I/O redirection, which is not really needed for debugging, so that the next patch can refactor fetch() and run(). (The latter comes in two varieties, which complicates matters without much benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later trigger fetcher cache actions. Also introduces the notion of a cache directory to the fetcher info protobuf, and then propagates these additions throughout the rest of the code base where applicable. This includes passing the slave ID all the way down to the place where the cache directory name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly, bypassing the cache; fetch through the cache; retrieve from the cache). Switches the basis for handling URIs to "items", which contain the URI, the action, and possibly a cache file name. Refactors fetch() and run() so that there is only one of each. Introduces about half of the actual cache logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. When multiple fetcher runs occur concurrently, download results in the cache are reused.
> 
> 30614: Ensures that all this refactoring of the fetcher code has not broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client, written in bash, that behaves just like a real "hadoop" command as long as it is only used in the intended, limited way.
> 
> 30124: Inserts a fetcher cache zap upon slave startup, recovery, and shutdown. This implements recovery in the simplest acceptable way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the file size for the URI passed as argument. This is needed to determine the size of a file on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor, since it is going to be reused by download size queries when we introduce fetcher cache management. Also factored out URI validation, which will be used the same way by mesos-fetcher and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space to accommodate upcoming downloads. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have succeeded.
> - Cache space is reserved while (asynchronously) waiting for eviction to succeed. If eviction fails, the reservation is undone.
> - Reservations can come partly from available space and partly from evictions. All math included :-)
> - To find out how much space is needed, downloading has a prelude in which we query the download size from the URI. This works for all URI types that mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too: only one request per URI is in flight at a time.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch attempt, each list is processed to undo things such as space reservations and eviction disabling.
> - Eviction is disabled for cache files that are currently in use. We use reference counting for this, since concurrent fetch attempts may use the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>