You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (JIRA)" <ji...@apache.org> on 2018/12/19 11:27:00 UTC

[jira] [Comment Edited] (YARN-9098) Separate mtab file reader code and cgroups file system hierarchy parser code from CGroupsHandlerImpl and ResourceHandlerModule

    [ https://issues.apache.org/jira/browse/YARN-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724900#comment-16724900 ] 

Szilard Nemeth edited comment on YARN-9098 at 12/19/18 11:26 AM:
-----------------------------------------------------------------

Hi [~Jim_Brennan]!

 

Thanks for looking at my patch and taking steps on testing!

First of all:  the static import is fixed with the new patch.

Actually, I understand your fear regarding this refactoring but this eliminates some code duplication and improves the quality of the code significantly. 

 

About the test errors: 
 These are very strange errors. At this level, they seem to be all test framework issues, as somehow on your computer, the temp directory storing cgroups and the cpu controller underneath have not created, which is very strange as the tests are having assertions on the file and directory creations.

I re-built and ran the testcases with these commands again on my computer, locally: 
 1.
{code:java}
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true{code}
2.
{code:java}
mvn test -pl org.apache.hadoop:hadoop-yarn-server-nodemanager -fae | tee ~/maventest`date +%Y%m%d`
{code}
 

I don't have test failures on any of the classes where you had, this is the excerpt of the output:

 
{noformat}
[INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestMtabFileParser
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.138 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestMtabFileParser
...

[INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsHandlerImpl
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.538 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsHandlerImpl
 
...
[INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsControllerPaths
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.127 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsControllerPaths
{noformat}
 

 

This is the file listing of my target directory: 
{noformat}
??-( szilardnemeth@snemeth-MBP[10:17:19] <0> @YARN-9098 )--( ~/development/apache/hadoop )--
└-$ cat /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/025802ba-4abe-4862-942b-beef2d279ca7
none /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/cp cgroup rw,relatime,cpu 0 0
none /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/cpu cgroup rw,relatime,cpu 0 0
none /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/blkio cgroup rw,relatime,blkio 0 0{noformat}
I even tried to remove the test directory (with {{rm -rf /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir}}) and re-execute the tests but they work fine.
 Anyway, I added some code that better describes the assertion errors plus added some logging of cgroup paths to 2 testcases.

 

Could you please rerun the tests with the new code and do a file listing like I did? 
 This must be some platform issue I suppose.

The test failures for {{TestCGroupsHandlerImpl}}: 
 {{TestCGroupsHandlerImpl#createPremountedCgroups}} calls:
{noformat}
File cpuCgroup = new File(parentDir, "cpu");
//and later on...
assertTrue("Directory should be created", cpuCgroup.mkdirs());
{noformat}
 

This should create cgroups for cpu, and as you can see, it is even asserted properly.

Could you please re-test?

Thanks!

 

 


was (Author: snemeth):
Hi [~Jim_Brennan]!

 

Thanks for looking at my patch and taking steps on testing!

First of all:  the static import is fixed with the new patch.

Actually, I understand your fear regarding this refactoring but this eliminates some code duplication and improves the quality of the code significantly. 

 

About the test errors: 
These are very strange errors. At this level, they seem to be all test framework issues, as somehow on your computer, the temp directory storing cgroups and the cpu controller underneath have not created, which is very strange as the tests are having assertions on the file and directory creations.


I re-built and ran the testcases with these commands again on my computer, locally: 
1.
{code:java}
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true{code}

2.
{code:java}
mvn test -pl org.apache.hadoop:hadoop-yarn-server-nodemanager -fae | tee ~/maventest`date +%Y%m%d`
{code}
 

I don't have test failures on any of the classes where you had, this is the excerpt of the output:

 
{noformat}
[INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestMtabFileParser
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.138 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestMtabFileParser
...

[INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsHandlerImpl
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.538 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsHandlerImpl
 
...
[INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsControllerPaths
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.127 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsControllerPaths
{noformat}
 

 

This is the file listing of my target directory: 
{noformat}
??-( szilardnemeth@snemeth-MBP[10:17:19] <0> @YARN-9098 )--( ~/development/apache/hadoop )--
└-$ cat /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/025802ba-4abe-4862-942b-beef2d279ca7
none /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/cp cgroup rw,relatime,cpu 0 0
none /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/cpu cgroup rw,relatime,cpu 0 0
none /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/cgroups/blkio cgroup rw,relatime,blkio 0 0{noformat}

I even tried to remove the test directory (with {{rm -rf /Users/szilardnemeth/development/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir}}) and re-execute the tests but they work fine.
Anyway, I added some code that better describes the assertion errors plus added some logging of cgroup paths to 2 testcases.

 

Could you please rerun the tests with the new code and do a file listing like I did? 
This must be some platform issue I suppose.


The test failures for {{TestCGroupsHandlerImpl}}: 
{{TestCGroupsHandlerImpl#createPremountedCgroups}} calls: 
{noformat}
File cpuCgroup = new File(parentDir, "cpu");
//and later on...
assertTrue("Directory should be created", cpuCgroup.mkdirs());
{noformat}
 

This should create cgroups for cpu, and as you can see, it is even asserted properly.

 

Thanks!

 

 

> Separate mtab file reader code and cgroups file system hierarchy parser code from CGroupsHandlerImpl and ResourceHandlerModule
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9098
>                 URL: https://issues.apache.org/jira/browse/YARN-9098
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-9098.002.patch, YARN-9098.003.patch, YARN-9098.004.patch
>
>
> Separate mtab file reader code and cgroups file system hierarchy parser code from CGroupsHandlerImpl and ResourceHandlerModule
> CGroupsHandlerImpl has a method parseMtab that parses an mtab file and stores cgroups data.
> CGroupsLCEResourcesHandler also has a method with the same name, with identical code.
> The parser code should be extracted from these places and be added in a new class as this is a separate responsibility.
> As the output of the file parser is a Map<String, Set<String>>, it's better to encapsulate it in a domain object, named 'CGroupsMountConfig' for instance.
> ResourceHandlerModule has a method named parseConfiguredCGroupPath, that is responsible for producing the same results (Map<String, Set<String>>) to store cgroups data, it does not operate on mtab file, but looking at the filesystem for cgroup settings. As the output is the same, CGroupsMountConfig should be used here, too.
> Again, this could should not be part of ResourceHandlerModule as it is a different responsibility.
> One more thing which is strongly related to the methods above is CGroupsHandlerImpl.initializeFromMountConfig: This method processes the result of a parsed mtab file or a parsed cgroups filesystem data and stores file system paths for all available controllers. This method invokes findControllerPathInMountConfig, which is a duplicated in CGroupsHandlerImpl and CGroupsLCEResourcesHandler, so it should be moved to a single place. To store filesystem path and controller mappings, a new domain object could be introduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org