You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2021/07/14 10:59:00 UTC

[jira] [Commented] (YARN-10789) RM HA startup can fail due to race conditions in ZKConfigurationStore

    [ https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380500#comment-17380500 ] 

Hadoop QA commented on YARN-10789:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  8m 48s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 39s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 43s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 49s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 28s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 15m 39s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 35s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 36s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 36s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 40s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m  1s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 42s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 38s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1112/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 31s{color} | {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 13s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestFSSchedulerConfigurationStore |
|   | hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1112/artifact/out/Dockerfile |
| JIRA Issue | YARN-10789 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13030225/YARN-10789.branch-3.2.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle spotbugs |
| uname | Linux a3b9939fe226 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | branch-3.2 / f9832031c2c |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 |
|  Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1112/testReport/ |
| Max. process+thread count | 806 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1112/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> RM HA startup can fail due to race conditions in ZKConfigurationStore
> ---------------------------------------------------------------------
>
>                 Key: YARN-10789
>                 URL: https://issues.apache.org/jira/browse/YARN-10789
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10789.001.patch, YARN-10789.002.patch, YARN-10789.branch-3.2.001.patch, YARN-10789.branch-3.3.001.patch
>
>
> We are observing below error randomly during hadoop install and RM initial startup when HA is enabled and yarn.scheduler.configuration.store.class=zk is configured. This causes one of the RMs to not startup.
> {code:java}
> 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: Service RMActiveServices failed in state INITED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /confstore/CONF_STORE
> {code}
> We are trying to create the znode /confstore/CONF_STORE when we initialize the ZKConfigurationStore. But the problem is that the ZKConfigurationStore is initialized when CapacityScheduler does a serviceInit. This serviceInit is done by both Active and Standby RM. So we can run into a race condition when both Active and Standby try to create the same znode when both RM are started at same time.
> ZKRMStateStore on the other hand avoids such race conditions, by creating the znodes only after serviceStart. serviceStart only happens for the active RM which won the leader election, unlike serviceInit which happens irrespective of leader election.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org