Posted to hdfs-issues@hadoop.apache.org by "ZhiWei Shi (Jira)" <ji...@apache.org> on 2022/12/09 14:21:00 UTC

[jira] [Assigned] (HDFS-16867) Exiting Mover due to an exception in MoverMetrics.create

     [ https://issues.apache.org/jira/browse/HDFS-16867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhiWei Shi reassigned HDFS-16867:
---------------------------------

    Assignee: ZhiWei Shi

> Exiting Mover due to an exception in MoverMetrics.create
> --------------------------------------------------------
>
>                 Key: HDFS-16867
>                 URL: https://issues.apache.org/jira/browse/HDFS-16867
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZhiWei Shi
>            Assignee: ZhiWei Shi
>            Priority: Major
>
> After the Mover process has been running for some time, it exits unexpectedly and the following error is reported in the log:
> {code:java}
> [hdfs@${hostname} hadoop-3.3.2-nn]$ nohup bin/hdfs mover -p /test-mover-jira9534 > mover.log.jira9534.20221209.2 &
> [hdfs@${hostname} hadoop-3.3.2-nn]$ tail -f mover.log.jira9534.20221209.2
> ...
> 22/12/09 14:22:32 INFO balancer.Dispatcher: Start moving blk_1073911285_170466 with size=134217728 from 10.108.182.205:800:DISK to ${ip1}:800:ARCHIVE through ${ip2}:800
> 22/12/09 14:22:32 INFO balancer.Dispatcher: Successfully moved blk_1073911285_170466 with size=134217728 from 10.108.182.205:800:DISK to ${ip1}:800:ARCHIVE through ${ip2}:800
> 22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Stopping Mover metrics system...
> 22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Mover metrics system stopped.
> 22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Mover metrics system shutdown complete.
> Dec 9, 2022, 2:22:42 PM  Mover took 13mins, 19sec
> 22/12/09 14:22:42 ERROR mover.Mover: Exiting Mover due to an exception
> org.apache.hadoop.metrics2.MetricsException: Metrics source Mover-${BlockpoolID} already exists!
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>         at org.apache.hadoop.hdfs.server.mover.MoverMetrics.create(MoverMetrics.java:49)
>         at org.apache.hadoop.hdfs.server.mover.Mover.<init>(Mover.java:162)
>         at org.apache.hadoop.hdfs.server.mover.Mover.run(Mover.java:684)
>         at org.apache.hadoop.hdfs.server.mover.Mover$Cli.run(Mover.java:826)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
>         at org.apache.hadoop.hdfs.server.mover.Mover.main(Mover.java:908) {code}
> 1. "final ExitStatus r = m.run()" returns after only one replica move has been scheduled.
> 2. When r == ExitStatus.IN_PROGRESS, iter.remove() is not executed, so the nnc stays in connectors.
> 3. "new Mover" and "this.metrics = MoverMetrics.create(this)" are therefore executed multiple times for the same nnc, which leads to the error.
> {code:java}
> // Mover.java
> for (final StorageType t : diff.existing) {
>   for (final MLocation ml : locations) {
>     final Source source = storages.getSource(ml);
>     if (ml.storageType == t && source != null) {
>       // try to schedule one replica move.
>       // 1. Returns as soon as one replica move has been scheduled.
>       if (scheduleMoveReplica(db, source, diff.expected)) {
>         return true;
>       }
>     }
>   }
> }
> while (connectors.size() > 0) {
>   Collections.shuffle(connectors);
>   Iterator<NameNodeConnector> iter = connectors.iterator();
>   while (iter.hasNext()) {
>     NameNodeConnector nnc = iter.next();
>     // 3. "new Mover" and "this.metrics = MoverMetrics.create(this)" are executed
>     //    multiple times for the same nnc, which leads to the error.
>     final Mover m = new Mover(nnc, conf, retryCount,
>         excludedPinnedBlocks);
>     final ExitStatus r = m.run();
>     // 2. When r == ExitStatus.IN_PROGRESS, iter.remove() is not executed.
>     if (r == ExitStatus.SUCCESS) {
>       IOUtils.cleanupWithLogger(LOG, nnc);
>       iter.remove();
>     } {code}
> Probably, we should initialize MoverMetrics once, when we initialize the nnc (NameNodeConnector), instead of on every "new Mover".
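
The failure mode and the suggested direction can be illustrated with a small stand-alone sketch. It does not use any Hadoop classes: SketchMetricsRegistry, MoverMetricsSketch, runRegisteringPerPass and runRegisteringOnce are hypothetical names made up for this illustration, and the registry only mimics the "already exists" check seen in the stack trace. It assumes a pass returns IN_PROGRESS until a given number of passes has run; the actual fix in Mover may look different.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a metrics system that rejects duplicate source names. */
final class SketchMetricsRegistry {
  private final Set<String> sources = new HashSet<>();

  void register(String name) {
    if (!sources.add(name)) {
      throw new IllegalStateException("Metrics source " + name + " already exists!");
    }
  }
}

final class MoverMetricsSketch {
  enum Status { SUCCESS, IN_PROGRESS }

  /** Reported shape: the source is registered on every pass over the same connector. */
  static int runRegisteringPerPass(SketchMetricsRegistry registry, String blockpoolId,
      int passesNeeded) {
    List<String> connectors = new ArrayList<>();
    connectors.add(blockpoolId);
    int passes = 0;
    while (!connectors.isEmpty()) {
      Iterator<String> iter = connectors.iterator();
      while (iter.hasNext()) {
        String nnc = iter.next();
        // Mirrors "new Mover(...)" registering its metrics source each time.
        registry.register("Mover-" + nnc);
        passes++;
        Status r = passes < passesNeeded ? Status.IN_PROGRESS : Status.SUCCESS;
        if (r == Status.SUCCESS) {
          iter.remove(); // only reached on SUCCESS, as in the excerpt above
        }
      }
    }
    return passes;
  }

  /** Suggested direction: register once per connector, then reuse on later passes. */
  static int runRegisteringOnce(SketchMetricsRegistry registry, String blockpoolId,
      int passesNeeded) {
    List<String> connectors = new ArrayList<>();
    connectors.add(blockpoolId);
    Set<String> registered = new HashSet<>();
    int passes = 0;
    while (!connectors.isEmpty()) {
      Iterator<String> iter = connectors.iterator();
      while (iter.hasNext()) {
        String nnc = iter.next();
        if (registered.add(nnc)) {
          registry.register("Mover-" + nnc); // first pass for this connector only
        }
        passes++;
        Status r = passes < passesNeeded ? Status.IN_PROGRESS : Status.SUCCESS;
        if (r == Status.SUCCESS) {
          iter.remove();
        }
      }
    }
    return passes;
  }

  public static void main(String[] args) {
    try {
      runRegisteringPerPass(new SketchMetricsRegistry(), "BP-example", 2);
    } catch (IllegalStateException e) {
      System.out.println("per-pass registration failed: " + e.getMessage());
    }
    int passes = runRegisteringOnce(new SketchMetricsRegistry(), "BP-example", 3);
    System.out.println("register-once completed after " + passes + " passes");
  }
}
```

In this sketch, a single pass that returns SUCCESS never triggers the duplicate registration, which is consistent with the error appearing only after the Mover has been running for a while.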


