Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/01 11:19:00 UTC

[jira] [Work logged] (HIVE-26495) MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus

     [ https://issues.apache.org/jira/browse/HIVE-26495?focusedWorklogId=805478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-805478 ]

ASF GitHub Bot logged work on HIVE-26495:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Sep/22 11:18
            Start Date: 01/Sep/22 11:18
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on code in PR #3549:
URL: https://github.com/apache/hive/pull/3549#discussion_r960527439


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##########
@@ -585,10 +586,18 @@ private Path processPathDepthInfo(final PathDepthInfo pd)
       if (currentDepth == partColNames.size()) {
         return currentPath;
       }
-      FileStatus[] fileStatuses = fs.listStatus(currentPath, FileUtils.HIDDEN_FILES_PATH_FILTER);

Review Comment:
   You can use `RemoteIterators.filteringRemoteIterator` here;
   there is also a wrapper for it in FileUtils:
   https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L1339-L1342
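To illustrate the idea behind a filtering remote iterator, here is a minimal, self-contained sketch. It does not use the Hadoop or Hive classes: `SketchRemoteIterator`, `SketchFilter`, and `FilteringIteratorSketch` are hypothetical stand-ins that only mirror the shape of Hadoop's `RemoteIterator` (`hasNext()` and `next()`, both allowed to throw `IOException`). The `isVisible` rule (skip names starting with `_` or `.`) is an assumption about what a hidden-files filter does, not a copy of `FileUtils.HIDDEN_FILES_PATH_FILTER`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in mirroring the shape of Hadoop's RemoteIterator. */
interface SketchRemoteIterator<T> {
  boolean hasNext() throws IOException;
  T next() throws IOException;
}

/** Hypothetical predicate that is allowed to raise IOException. */
interface SketchFilter<T> {
  boolean accept(T value) throws IOException;
}

final class FilteringIteratorSketch {

  /** Wraps source so that only entries accepted by filter are returned, one at a time. */
  static <T> SketchRemoteIterator<T> filtering(SketchRemoteIterator<T> source,
                                               SketchFilter<T> filter) {
    return new SketchRemoteIterator<T>() {
      private T pending;          // next accepted entry, if one was already found
      private boolean hasPending; // true while pending holds an unreturned entry

      @Override
      public boolean hasNext() throws IOException {
        // Pull from the source only until one accepted entry is found.
        while (!hasPending && source.hasNext()) {
          T candidate = source.next();
          if (filter.accept(candidate)) {
            pending = candidate;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override
      public T next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        hasPending = false;
        T result = pending;
        pending = null;
        return result;
      }
    };
  }

  /** Adapts an in-memory list; a real listing would come from the file system. */
  static <T> SketchRemoteIterator<T> fromList(List<T> items) {
    Iterator<T> it = items.iterator();
    return new SketchRemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  /** Assumed hidden-file rule: names starting with '_' or '.' are skipped. */
  static boolean isVisible(String name) {
    return !name.startsWith("_") && !name.startsWith(".");
  }

  /** Drains the filtered iterator; IOExceptions are rethrown unchecked for brevity. */
  static List<String> visibleNames(List<String> names) {
    try {
      SketchRemoteIterator<String> it =
          filtering(fromList(names), FilteringIteratorSketch::isVisible);
      List<String> out = new ArrayList<>();
      while (it.hasNext()) {
        out.add(it.next());
      }
      return out;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The point of the wrapper is that the filter is applied entry by entry as the caller iterates, so no intermediate `FileStatus[]` array has to be built before filtering starts.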





Issue Time Tracking
-------------------

    Worklog Id:     (was: 805478)
    Time Spent: 1.5h  (was: 1h 20m)

> MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus
> ------------------------------------------------------------------------
>
>                 Key: HIVE-26495
>                 URL: https://issues.apache.org/jira/browse/HIVE-26495
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Naresh P R
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With hive.metastore.fshandler.threads = 15, all 15 *MSCK-GetPaths-xx* threads are blocked at the following stack trace.
> {code:java}
> "MSCK-GetPaths-11" #12345 daemon prio=5 os_prio=0 tid= nid= waiting on condition [0x00007f9f099a6000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00000003f92d1668> (a java.util.concurrent.CompletableFuture$Signaller)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>     at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> ...
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3230)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1995)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:550)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:543)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:525)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750){code}
> We should take advantage of the non-blocking listStatusIterator instead of listStatus, which is a blocking call.
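
The difference between the two call styles can be sketched with a toy model. This is only an illustration under stated assumptions: `PagedListingSketch` is a hypothetical class that simulates a store returning listings one page at a time, and it does not reproduce how `S3AFileSystem` or `HiveMetaStoreChecker` actually behave. In the Hadoop API, `FileSystem.listStatus(Path)` returns a `FileStatus[]`, while `FileSystem.listStatusIterator(Path)` returns a `RemoteIterator<FileStatus>`; the sketch models only that array-versus-iterator distinction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical paged directory listing; stands in for a remote object store. */
final class PagedListingSketch {
  private final List<String> entries;
  private final int pageSize;
  private int pagesFetched = 0;

  PagedListingSketch(List<String> entries, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    this.entries = new ArrayList<>(entries);
    this.pageSize = pageSize;
  }

  /** Simulates one remote round trip that returns at most pageSize entries. */
  private List<String> fetchPage(int offset) {
    pagesFetched++;
    int end = Math.min(offset + pageSize, entries.size());
    return new ArrayList<>(entries.subList(offset, end));
  }

  /** Number of simulated round trips made so far. */
  int pagesFetched() {
    return pagesFetched;
  }

  /** Array style: every page is fetched before the caller sees any entry. */
  List<String> listAll() {
    List<String> all = new ArrayList<>();
    for (int offset = 0; offset < entries.size(); offset += pageSize) {
      all.addAll(fetchPage(offset));
    }
    return all;
  }

  /** Iterator style: a page is fetched only when the buffered entries run out. */
  Iterator<String> iterate() {
    return new Iterator<String>() {
      private int offset = 0;
      private Iterator<String> page = new ArrayList<String>().iterator();

      @Override
      public boolean hasNext() {
        while (!page.hasNext() && offset < entries.size()) {
          List<String> next = fetchPage(offset);
          offset += next.size();
          page = next.iterator();
        }
        return page.hasNext();
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return page.next();
      }
    };
  }
}
```

With five entries and a page size of two, `listAll()` makes three simulated round trips before returning anything, whereas the first `next()` on `iterate()` needs only one. Whether this translates into a speed-up for MSCK depends on the file system implementation and is not shown by the sketch.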



--
This message was sent by Atlassian Jira
(v8.20.10#820010)