You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/07/30 03:03:00 UTC

[jira] [Created] (IMPALA-11464) hasNext() throws FileNotFoundException on staging files and breaks file metadata loading

Quanlong Huang created IMPALA-11464:
---------------------------------------

             Summary: hasNext() throws FileNotFoundException on staging files and breaks file metadata loading
                 Key: IMPALA-11464
                 URL: https://issues.apache.org/jira/browse/IMPALA-11464
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 4.1.0, Impala 4.0.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang
         Attachments: Selection_034.png, hdfs-touch-1002-files.sh

HDFS provides a RemoteIterator for listing files. The two key interfaces are hasNext() and next(). We only handle the FileNotFoundException thrown in next(). However, hasNext() could also thrown FileNotFoundException for staging files:
 * The HDFS client will do a partial listing when creating the RemoteIterator. By default it only list 1000 items under the dir. The limit is configured by "[dfs.ls.limit|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml#dfs.ls.limit]".
 * Usually the hasNext() call just consume the partial list and it's enough. However, if the dir contains more than 1000 items. The 1001-th hasNext() call will do a continue listing. This could hit FileNotFoundException if the dir was removed.

Such a FileNotFoundException thrown from hasNext() will break the metadata loading:
https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L948

*Reproducing*

To stably reproduce the issue, I use IntelliJ to stop catalogd at some point, and then manually delete the dir. Here are the detailed steps:

Create a table and run a simple query on it to let catalogd cache its metadata.
{code:sql}
impala-shell> create table my_alltypes like functional.alltypes;
impala-shell> desc my_alltypes;
{code}
Manually create staging dir with more than 1000 files in it.
{code:bash}
hdfs dfs -mkdir -p hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
bash hdfs-touch-1002-files.sh
{code}

In IntelliJ, set a breakpoint at this line and attach to a running catalogd:
https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L987

In Hive, trigger an INSERT on this table.
{code:sql}
$ beeline -u "jdbc:hive2://localhost:11050"
beeline> insert into my_alltypes partition(year=2009, month=4) select id,bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col from functional.alltypes where year=2009 and month=4;
{code}

IntelliJ will prompt out for debugging. Step over until the partial listing is done on the staging dir "-ext-10001", as the screenshot shows.

Manually remove the staging dir "-ext-10001"
{code}
hdfs dfs -rm -r -skipTrash hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
Deleted hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
{code}
Then stop debugging to resume the process. I can see the following errors in catalogd. Event processor is broken and requires a global INVALIDATE METADATA to recover.
{noformat}
I0730 10:36:27.031431  9452 FileSystemUtil.java:837] Ignoring hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/996 since it is either in a hidden directory or a temporary staging directory hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1
I0730 10:36:27.031471  9452 FileSystemUtil.java:837] Ignoring hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/997 since it is either in a hidden directory or a temporary staging directory hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1
E0730 10:36:27.034335  9452 ParallelFileMetadataLoader.java:171] Loading file and block metadata for 1 paths for table default.my_alltypes encountered an error loading data for path hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 does not exist.
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743)
        at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904)
        at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465)
        at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068)
        at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907)
        at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511)
        at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935)
        at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001 does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:1317)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:1288)
        at org.apache.impala.common.FileSystemUtil$RecursingIterator.hasNext(FileSystemUtil.java:948)
        at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:888)
        at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:162)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:162)
        ... 17 more
E0730 10:36:27.034795  9452 MetastoreEventsProcessor.java:841] Event processing needs a invalidate command to resolve the state
Java exception follows:
org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException: EventId: 728392 EventType: ADD_PARTITION Failed to refresh newly added partitions of table {}. Event processing cannot continue. Issue an invalidate metadata command to reset event processor.
        at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1921)
        at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511)
        at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935)
        at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.impala.catalog.TableLoadingException: Loading file and block metadata for 1 paths for table default.my_alltypes: failed to load 1 paths. Check the catalog server log for more details.
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:185)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743)
        at org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904)
        at org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465)
        at org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068)
        at org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907)
        ... 10 more
I0730 10:36:27.622951  9547 catalog-server.cc:400] A catalog update with 1 entries is assembled. Catalog version: 1697 Last sent catalog version: 1696
I0730 10:36:27.625458  9536 catalog-server.cc:813] Collected update: 1:CATALOG_SERVICE_ID, version=1698, original size=60, compressed size=58
W0730 10:36:28.035472  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
W0730 10:36:29.035904  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
I0730 10:36:29.623289  9547 catalog-server.cc:400] A catalog update with 1 entries is assembled. Catalog version: 1698 Last sent catalog version: 1697
W0730 10:36:30.036418  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
W0730 10:36:31.036957  9452 MetastoreEventsProcessor.java:826] Event processing is skipped since status is NEEDS_INVALIDATE. Last synced event id is 728391
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)