Posted to commits@hudi.apache.org by "Guanpx (via GitHub)" <gi...@apache.org> on 2023/04/17 06:27:51 UTC

[GitHub] [hudi] Guanpx opened a new issue, #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Guanpx opened a new issue, #8475:
URL: https://github.com/apache/hudi/issues/8475

   
   
   **Describe the problem you faced**
   
   When I use org.apache.hudi.utilities.HoodieCleaner to clean old file versions from a Hudi table, I hit the error below.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   ```
   spark-submit --master yarn \
   --name B_P1_HUDI_clean_$1 \
   --deploy-mode cluster \
   --driver-memory 2g \
   --executor-memory 500m \
   --executor-cores 2 \
   --num-executors 1 \
   --conf spark.default.parallelism=200 \
   --conf spark.dynamicAllocation.enabled=false \
   --class  org.apache.hudi.utilities.HoodieCleaner \
    /home/cdh/pxguan/spark_offline/hudi/hudi-utilities-bundle_2.12-0.12.0.jar \
    --target-base-path path_to_table --props props
   ```
   
   
   **Expected behavior**
   
   **There are 8000+ files under the table path; when there are only ~100 files, this error does not occur.**
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   * Spark version : 2.4.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   
   ```
   ERROR service.RequestHandler: Got runtime exception servicing request partition=&maxinstant=20230417133654798&basepath=hdfs%3A%2Fhudi%2Fdw%2Frds.db%table_path&lastinstantts=20230417140011979&timelinehash=bb0d1f56fec5e7b3f00202df2d61e989975ae7a568d6fe4dd0965615c431715b
   org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition hdfs:/hudi/dw/rds.db/table_path from metadata
   	at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:137)
   	at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:305)
   	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:296)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:744)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:758)
   	at org.apache.hudi.timeline.service.handlers.FileSliceHandler.getReplacedFileGroupsBefore(FileSliceHandler.java:102)
   	at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$21(RequestHandler.java:402)
   	at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:498)
   	at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
   	at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
   	at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
   	at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
   	at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
   	at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
   	at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
   	at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
   	at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
   	at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
   	at org.apache.hudi.org.eclipse.jetty.server.Server.handle(Server.java:502)
   	at org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
   	at org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
   	at org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
   	at org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
   	at org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.util.NoSuchElementException: No value present in Option
   	at org.apache.hudi.common.util.Option.get(Option.java:89)
   	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionFileSlices(HoodieTableMetadataUtil.java:1052)
   	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(HoodieTableMetadataUtil.java:996)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadata.getPartitionFileSliceToKeysMapping(HoodieBackedTableMetadata.java:379)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:206)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:141)
   	at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:312)
   	at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:135)
   	... 39 more
   23/04/17 14:18:38 WARN core.ExceptionMapper: Uncaught exception
   org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition hdfs:/hudi/dw/rds.db/rms_inloan_nephele/rds_rms_inloan_nephele_inloan_risk_customer_level_log from metadata
   	at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:137)
   	at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:305)
   	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:296)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:744)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:758)
   	at org.apache.hudi.timeline.service.handlers.FileSliceHandler.getReplacedFileGroupsBefore(FileSliceHandler.java:102)
   	at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$21(RequestHandler.java:402)
   	at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:498)
   	at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
   	at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
   	at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
   	at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
   	at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
   	at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
   	at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
   	at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
   	at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
   	at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
   	at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
   	at org.apache.hudi.org.eclipse.jetty.server.Server.handle(Server.java:502)
   	at org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
   	at org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
   	at org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
   	at org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
   	at org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:132)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
   	at org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.util.NoSuchElementException: No value present in Option
   	at org.apache.hudi.common.util.Option.get(Option.java:89)
   	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionFileSlices(HoodieTableMetadataUtil.java:1052)
   	at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(HoodieTableMetadataUtil.java:996)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadata.getPartitionFileSliceToKeysMapping(HoodieBackedTableMetadata.java:379)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:206)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:141)
   	at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:312)
   	at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:135)
   	... 39 more
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] ad1happy2go commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1569813272

   @Guanpx Did you get a chance to try it with version 0.12.3?




[GitHub] [hudi] Guanpx commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "Guanpx (via GitHub)" <gi...@apache.org>.
Guanpx commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1538154789

   > Are you running a separate standalone job to do the cleaning? If cleaning is part of your regular ingestion job, regular ingestion should ensure the metadata is fully populated. But if you are running a standalone job, you need to ensure the metadata configs for your cleaner job match what you set for your regular ingestion job.
   
   
   Yes, I clean files with an offline Spark task, but not all tasks failed. I will try updating Hudi to 0.12.3 and report back.
   




[GitHub] [hudi] Guanpx commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "Guanpx (via GitHub)" <gi...@apache.org>.
Guanpx commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1512762919

   @danny0405 I was wondering if it would be possible for you to lend me a hand with this? I would be extremely grateful for your assistance.




[GitHub] [hudi] danny0405 commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1514104404

   I guess the table you are cleaning does not actually have the metadata table enabled yet, while the cleaner enables the metadata table by default. We may need to disable the metadata table for the cleaner.
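   For reference, one way to try this suggestion is to override the metadata flag on the standalone cleaner invocation via `--hoodie-conf`, which HoodieCleaner accepts alongside `--props`. This is a sketch only: the jar path, table path, and resource settings are placeholders, not taken from the original report.
   
   ```shell
   # Sketch: run the standalone cleaner with the metadata table disabled.
   # Paths below are illustrative placeholders.
   spark-submit --master yarn \
     --deploy-mode cluster \
     --class org.apache.hudi.utilities.HoodieCleaner \
     hudi-utilities-bundle_2.12-0.12.0.jar \
     --target-base-path hdfs:///path/to/table \
     --hoodie-conf hoodie.metadata.enable=false
   ```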




[GitHub] [hudi] Guanpx commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "Guanpx (via GitHub)" <gi...@apache.org>.
Guanpx commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1512365925

   > 
   
   I used hudi-utilities-bundle_2.12-0.13.2.jar; there is no exception, but it doesn't clean any files.




[GitHub] [hudi] ad1happy2go commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1511275016

   @Guanpx I guess this is the old bug that was fixed by https://github.com/apache/hudi/pull/6836.
   
   Are you still facing the issue with that commit or with a newer version?




[GitHub] [hudi] nsivabalan commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1520669772

   Are you running a separate standalone job to do the cleaning? If cleaning is part of your regular ingestion job, regular ingestion should ensure the metadata is fully populated. But if you are running a standalone job, you need to ensure the metadata configs for your cleaner job match what you set for your regular ingestion job.
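   The point about matching configs can be sketched concretely. Assuming the ingestion job runs with the metadata table enabled, the standalone cleaner should be launched with the same setting; the command below is illustrative, with placeholder paths rather than the reporter's actual configuration.
   
   ```shell
   # Sketch: keep hoodie.metadata.enable consistent across both jobs.
   # If the ingestion job writes with:
   #   --hoodie-conf hoodie.metadata.enable=true
   # then the standalone cleaner should use the same value:
   spark-submit --class org.apache.hudi.utilities.HoodieCleaner \
     hudi-utilities-bundle_2.12-0.12.0.jar \
     --target-base-path hdfs:///path/to/table \
     --hoodie-conf hoodie.metadata.enable=true
   ```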




[GitHub] [hudi] Guanpx commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "Guanpx (via GitHub)" <gi...@apache.org>.
Guanpx commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1515966392

   > I guess the table you are cleaning does not actually have the metadata table enabled yet, while the cleaner enables the metadata table by default. We may need to disable the metadata table for the cleaner.
   
    Only some of the tables fail during Hudi cleaning.
   




[GitHub] [hudi] Guanpx commented on issue #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

Posted by "Guanpx (via GitHub)" <gi...@apache.org>.
Guanpx commented on issue #8475:
URL: https://github.com/apache/hudi/issues/8475#issuecomment-1538154366

   > 
   
   Yes, I clean files with an offline Spark task, but not all tasks failed. I will try updating Hudi to 0.12.3 and report back.

