Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/13 20:10:01 UTC

[GitHub] [hudi] aznwarmonkey opened a new issue #4803: [SUPPORT] Clustering throwing exception

aznwarmonkey opened a new issue #4803:
URL: https://github.com/apache/hudi/issues/4803


   Hello,
   
   I am trying to run clustering and the job is erroring out without much indication as to why.
   
   Here's the command I am using to run clustering:
   
   ```sh
   spark-submit \
   --class org.apache.hudi.utilities.HoodieClusteringJob \
   /usr/lib/hudi/hudi-utilities-bundle.jar \
   --props s3://path-to-test/clustering.properties \
   --mode scheduleAndExecute \
   --base-path s3://path-to-test/data/hudi/test/country/ \
   --table-name country --spark-memory 1g
   ```
   
   Here's the properties file:
   ```
   hoodie.clustering.async.enabled=true
   hoodie.clustering.async.max.commits=1
   hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
   hoodie.clustering.plan.strategy.small.file.limit=629145600
   hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
   hoodie.clustering.plan.strategy.sort.columns=enrich_selector_id
   ```
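A small shell sketch for sanity-checking the two size settings above. It reads them from a local copy of the properties file (for example, one fetched with `aws s3 cp`) and prints them in MiB. The helper names `get_prop` and `check_clustering_sizes` are hypothetical, and the parser only handles plain `key=value` lines. The `small.file.limit < target.file.max.bytes` check is a heuristic, not a rule Hudi is known to enforce.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Hudi.

# Print the value of KEY from a simple key=value properties file.
# Assumption: no ':' separators, continuation lines, or spaces around '='.
get_prop() {
  local key="$1" file="$2"
  awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Report the clustering size limits in MiB and flag an inverted pair.
check_clustering_sizes() {
  local file="$1" max small
  max="$(get_prop hoodie.clustering.plan.strategy.target.file.max.bytes "$file")"
  small="$(get_prop hoodie.clustering.plan.strategy.small.file.limit "$file")"
  echo "target.file.max.bytes: $((max / 1048576)) MiB"
  echo "small.file.limit: $((small / 1048576)) MiB"
  # Heuristic: files below small.file.limit are the clustering candidates,
  # so a limit at or above the target output size is probably a typo.
  if [ "$small" -ge "$max" ]; then
    echo "warning: small.file.limit >= target.file.max.bytes" >&2
    return 1
  fi
}
```

For the file in this issue, `check_clustering_sizes clustering.properties` reports 1024 MiB for the target file size and 600 MiB for the small-file limit.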
   
   And here's the console output of the clustering job:
   ```shell
   22/02/13 20:00:48 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 1541, ip-172-31-74-236.ec2.internal, executor 3, partition 0, PROCESS_LOCAL, 7927 bytes)
   22/02/13 20:00:48 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 1542, ip-172-31-69-5.ec2.internal, executor 2, partition 1, PROCESS_LOCAL, 7922 bytes)
   22/02/13 20:00:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-74-236.ec2.internal:46827 (size: 102.2 KB, free: 4.8 GB)
   22/02/13 20:00:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-69-5.ec2.internal:43517 (size: 102.2 KB, free: 366.1 MB)
   22/02/13 20:00:49 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/hoodie.properties' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213084348.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213103928.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213122909.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213142348.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213162102.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213181256.replacecommit.requested' for reading
   22/02/13 20:00:50 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 1541) in 1859 ms on ip-172-31-74-236.ec2.internal (executor 3) (1/2)
   22/02/13 20:00:50 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 1542) in 1865 ms on ip-172-31-69-5.ec2.internal (executor 2) (2/2)
   22/02/13 20:00:50 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
   22/02/13 20:00:50 INFO DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 1.891 s
   22/02/13 20:00:50 INFO DAGScheduler: Job 3 finished: collect at HoodieSparkEngineContext.java:78, took 1.893633 s
   22/02/13 20:00:50 INFO Javalin: Stopping Javalin ...
   22/02/13 20:00:50 INFO Javalin: Javalin has stopped
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213084348.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213103928.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213122909.replacecommit' for reading
   22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213142348.replacecommit' for reading
   22/02/13 20:00:51 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213162102.replacecommit' for reading
   22/02/13 20:00:51 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213181256.replacecommit.requested' for reading
   22/02/13 20:00:51 ERROR HoodieClusteringJob: Clustering with basePath: s3://path-to-test/data/hudi/test/country/, tableName: country, runningMode: scheduleAndExecute failed
   22/02/13 20:00:51 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-66-151.ec2.internal:4041
   22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Interrupting monitor thread
   22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Shutting down all executors
   22/02/13 20:00:51 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
   22/02/13 20:00:51 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
   (serviceOption=None,
    services=List(),
    started=false)
   22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Stopped
   22/02/13 20:00:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   22/02/13 20:00:51 INFO MemoryStore: MemoryStore cleared
   22/02/13 20:00:51 INFO BlockManager: BlockManager stopped
   22/02/13 20:00:51 INFO BlockManagerMaster: BlockManagerMaster stopped
   22/02/13 20:00:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   22/02/13 20:00:51 INFO SparkContext: Successfully stopped SparkContext
   22/02/13 20:00:51 INFO ShutdownHookManager: Shutdown hook called
   22/02/13 20:00:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-3dd6c522-17c3-48a5-a809-8a0ad56c6da7
   22/02/13 20:00:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-f841de1d-4101-49bc-8178-7bc79ede16b3
   ```
   
   Do any of you guys have any insight as to why this error is happening?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4803: [SUPPORT] Clustering throwing exception

nsivabalan commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1047215020


   @aznwarmonkey : gentle ping. 





[GitHub] [hudi] nsivabalan commented on issue #4803: [SUPPORT] Clustering throwing exception

nsivabalan commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1073073259


   Could you enable debug logs and share them here? Unfortunately, the logs above do not contain enough information to investigate.
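One way to produce such debug logs, sketched under two assumptions. First, the Spark on the EMR cluster still uses log4j 1.x (Spark 2.x and 3.0 to 3.2 do). Second, the job runs in YARN client mode, as the `YarnClientSchedulerBackend` lines in the log suggest, so the driver can read a local config file passed through `--driver-java-options`. The file name `log4j-hudi-debug.properties` and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Hudi or Spark.

# Write a log4j 1.x config that keeps Spark at INFO but logs org.apache.hudi at DEBUG.
# Prints the path of the file it wrote.
write_debug_log4j() {
  local out="${1:-$PWD/log4j-hudi-debug.properties}"
  cat > "$out" <<'EOF'
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.hudi=DEBUG
EOF
  echo "$out"
}

# Re-run the clustering job from this issue with the debug config on the driver
# and keep a copy of the console output in clustering-debug.log.
run_clustering_debug() {
  local log4j_file
  log4j_file="$(write_debug_log4j)"
  spark-submit \
    --driver-java-options "-Dlog4j.configuration=file:${log4j_file}" \
    --class org.apache.hudi.utilities.HoodieClusteringJob \
    /usr/lib/hudi/hudi-utilities-bundle.jar \
    --props s3://path-to-test/clustering.properties \
    --mode scheduleAndExecute \
    --base-path s3://path-to-test/data/hudi/test/country/ \
    --table-name country --spark-memory 1g 2>&1 | tee clustering-debug.log
}
```

Only the `org.apache.hudi` logger is raised to DEBUG, so the Spark scheduler noise stays at INFO and the file remains shareable.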





[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception

zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1066405105


   Hi @aznwarmonkey, sorry for the delay. I took a closer look at the log you provided and found something odd.
   Based on your command, you used `--mode scheduleAndExecute`. In that mode, Hudi logs lines such as `Running Mode: [xxx]` and `Step 1: Do schedule` regardless of whether the job succeeds, but I couldn't find any of these log lines in the output you provided.
   
   Could you please check and tell us which version of Hudi you are using?
   
   Also, please check your logging properties, or look for lines logged by `HoodieClusteringJob`, something like:
   `22/03/14 05:30:51,634 INFO HoodieClusteringJob: Running Mode: [execute]; Do cluster`
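A quick way to run that check against a saved copy of the console output, assuming it was captured by appending `2>&1 | tee clustering.log` to the `spark-submit` command. The function name is hypothetical, and it only greps for the log text quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hudi.
# Succeeds and prints every HoodieClusteringJob line if the saved console
# output contains a "Running Mode" line; otherwise reports what is missing.
check_clustering_log() {
  local log_file="$1"
  if grep -q 'HoodieClusteringJob: Running Mode' "$log_file"; then
    grep 'HoodieClusteringJob' "$log_file"
    return 0
  fi
  echo "no 'HoodieClusteringJob: Running Mode' line in $log_file;" \
       "check the log4j level for org.apache.hudi and the Hudi version" >&2
  return 1
}
```

Run against the output in this issue, it would fail: the only `HoodieClusteringJob` line there is the final `ERROR ... failed` line.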





[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception

zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1038771285


   Hi @aznwarmonkey, would you mind trying it in two steps?
   1. `--mode schedule`
   2. `--mode execute --instant-time <instant generated by step 1>`
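A sketch of the two-step run. It assumes the Hudi release accepts `--mode schedule` and `--mode execute`, as the `--mode scheduleAndExecute` command in the issue implies. It finds the scheduled instant by listing the `.hoodie/` timeline folder with `aws s3 ls` and picking the newest `<instant>.replacecommit.requested` that has no completed `<instant>.replacecommit`. The function names are hypothetical. The approach also assumes clustering is the only operation writing replacecommits on this table; operations such as insert_overwrite produce them too. The log in the issue already opens `20220213181256.replacecommit.requested`, an instant with no completed file among those it reads.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Hudi.

# Read a listing of the .hoodie/ timeline folder on stdin (the file name must be
# the last field, as in `aws s3 ls` output) and print, oldest first, every instant
# that has a .replacecommit.requested file but no completed .replacecommit file.
pending_replacecommits() {
  awk '{ f = $NF; t = f; sub(/\..*/, "", t) }
       f ~ /^[0-9]+\.replacecommit\.requested$/ { req[t] = 1 }
       f ~ /^[0-9]+\.replacecommit$/ { done[t] = 1 }
       END { for (t in req) if (!(t in done)) print t }' | sort
}

# Step 1 schedules a clustering plan; step 2 executes the newest pending one.
run_clustering_two_step() {
  local base="s3://path-to-test/data/hudi/test/country/"
  local job=(--class org.apache.hudi.utilities.HoodieClusteringJob
             /usr/lib/hudi/hudi-utilities-bundle.jar
             --props s3://path-to-test/clustering.properties
             --base-path "$base" --table-name country --spark-memory 1g)
  spark-submit "${job[@]}" --mode schedule || return 1

  local instant
  instant="$(aws s3 ls "${base}.hoodie/" | pending_replacecommits | tail -n 1)"
  if [ -z "$instant" ]; then
    echo "no pending replacecommit found under ${base}.hoodie/" >&2
    return 1
  fi
  spark-submit "${job[@]}" --mode execute --instant-time "$instant"
}
```

Instant times are fixed-width `yyyyMMddHHmmss` strings, so a plain lexical `sort` also orders them chronologically.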





[GitHub] [hudi] aznwarmonkey commented on issue #4803: [SUPPORT] Clustering throwing exception

aznwarmonkey commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1047794790


   Hello, 
   
   Apologies for the delay. Attached is the full console output of the schedule job.
   
   [output.txt](https://github.com/apache/hudi/files/8116942/output.txt)
    





[GitHub] [hudi] nsivabalan commented on issue #4803: [SUPPORT] Clustering throwing exception

nsivabalan commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1061351005


   @zhangyue19921010 : Can you follow up on this, please?





[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception

zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1061400775


   Ack. Sorry, I almost forgot about this issue. I will do more research and respond ASAP.





[GitHub] [hudi] aznwarmonkey commented on issue #4803: [SUPPORT] Clustering throwing exception

aznwarmonkey commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1071911444


   I am running version 0.9, deployed on AWS EMR clusters.






[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception

zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1038776138


   Also, could you please provide the full log of this test?
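In YARN client mode the console output is only the driver side. A sketch for also pulling the executor logs, assuming YARN log aggregation is enabled on the cluster. It looks for the `application_<clusterTimestamp>_<sequence>` id that Spark on YARN prints when it submits the job, then calls `yarn logs -applicationId`. The function names and the default output file `clustering-full.log` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Hudi or Spark.

# Print the first YARN application id (application_<clusterTimestamp>_<seq>)
# found in a saved spark-submit console log.
yarn_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' "$1" | head -n 1
}

# Collect driver console output plus aggregated YARN container logs into one file.
collect_full_log() {
  local console_log="$1" out="${2:-clustering-full.log}" app_id
  app_id="$(yarn_app_id "$console_log")"
  if [ -z "$app_id" ]; then
    echo "no YARN application id found in $console_log" >&2
    return 1
  fi
  { cat "$console_log"; yarn logs -applicationId "$app_id"; } > "$out"
  echo "$out"
}
```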

