Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/13 20:10:01 UTC
[GitHub] [hudi] aznwarmonkey opened a new issue #4803: [SUPPORT] Clustering throwing exception
aznwarmonkey opened a new issue #4803:
URL: https://github.com/apache/hudi/issues/4803
Hello,
I am trying to run clustering and the job is erroring out without much indication as to why.
Here's the command I am using to run clustering:
```sh
spark-submit \
--class org.apache.hudi.utilities.HoodieClusteringJob \
/usr/lib/hudi/hudi-utilities-bundle.jar \
--props s3://path-to-test/clustering.properties \
--mode scheduleAndExecute \
--base-path s3://path-to-test/data/hudi/test/country/ \
--table-name country --spark-memory 1g
```
Here's the properties file:
```
hoodie.clustering.async.enabled=true
hoodie.clustering.async.max.commits=1
hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
hoodie.clustering.plan.strategy.small.file.limit=629145600
hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
hoodie.clustering.plan.strategy.sort.columns=enrich_selector_id
```
And here's the console output of the clustering job:
```shell
22/02/13 20:00:48 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 1541, ip-172-31-74-236.ec2.internal, executor 3, partition 0, PROCESS_LOCAL, 7927 bytes)
22/02/13 20:00:48 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 1542, ip-172-31-69-5.ec2.internal, executor 2, partition 1, PROCESS_LOCAL, 7922 bytes)
22/02/13 20:00:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-74-236.ec2.internal:46827 (size: 102.2 KB, free: 4.8 GB)
22/02/13 20:00:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-69-5.ec2.internal:43517 (size: 102.2 KB, free: 366.1 MB)
22/02/13 20:00:49 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/hoodie.properties' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213084348.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213103928.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213122909.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213142348.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213162102.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213181256.replacecommit.requested' for reading
22/02/13 20:00:50 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 1541) in 1859 ms on ip-172-31-74-236.ec2.internal (executor 3) (1/2)
22/02/13 20:00:50 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 1542) in 1865 ms on ip-172-31-69-5.ec2.internal (executor 2) (2/2)
22/02/13 20:00:50 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
22/02/13 20:00:50 INFO DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 1.891 s
22/02/13 20:00:50 INFO DAGScheduler: Job 3 finished: collect at HoodieSparkEngineContext.java:78, took 1.893633 s
22/02/13 20:00:50 INFO Javalin: Stopping Javalin ...
22/02/13 20:00:50 INFO Javalin: Javalin has stopped
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213084348.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213103928.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213122909.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213142348.replacecommit' for reading
22/02/13 20:00:51 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213162102.replacecommit' for reading
22/02/13 20:00:51 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213181256.replacecommit.requested' for reading
22/02/13 20:00:51 ERROR HoodieClusteringJob: Clustering with basePath: s3://path-to-test/data/hudi/test/country/, tableName: country, runningMode: scheduleAndExecute failed
22/02/13 20:00:51 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-66-151.ec2.internal:4041
22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Interrupting monitor thread
22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Shutting down all executors
22/02/13 20:00:51 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/02/13 20:00:51 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Stopped
22/02/13 20:00:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/02/13 20:00:51 INFO MemoryStore: MemoryStore cleared
22/02/13 20:00:51 INFO BlockManager: BlockManager stopped
22/02/13 20:00:51 INFO BlockManagerMaster: BlockManagerMaster stopped
22/02/13 20:00:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/02/13 20:00:51 INFO SparkContext: Successfully stopped SparkContext
22/02/13 20:00:51 INFO ShutdownHookManager: Shutdown hook called
22/02/13 20:00:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-3dd6c522-17c3-48a5-a809-8a0ad56c6da7
22/02/13 20:00:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-f841de1d-4101-49bc-8178-7bc79ede16b3
```
Do any of you guys have any insight as to why this error is happening?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1047215020
@aznwarmonkey : gentle ping.
[GitHub] [hudi] nsivabalan commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1073073259
Could you enable debug logs and share them here? Unfortunately, the logs above do not have enough information to investigate.
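One possible way to turn on debug logs for the Hudi classes is sketched below. It assumes a Spark build that still uses log4j 1.x and a driver running in client mode (the console output above shows `YarnClientSchedulerBackend`). The file path `/tmp/hudi-debug-log4j.properties` and the `LOG4J_FILE` variable are hypothetical names chosen for illustration.

```sh
# Sketch: write a log4j 1.x config that raises Hudi loggers to DEBUG.
# Assumes a Spark build that still uses log4j 1.x; the file path is hypothetical.
LOG4J_FILE=${LOG4J_FILE:-/tmp/hudi-debug-log4j.properties}

cat > "$LOG4J_FILE" <<'EOF'
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.hudi=DEBUG
EOF

# Print the extra option to add to spark-submit (affects the driver only).
echo "--driver-java-options \"-Dlog4j.configuration=file:$LOG4J_FILE\""
```

The printed option would go before the jar path in the `spark-submit` command, since anything after the jar is passed to `HoodieClusteringJob` itself.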
[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1066405105
Hi @aznwarmonkey, sorry for the delay. I just took a closer look at the log you provided and found something odd.
Based on your configs, you used `--mode scheduleAndExecute`. Hudi logs messages such as `LOG.info("Running Mode: ["xxx"]` and ` LOG.info("Step 1: Do schedule");` whether or not the job succeeds, but I couldn't find any such Hudi-related logs in the files you provided.
So could you please check and tell us which version of Hudi you are using?
Please also check your logging-related properties, or look for log lines from `HoodieClusteringJob`, something like
`22/03/14 05:30:51,634 INFO HoodieClusteringJob: Running Mode: [execute]; Do cluster`
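A quick way to look for such lines is to filter the saved console output, roughly as sketched here. `hudi_job_lines` and `LOG_FILE` are hypothetical names, and `output.txt` is only a default guess at where the console output was saved.

```sh
# Print only the HoodieClusteringJob lines from a saved console log.
hudi_job_lines() {
  grep 'HoodieClusteringJob:' "$1"
}

# LOG_FILE is a hypothetical path: point it at the saved spark-submit output.
LOG_FILE=${LOG_FILE:-output.txt}
if [ -f "$LOG_FILE" ]; then
  hudi_job_lines "$LOG_FILE"
else
  echo "log file not found: $LOG_FILE" >&2
fi
```

If the only match is the final `ERROR HoodieClusteringJob: ... failed` line, the `Running Mode` and `Step 1` messages are missing, which is the oddity described above.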
[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1038771285
Hi @aznwarmonkey, would you mind trying this in two separate steps?
Step 1: `--mode schedule`
Step 2: `--mode execute --instant-time <instant generated by step 1>`
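The two steps could look roughly like the sketch below, which reuses the paths and table name from the original command. It only prints the two `spark-submit` commands rather than running them. `clustering_cmd` and `INSTANT_TIME` are hypothetical names, and the instant time is a placeholder to be replaced with whatever step 1 generates.

```sh
# Sketch only: prints the two spark-submit commands instead of running them.
# Paths and table name are copied from the original report.
JAR=/usr/lib/hudi/hudi-utilities-bundle.jar
PROPS=s3://path-to-test/clustering.properties
BASE_PATH=s3://path-to-test/data/hudi/test/country/
TABLE=country

# Build the command for a given mode; extra arguments are appended at the end.
clustering_cmd() {
  mode=$1
  shift
  echo spark-submit \
    --class org.apache.hudi.utilities.HoodieClusteringJob \
    "$JAR" \
    --props "$PROPS" \
    --mode "$mode" \
    --base-path "$BASE_PATH" \
    --table-name "$TABLE" \
    --spark-memory 1g "$@"
}

# Step 1: schedule only.
clustering_cmd schedule

# Step 2: execute the instant that step 1 generated.
# INSTANT_TIME is a placeholder, not a real value.
INSTANT_TIME=${INSTANT_TIME:-"<instant-from-step-1>"}
clustering_cmd execute --instant-time "$INSTANT_TIME"
```

Running the steps separately should show whether the failure happens while scheduling the clustering plan or while executing it.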
[GitHub] [hudi] aznwarmonkey commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
aznwarmonkey commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1047794790
Hello,
Apologies for the delay. Attached is the full console output of the schedule job.
[output.txt](https://github.com/apache/hudi/files/8116942/output.txt)
[GitHub] [hudi] nsivabalan commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1061351005
@zhangyue19921010 : Can you follow up on this, please?
[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1061400775
Ack. Sorry, I almost forgot about this issue. I will do more research and respond ASAP.
[GitHub] [hudi] aznwarmonkey commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
aznwarmonkey commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1071911444
I am running version 0.9, which is deployed on AWS EMR clusters.
[GitHub] [hudi] zhangyue19921010 commented on issue #4803: [SUPPORT] Clustering throwing exception
Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on issue #4803:
URL: https://github.com/apache/hudi/issues/4803#issuecomment-1038776138
Also, could you please provide the full log of this test?