Posted to commits@hudi.apache.org by "alexone95 (via GitHub)" <gi...@apache.org> on 2023/04/12 09:13:14 UTC

[GitHub] [hudi] alexone95 opened a new issue, #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

alexone95 opened a new issue, #8436:
URL: https://github.com/apache/hudi/issues/8436

   Hello, I'm trying to run the Hoodie commit cleaner as a step on an EMR cluster via spark-submit. I am following the instructions at https://hudi.apache.org/docs/hoodie_cleaner/, so I set the following as the script argument of the EMR step:
   
   spark-submit --class "org.apache.hudi.utilities.HoodieCleaner `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --target-base-path "PATH_TO_.hoodie" --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS --hoodie-conf hoodie.cleaner.commits.retained=10 --hoodie-conf hoodie.cleaner.parallelism=200"
   
   but what I got is the following error: Error: Missing application resource.
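A plausible reading of this error is that the quoting in the command above is unbalanced: the `"` opened before `org.apache.hudi.utilities.HoodieCleaner` only closes at `--target-base-path "`, and the text after it is glued on without whitespace, so the whole rest of the line may become the value of `--class` and spark-submit never sees an application jar. The sketch below checks this with Python's POSIX-style `shlex.split` (an approximation of shell word splitting; it does not expand the backticked `ls`, which an EMR step argument may not expand either). It also builds a corrected argument list. Assumptions: the jar path is the one loaded in the log from the follow-up comment, the S3 base path is a hypothetical placeholder for the table root (the directory that contains `.hoodie`), and `primary_resource` is a simplified, hypothetical stand-in for how spark-submit picks the application resource.

```python
import shlex

# The command exactly as written in the issue (quotes and backticks verbatim).
BROKEN = (
    'spark-submit --class "org.apache.hudi.utilities.HoodieCleaner '
    '`ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` '
    '--target-base-path "PATH_TO_.hoodie" '
    '--hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS '
    '--hoodie-conf hoodie.cleaner.commits.retained=10 '
    '--hoodie-conf hoodie.cleaner.parallelism=200"'
)

# Corrected argument list: each argument is its own word.
JAR = "/usr/lib/hudi/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar"  # path seen in the log
BASE_PATH = "s3://my-bucket/path/to/hudiTable"  # hypothetical table root, not .hoodie
FIXED = [
    "spark-submit",
    "--class", "org.apache.hudi.utilities.HoodieCleaner",
    JAR,
    "--target-base-path", BASE_PATH,
    "--hoodie-conf", "hoodie.cleaner.policy=KEEP_LATEST_COMMITS",
    "--hoodie-conf", "hoodie.cleaner.commits.retained=10",
    "--hoodie-conf", "hoodie.cleaner.parallelism=200",
]


def primary_resource(argv):
    """Return the first non-option word after 'spark-submit', or None.

    Hypothetical simplification: assumes every '--option' takes exactly one
    value, which holds for the options used here but not for flags such as
    --verbose.
    """
    i = 1  # skip the 'spark-submit' program name
    while i < len(argv):
        if argv[i].startswith("--"):
            i += 2  # skip the option and its value
        else:
            return argv[i]
    return None


if __name__ == "__main__":
    broken = shlex.split(BROKEN)
    print(broken)                    # 3 words: the --class value swallows the rest
    print(primary_resource(broken))  # None, i.e. no application jar
    print(shlex.join(FIXED))         # single-line command, quoted where needed
    print(primary_resource(FIXED))   # the hudi-utilities-bundle jar
```

No extra quoting is needed in the corrected form because none of the values contain spaces; adding quotes back is where the original command went wrong.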
   
   **To Reproduce**
   
   To reproduce the problem, add a step to the EMR cluster with the argument above.
   
   **Expected behavior**
   
   I expect to see only the 10 latest commits in the table's .hoodie directory.
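To check this expectation against a table, one option is to count the completed `<instant time>.commit` files in a listing of `.hoodie`. The sketch below is a minimal illustration that assumes the listing is already available as a list of file names (for example copied from an `aws s3 ls` of the directory). The instant-time naming follows timeline files visible in the log (such as `20230413141447019.clean`), and the sample `.commit` names are hypothetical.

```python
import re

# A completed commit instant: "<digits>.commit". Requested/inflight states
# (e.g. "<digits>.commit.requested", "<digits>.inflight") are not matched.
COMMIT_RE = re.compile(r"^(\d+)\.commit$")


def completed_commits(names):
    """Return instant times of completed .commit files, oldest first."""
    times = [m.group(1) for n in names if (m := COMMIT_RE.match(n))]
    return sorted(times)  # same-length timestamps sort chronologically


def latest(names, k=10):
    """Return the k most recent completed commit instant times."""
    return completed_commits(names)[-k:]


if __name__ == "__main__":
    # Hypothetical listing of a .hoodie directory.
    listing = [
        "hoodie.properties",
        "20230413141447019.clean",
        "20230413140000000.commit",
        "20230413150000000.commit.requested",
        "20230413150000000.commit",
    ]
    print(len(completed_commits(listing)), latest(listing, k=1))
```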
   
   **Environment Description**
   
       Hudi version : 0.12.1-amzn-0
       Spark version : 3.3.0
       Hive version : 3.1.3
       Hadoop version : 3.3.3-amzn-1
       Storage (HDFS/S3/GCS..) : S3
       Running on Docker? (yes/no) : no (EMR 6.9.0)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexone95 commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "alexone95 (via GitHub)" <gi...@apache.org>.
alexone95 commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1510840666

   I succeeded in calling the cleaner directly on EMR, but the .commit files in the .hoodie directory are still there. Am I missing something about how the cleaner works? I expect to find only the last 10 commits. Here is the log output I get from calling the cleaner:
   
   23/04/13 15:48:33 INFO SparkContext: Running Spark version 3.3.0-amzn-1 
   23/04/13 15:48:33 INFO ResourceUtils: ============================================================== 
   23/04/13 15:48:33 INFO ResourceUtils: No custom resources configured for spark.driver. 
   23/04/13 15:48:33 INFO ResourceUtils: ============================================================== 
   23/04/13 15:48:33 INFO SparkContext: Submitted application: hoodie-cleaner-hudiTable 
   23/04/13 15:48:33 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 9108, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 
   23/04/13 15:48:33 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor 
   23/04/13 15:48:33 INFO ResourceProfileManager: Added ResourceProfile id: 0 
   23/04/13 15:48:33 INFO SecurityManager: Changing view acls to: root 
   23/04/13 15:48:33 INFO SecurityManager: Changing modify acls to: root 
   23/04/13 15:48:33 INFO SecurityManager: Changing view acls groups to: 
   23/04/13 15:48:33 INFO SecurityManager: Changing modify acls groups to: 
   23/04/13 15:48:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 
   23/04/13 15:48:33 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec 
   23/04/13 15:48:33 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type 
   23/04/13 15:48:33 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress 
   23/04/13 15:48:33 INFO Utils: Successfully started service 'sparkDriver' on port 43157. 
   23/04/13 15:48:33 INFO SparkEnv: Registering MapOutputTracker 
   23/04/13 15:48:33 INFO SparkEnv: Registering BlockManagerMaster 
   23/04/13 15:48:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 
   23/04/13 15:48:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 
   23/04/13 15:48:33 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 
   23/04/13 15:48:33 INFO DiskBlockManager: Created local directory at /mnt/tmp/blockmgr-cbc3b241-36a8-45c6-aa3f-083f987dbb58 
   23/04/13 15:48:33 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB 
   23/04/13 15:48:33 INFO SparkEnv: Registering OutputCommitCoordinator 
   23/04/13 15:48:33 INFO SubResultCacheManager: Sub-result caches are disabled. 
   23/04/13 15:48:34 INFO Utils: Successfully started service 'SparkUI' on port 8090. 
   23/04/13 15:48:34 INFO SparkContext: Added JAR file:///usr/lib/hadoop/hadoop-distcp-3.3.3-amzn-1.jar at spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hadoop-distcp-3.3.3-amzn-1.jar with timestamp 1681400913096 
   23/04/13 15:48:34 INFO SparkContext: Added JAR file:/usr/lib/hudi/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar at spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar with timestamp 1681400913096 
   23/04/13 15:48:34 INFO Executor: Starting executor ID driver on host ip-10-108-166-149.eu-central-1.compute.internal 
   23/04/13 15:48:34 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): 'file:/usr/lib/hadoop-lzo/lib/*,file:/usr/lib/hadoop/hadoop-aws.jar,file:/usr/share/aws/aws-java-sdk/*,file:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/usr/share/aws/emr/security/conf,file:/usr/share/aws/emr/security/lib/*,file:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/docker/usr/lib/hadoop-lzo/lib/*,file:/docker/usr/lib/hadoop/hadoop-aws.jar,file:/docker/usr/share/aws/aws-java-sdk/*,file:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/docker/usr/share/aws/emr/security/conf,file:/docker/usr/share/aws/emr/security/lib/*,file:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/root/emr-spark-goodies.jar,file:/root/conf,file:/root/emr-s3-select-spark-connector.jar,file:/root/hadoop-aws.jar,file:/root/hive-openx-serde.jar,file:/root/sagemaker-spark-sdk.jar,file:/root/aws-glue-datacatalog-spark-client.jar,file:/root/*' 
   23/04/13 15:48:34 INFO Executor: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hadoop-distcp-3.3.3-amzn-1.jar with timestamp 1681400913096 
   23/04/13 15:48:34 INFO TransportClientFactory: Successfully created connection to ip-10-108-166-149.eu-central-1.compute.internal/10.108.166.149:43157 after 37 ms (0 ms spent in bootstraps) 
   23/04/13 15:48:34 INFO Utils: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hadoop-distcp-3.3.3-amzn-1.jar to /mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/fetchFileTemp894068807142647604.tmp 
   23/04/13 15:48:34 INFO Executor: Adding file:/mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/hadoop-distcp-3.3.3-amzn-1.jar to class loader 
   23/04/13 15:48:34 INFO Executor: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar with timestamp 1681400913096 
   23/04/13 15:48:34 INFO Utils: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar to /mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/fetchFileTemp7187610857861244070.tmp 
   23/04/13 15:48:34 INFO Executor: Adding file:/mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar to class loader 
   23/04/13 15:48:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38925. 
   23/04/13 15:48:34 INFO NettyBlockTransferService: Server created on ip-10-108-166-149.eu-central-1.compute.internal:38925 
   23/04/13 15:48:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 
   23/04/13 15:48:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 
   23/04/13 15:48:34 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-108-166-149.eu-central-1.compute.internal:38925 with 912.3 MiB RAM, BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 
   23/04/13 15:48:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 
   23/04/13 15:48:34 INFO BlockManager: external shuffle service port = 7337 
   23/04/13 15:48:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 
   23/04/13 15:48:35 INFO SingleEventLogFileWriter: Logging events to hdfs:/var/log/spark/apps/local-1681400914190.inprogress 
   23/04/13 15:48:36 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms. 
   23/04/13 15:48:37 INFO Javalin: (ASCII-art banner omitted) https://javalin.io/documentation 
   23/04/13 15:48:37 INFO Javalin: Starting Javalin ... 
   23/04/13 15:48:37 INFO Javalin: Listening on http://localhost:45175/ 
   23/04/13 15:48:37 INFO Javalin: Javalin started in 162ms \o/ 
   23/04/13 15:48:38 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:39 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:39 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:39 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 
   23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 
   23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 
   23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 
   23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 
   23/04/13 15:48:42 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:137 
   23/04/13 15:48:42 INFO DAGScheduler: Got job 0 (collect at HoodieSparkEngineContext.java:137) with 1 output partitions 
   23/04/13 15:48:42 INFO DAGScheduler: Final stage: ResultStage 0 (collect at HoodieSparkEngineContext.java:137) 
   23/04/13 15:48:42 INFO DAGScheduler: Parents of final stage: List() 
   23/04/13 15:48:42 INFO DAGScheduler: Missing parents: List() 
   23/04/13 15:48:42 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at flatMap at HoodieSparkEngineContext.java:137), which has no missing parents 
   23/04/13 15:48:42 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 112.6 KiB, free 912.2 MiB) 
   23/04/13 15:48:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 42.3 KiB, free 912.1 MiB) 
   23/04/13 15:48:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-108-166-149.eu-central-1.compute.internal:38925 (size: 42.3 KiB, free: 912.3 MiB) 
   23/04/13 15:48:42 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1570 
   23/04/13 15:48:42 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at flatMap at HoodieSparkEngineContext.java:137) (first 15 tasks are for partitions Vector(0)) 
   23/04/13 15:48:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0 
   23/04/13 15:48:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 4447 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 
   23/04/13 15:48:43 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2928 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 456 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (1/1) 
   23/04/13 15:48:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
   23/04/13 15:48:43 INFO DAGScheduler: ResultStage 0 (collect at HoodieSparkEngineContext.java:137) finished in 1.056 s 
   23/04/13 15:48:43 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job 
   23/04/13 15:48:43 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished 
   23/04/13 15:48:43 INFO DAGScheduler: Job 0 finished: collect at HoodieSparkEngineContext.java:137, took 1.161074 s 
   23/04/13 15:48:43 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:103 
   23/04/13 15:48:43 INFO DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:103) with 14 output partitions 
   23/04/13 15:48:43 INFO DAGScheduler: Final stage: ResultStage 1 (collect at HoodieSparkEngineContext.java:103) 
   23/04/13 15:48:43 INFO DAGScheduler: Parents of final stage: List() 
   23/04/13 15:48:43 INFO DAGScheduler: Missing parents: List() 
   23/04/13 15:48:43 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at map at HoodieSparkEngineContext.java:103), which has no missing parents 
   23/04/13 15:48:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 112.3 KiB, free 912.0 MiB) 
   23/04/13 15:48:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 42.3 KiB, free 912.0 MiB) 
   23/04/13 15:48:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-108-166-149.eu-central-1.compute.internal:38925 (size: 42.3 KiB, free: 912.2 MiB) 
   23/04/13 15:48:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1570 
   23/04/13 15:48:43 INFO DAGScheduler: Submitting 14 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at map at HoodieSparkEngineContext.java:103) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)) 
   23/04/13 15:48:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 14 tasks resource profile 0 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 4598 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 0.0 in stage 1.0 (TID 1) 
   23/04/13 15:48:43 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 901 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 1, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 1.0 in stage 1.0 (TID 2) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 209 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (1/14) 
   23/04/13 15:48:43 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-10-108-166-149.eu-central-1.compute.internal:38925 in memory (size: 42.3 KiB, free: 912.3 MiB) 
   23/04/13 15:48:43 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 968 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 2, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 2.0 in stage 1.0 (TID 3) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 79 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (2/14) 
   23/04/13 15:48:43 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 3, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 3.0 in stage 1.0 (TID 4) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) in 31 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (3/14) 
   23/04/13 15:48:43 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 4, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 4) in 26 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (4/14) 
   23/04/13 15:48:43 INFO Executor: Running task 4.0 in stage 1.0 (TID 5) 
   23/04/13 15:48:43 INFO Executor: Finished task 4.0 in stage 1.0 (TID 5). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 5, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 5) in 31 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (5/14) 
   23/04/13 15:48:43 INFO Executor: Running task 5.0 in stage 1.0 (TID 6) 
   23/04/13 15:48:43 INFO Executor: Finished task 5.0 in stage 1.0 (TID 6). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 7) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 6, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 6.0 in stage 1.0 (TID 7) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 6) in 24 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (6/14) 
   23/04/13 15:48:43 INFO Executor: Finished task 6.0 in stage 1.0 (TID 7). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 8) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 7, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 7) in 30 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (7/14) 
   23/04/13 15:48:43 INFO Executor: Running task 7.0 in stage 1.0 (TID 8) 
   23/04/13 15:48:43 INFO Executor: Finished task 7.0 in stage 1.0 (TID 8). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 9) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 8, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 8.0 in stage 1.0 (TID 9) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 8) in 36 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (8/14) 
   23/04/13 15:48:43 INFO Executor: Finished task 8.0 in stage 1.0 (TID 9). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 9, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 9.0 in stage 1.0 (TID 10) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 9) in 29 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (9/14) 
   23/04/13 15:48:43 INFO Executor: Finished task 9.0 in stage 1.0 (TID 10). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID 11) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 10, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO Executor: Running task 10.0 in stage 1.0 (TID 11) 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 10) in 24 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (10/14) 
   23/04/13 15:48:43 INFO Executor: Finished task 10.0 in stage 1.0 (TID 11). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID 12) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 11, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 10.0 in stage 1.0 (TID 11) in 24 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (11/14) 
   23/04/13 15:48:43 INFO Executor: Running task 11.0 in stage 1.0 (TID 12) 
   23/04/13 15:48:43 INFO Executor: Finished task 11.0 in stage 1.0 (TID 12). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID 13) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 12, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 11.0 in stage 1.0 (TID 12) in 25 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (12/14) 
   23/04/13 15:48:43 INFO Executor: Running task 12.0 in stage 1.0 (TID 13) 
   23/04/13 15:48:43 INFO Executor: Finished task 12.0 in stage 1.0 (TID 13). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID 14) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 13, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 12.0 in stage 1.0 (TID 13) in 25 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (13/14) 
   23/04/13 15:48:43 INFO Executor: Running task 13.0 in stage 1.0 (TID 14) 
   23/04/13 15:48:43 INFO Executor: Finished task 13.0 in stage 1.0 (TID 14). 925 bytes result sent to driver 
   23/04/13 15:48:43 INFO TaskSetManager: Finished task 13.0 in stage 1.0 (TID 14) in 26 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (14/14) 
   23/04/13 15:48:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
   23/04/13 15:48:43 INFO DAGScheduler: ResultStage 1 (collect at HoodieSparkEngineContext.java:103) finished in 0.603 s 
   23/04/13 15:48:43 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job 
   23/04/13 15:48:43 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished 
   23/04/13 15:48:43 INFO DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:103, took 0.614621 s 
   23/04/13 15:48:44 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:103 
   23/04/13 15:48:44 INFO DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:103) with 13 output partitions 
   23/04/13 15:48:44 INFO DAGScheduler: Final stage: ResultStage 2 (collect at HoodieSparkEngineContext.java:103) 
   23/04/13 15:48:44 INFO DAGScheduler: Parents of final stage: List() 
   23/04/13 15:48:44 INFO DAGScheduler: Missing parents: List() 
   23/04/13 15:48:44 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[5] at map at HoodieSparkEngineContext.java:103), which has no missing parents 
   23/04/13 15:48:44 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 680.3 KiB, free 911.5 MiB) 
   23/04/13 15:48:44 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 252.0 KiB, free 911.2 MiB) 
   23/04/13 15:48:44 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ip-10-108-166-149.eu-central-1.compute.internal:38925 (size: 252.0 KiB, free: 912.0 MiB) 
   23/04/13 15:48:44 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1570 
   23/04/13 15:48:44 INFO DAGScheduler: Submitting 13 missing tasks from ResultStage 2 (MapPartitionsRDD[5] at map at HoodieSparkEngineContext.java:103) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)) 
   23/04/13 15:48:44 INFO TaskSchedulerImpl: Adding task set 2.0 with 13 tasks resource profile 0 
   23/04/13 15:48:44 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 15) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:44 INFO Executor: Running task 0.0 in stage 2.0 (TID 15) 
   23/04/13 15:48:45 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ip-10-108-166-149.eu-central-1.compute.internal:38925 in memory (size: 42.3 KiB, free: 912.1 MiB) 
   23/04/13 15:48:45 INFO MetricsConfig: Loaded properties from hadoop-metrics2.properties 
   23/04/13 15:48:45 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 300 second(s). 
   23/04/13 15:48:45 INFO MetricsSystemImpl: HBase metrics system started 
   23/04/13 15:48:45 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/files-0000_0-28073-422423_20230411160658276001.hfile' for reading 
   23/04/13 15:48:46 INFO ZlibFactory: Successfully loaded & initialized native-zlib library 
   23/04/13 15:48:46 INFO CodecPool: Got brand-new decompressor [.gz] 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230323153445618.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230324094400516.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230324144931601.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230330105151611.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230330113040739.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405140420990.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405141839408.rollback' for reading 
   23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405144726404.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405151032798.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405152220270.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405152832205.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405160019842.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230406073630950.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230407140042928.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230407140100444.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230411072649180.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141002691.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141944579.rollback' for reading 
   23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 
   # WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.] 
   23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.1_0-28080-422430' for reading 
   23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.2_0-28097-422495' for reading 
   23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.3_0-28214-424972' for reading 
   23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.4_0-28269-425284' for reading 
   23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.5_0-118-2431' for reading 
   23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.5_0-120-2432' for reading 
   23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.6_0-145-2503' for reading 
   23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.7_0-197-2812' for reading 
   23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.8_0-323-5348' for reading 
   23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.9_0-329-5353' for reading 
   23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.10_0-112-2424' for reading 
   23/04/13 15:48:51 INFO Executor: Finished task 0.0 in stage 2.0 (TID 15). 955 bytes result sent to driver 
   23/04/13 15:48:51 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 16) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 1, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:51 INFO Executor: Running task 1.0 in stage 2.0 (TID 16) 
   23/04/13 15:48:51 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 15) in 6734 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (1/13) 
   23/04/13 15:48:51 INFO Executor: Finished task 1.0 in stage 2.0 (TID 16). 912 bytes result sent to driver 
   23/04/13 15:48:51 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 17) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 2, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:51 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 16) in 179 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (2/13) 
   23/04/13 15:48:51 INFO Executor: Running task 2.0 in stage 2.0 (TID 17) 
   23/04/13 15:48:51 INFO Executor: Finished task 2.0 in stage 2.0 (TID 17). 912 bytes result sent to driver 
   23/04/13 15:48:51 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 18) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 3, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:51 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 17) in 168 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (3/13) 
   23/04/13 15:48:51 INFO Executor: Running task 3.0 in stage 2.0 (TID 18) 
   23/04/13 15:48:51 INFO Executor: Finished task 3.0 in stage 2.0 (TID 18). 912 bytes result sent to driver 
   23/04/13 15:48:51 INFO TaskSetManager: Starting task 4.0 in stage 2.0 (TID 19) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 4, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:51 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 18) in 168 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (4/13) 
   23/04/13 15:48:51 INFO Executor: Running task 4.0 in stage 2.0 (TID 19) 
   23/04/13 15:48:51 INFO Executor: Finished task 4.0 in stage 2.0 (TID 19). 912 bytes result sent to driver 
   23/04/13 15:48:51 INFO TaskSetManager: Starting task 5.0 in stage 2.0 (TID 20) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 5, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:51 INFO TaskSetManager: Finished task 4.0 in stage 2.0 (TID 19) in 174 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (5/13) 
   23/04/13 15:48:51 INFO Executor: Running task 5.0 in stage 2.0 (TID 20) 
   23/04/13 15:48:51 INFO Executor: Finished task 5.0 in stage 2.0 (TID 20). 912 bytes result sent to driver 
   23/04/13 15:48:51 INFO TaskSetManager: Starting task 6.0 in stage 2.0 (TID 21) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 6, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:51 INFO TaskSetManager: Finished task 5.0 in stage 2.0 (TID 20) in 154 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (6/13) 
   23/04/13 15:48:51 INFO Executor: Running task 6.0 in stage 2.0 (TID 21) 
   23/04/13 15:48:52 INFO Executor: Finished task 6.0 in stage 2.0 (TID 21). 912 bytes result sent to driver 
   23/04/13 15:48:52 INFO TaskSetManager: Starting task 7.0 in stage 2.0 (TID 22) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 7, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:52 INFO TaskSetManager: Finished task 6.0 in stage 2.0 (TID 21) in 166 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (7/13) 
   23/04/13 15:48:52 INFO Executor: Running task 7.0 in stage 2.0 (TID 22) 
   23/04/13 15:48:52 INFO Executor: Finished task 7.0 in stage 2.0 (TID 22). 912 bytes result sent to driver 
   23/04/13 15:48:52 INFO TaskSetManager: Starting task 8.0 in stage 2.0 (TID 23) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 8, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:52 INFO TaskSetManager: Finished task 7.0 in stage 2.0 (TID 22) in 150 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (8/13) 
   23/04/13 15:48:52 INFO Executor: Running task 8.0 in stage 2.0 (TID 23) 
   23/04/13 15:48:52 INFO Executor: Finished task 8.0 in stage 2.0 (TID 23). 912 bytes result sent to driver 
   23/04/13 15:48:52 INFO TaskSetManager: Starting task 9.0 in stage 2.0 (TID 24) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 9, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:52 INFO Executor: Running task 9.0 in stage 2.0 (TID 24) 
   23/04/13 15:48:52 INFO TaskSetManager: Finished task 8.0 in stage 2.0 (TID 23) in 185 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (9/13) 
   23/04/13 15:48:52 INFO Executor: Finished task 9.0 in stage 2.0 (TID 24). 955 bytes result sent to driver 
   23/04/13 15:48:52 INFO TaskSetManager: Starting task 10.0 in stage 2.0 (TID 25) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 10, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:52 INFO TaskSetManager: Finished task 9.0 in stage 2.0 (TID 24) in 217 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (10/13) 
   23/04/13 15:48:52 INFO Executor: Running task 10.0 in stage 2.0 (TID 25) 
   23/04/13 15:48:52 INFO Executor: Finished task 10.0 in stage 2.0 (TID 25). 912 bytes result sent to driver 
   23/04/13 15:48:52 INFO TaskSetManager: Starting task 11.0 in stage 2.0 (TID 26) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 11, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:52 INFO TaskSetManager: Finished task 10.0 in stage 2.0 (TID 25) in 166 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (11/13) 
   23/04/13 15:48:52 INFO Executor: Running task 11.0 in stage 2.0 (TID 26) 
   23/04/13 15:48:52 INFO Executor: Finished task 11.0 in stage 2.0 (TID 26). 912 bytes result sent to driver 
   23/04/13 15:48:52 INFO TaskSetManager: Starting task 12.0 in stage 2.0 (TID 27) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 12, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 
   23/04/13 15:48:52 INFO Executor: Running task 12.0 in stage 2.0 (TID 27) 
   23/04/13 15:48:52 INFO TaskSetManager: Finished task 11.0 in stage 2.0 (TID 26) in 152 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (12/13) 
   23/04/13 15:48:53 INFO Executor: Finished task 12.0 in stage 2.0 (TID 27). 912 bytes result sent to driver 
   23/04/13 15:48:53 INFO TaskSetManager: Finished task 12.0 in stage 2.0 (TID 27) in 168 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (13/13) 
   23/04/13 15:48:53 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
   23/04/13 15:48:53 INFO DAGScheduler: ResultStage 2 (collect at HoodieSparkEngineContext.java:103) finished in 8.826 s 
   23/04/13 15:48:53 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job 
   23/04/13 15:48:53 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished 
   23/04/13 15:48:53 INFO DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:103, took 8.834619 s 
   23/04/13 15:48:53 INFO SparkUI: Stopped Spark web UI at http://ip-10-108-166-149.eu-central-1.compute.internal:8090 
   23/04/13 15:48:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
   23/04/13 15:48:53 INFO MemoryStore: MemoryStore cleared 
   23/04/13 15:48:53 INFO BlockManager: BlockManager stopped 
   23/04/13 15:48:53 INFO BlockManagerMaster: BlockManagerMaster stopped 
   23/04/13 15:48:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
   23/04/13 15:48:53 INFO SparkContext: Successfully stopped SparkContext


[GitHub] [hudi] alexone95 commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "alexone95 (via GitHub)" <gi...@apache.org>.
alexone95 commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1524958582

   Hi, I solved it with a Lambda function that deletes the files in the archive rather than the .commit files, relying on the hoodie.keep.min.commits and hoodie.keep.max.commits settings. Thanks!
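For context on the hoodie.keep.min.commits and hoodie.keep.max.commits settings: as I understand Hudi's behaviour, the cleaner removes older versions of data files, while the commit instants under .hoodie are trimmed by archival, which moves older instants out of the active timeline once there are more than hoodie.keep.max.commits, keeping hoodie.keep.min.commits. Hudi also validates that hoodie.cleaner.commits.retained < hoodie.keep.min.commits < hoodie.keep.max.commits. The sketch below is a hypothetical pre-flight check of that ordering before submitting a job; check_retention is not a Hudi tool, and 10/20/30 are believed to be the defaults in Hudi 0.12 (worth verifying against the configuration reference).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Hudi). Assumed rule, based on
# Hudi's config validation:
#   hoodie.cleaner.commits.retained < hoodie.keep.min.commits < hoodie.keep.max.commits
check_retention() {
  local retained="$1" keep_min="$2" keep_max="$3"
  if [ "$keep_min" -le "$retained" ]; then
    echo "hoodie.keep.min.commits ($keep_min) must be greater than hoodie.cleaner.commits.retained ($retained)" >&2
    return 1
  fi
  if [ "$keep_max" -le "$keep_min" ]; then
    echo "hoodie.keep.max.commits ($keep_max) must be greater than hoodie.keep.min.commits ($keep_min)" >&2
    return 1
  fi
  echo "ok"
}

# Values believed to match Hudi 0.12 defaults: retained=10, min=20, max=30.
check_retention 10 20 30
```

Running the check with the same numbers passed via --hoodie-conf catches an invalid combination before the Spark job starts rather than at write-config validation time.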


[GitHub] [hudi] alexone95 commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "alexone95 (via GitHub)" <gi...@apache.org>.
alexone95 commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1505209547

   I tried with the hardcoded path:
   
   spark-submit --class "org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle.jar --target-base-path s3://edi-dp-qa-datalake/DATA_PLATFORM/ods/ods_d_crm_crmd_customer_i_prod_r/hudiTable/.hoodie --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS --hoodie-conf hoodie.cleaner.commits.retained=10 --hoodie-conf hoodie.cleaner.parallelism=200"
   
   but got the same error.


[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1510378693

   @alexone95 Just checking: does the above command work for you?


[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1507866544

   I see an issue related to the quotes in the spark-submit command. Try this:
   
   spark-submit --class org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle.jar --target-base-path s3://edi-dp-qa-datalake/DATA_PLATFORM/ods/ods_d_crm_crmd_customer_i_prod_r/hudiTable/.hoodie --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS --hoodie-conf hoodie.cleaner.commits.retained=10 --hoodie-conf hoodie.cleaner.parallelism=200
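The quoting problem can be reproduced without Spark. In the earlier commands the opening quote sits before the class name and the closing quote at the very end, so the shell hands spark-submit a single --class value that contains the jar path and every option; with no separate application jar left, spark-submit reports "Missing application resource". The sketch below uses a hypothetical show_args function to print the argument list the shell actually produces for each form; the S3 path is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark or Hudi): prints the number of
# arguments it received, then each argument in brackets on its own line.
show_args() {
  printf '%d\n' "$#"
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Quoted form, as in the original command: the shell passes only two
# arguments, so the jar path and every option end up inside the --class value.
show_args --class "org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle.jar --target-base-path s3://my-bucket/hudiTable"

# Unquoted form: class name, jar and each option arrive as separate
# arguments (5 in total), which is what spark-submit expects.
show_args --class org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle.jar --target-base-path s3://my-bucket/hudiTable
```

The first call reports 2 arguments and the second reports 5, which is the difference between spark-submit seeing no jar and seeing the utilities bundle as its application resource.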
   


[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1505164660

   It looks like it cannot parse the jar path. Can you try a hardcoded path instead of using `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar`?
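The `ls packaging/...` form from the Hudi docs assumes a shell that performs command substitution and a working directory inside a built Hudi source checkout; if the EMR step is not run through a shell, the backticks reach spark-submit unexpanded. One defensive option, sketched here under those assumptions, is to resolve the jar in a wrapper script and fail early unless exactly one file matches. resolve_jar is a hypothetical helper, and /usr/lib/hudi/hudi-utilities-bundle*.jar is the EMR location used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expands a glob and prints the one jar it matches.
# Returns non-zero if the glob matches no file or more than one file, so a
# wrong path fails loudly instead of leaving spark-submit without a jar.
resolve_jar() {
  local pattern="$1"
  local matches=()
  local f
  # Unquoted on purpose so the shell performs pathname expansion; this also
  # means the pattern must not contain spaces.
  for f in $pattern; do
    if [ -f "$f" ]; then
      matches+=("$f")
    fi
  done
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "expected exactly one jar for '$pattern', found ${#matches[@]}" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}

# Example use on an EMR node:
#   JAR="$(resolve_jar '/usr/lib/hudi/hudi-utilities-bundle*.jar')" || exit 1
#   spark-submit --class org.apache.hudi.utilities.HoodieCleaner "$JAR" \
#     --target-base-path <base path> --hoodie-conf hoodie.cleaner.commits.retained=10
```

Quoting "$JAR" when passing it on keeps the resolved path a single argument, which avoids the quoting issue discussed in this thread.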


[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1522874205

   @alexone95 Can you provide the timeline from before and after the issue so we can look into it further?
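One way to capture the timeline before and after a cleaner run is to list the files directly under the table's .hoodie folder and count the completed instants per action. The sketch below assumes Hudi's naming of `<instant time>.<action>` for completed instants, with pending ones carrying a `.requested`/`.inflight` suffix (or being a bare `<instant time>.inflight` for commits); summarize_timeline is a hypothetical helper that reads one file name per line, and the bucket path in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads file names from the .hoodie folder (one per line)
# and prints "<action> <count>" for completed instants, sorted by action.
# Assumed naming: completed instants look like 20230413141944579.rollback;
# pending ones have an extra suffix or are a bare <instant>.inflight, and
# names not starting with digits (hoodie.properties, archived/) are ignored.
summarize_timeline() {
  awk -F. 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 != "inflight" { count[$2]++ }
           END { for (a in count) print a, count[a] }' | LC_ALL=C sort
}

# Example use (placeholder path; run once before and once after cleaning):
#   aws s3 ls s3://my-bucket/hudiTable/.hoodie/ | awk '{print $NF}' | summarize_timeline
```

Comparing the two summaries shows whether a new clean instant appeared and how many commit instants remain in the active timeline.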


[GitHub] [hudi] alexone95 closed issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

Posted by "alexone95 (via GitHub)" <gi...@apache.org>.
alexone95 closed issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9
URL: https://github.com/apache/hudi/issues/8436

