Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/19 04:57:53 UTC

[GitHub] [hudi] VIKASPATID opened a new issue #4635: [SUPPORT] Bulk writing failing due to hudi timeline archive exception

VIKASPATID opened a new issue #4635:
URL: https://github.com/apache/hudi/issues/4635


   We are seeing a repeated error when bulk writing to a COW table, and the error message is not very clear. Note that we were able to write a number of files to Hudi successfully before this error started appearing.
   
   **Configuration**
   'className' : 'org.apache.hudi',
   'hoodie.write.concurrency.mode':'optimistic_concurrency_control',
   'hoodie.cleaner.policy.failed.writes':'LAZY',
   'hoodie.write.lock.zookeeper.lock_key': f"{table_name}",
   'hoodie.datasource.write.row.writer.enable': 'false',
   'hoodie.table.name': table_name,
   'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
   'hoodie.datasource.write.recordkey.field': 'TICKER,ORDER_NUM',
   'hoodie.datasource.write.partitionpath.field': 'ISO,DATE',
   'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
   'hoodie.datasource.write.precombine.field': "DATE",
   'hoodie.datasource.hive_sync.use_jdbc': 'false',
   'hoodie.datasource.hive_sync.enable': 'false',
   'hoodie.compaction.payload.class': 'org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload',
   'hoodie.datasource.hive_sync.table': f"{table_name}",
   'hoodie.datasource.hive_sync.partition_fields': 'ISO,DATE',
   'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
   'hoodie.copyonwrite.record.size.estimate': 256,
   'hoodie.write.lock.client.wait_time_ms': 1000,
   'hoodie.write.lock.client.num_retries': 50,
   'hoodie.parquet.max.file.size': 1024*1024*1024,
   'hoodie.bulkinsert.shuffle.parallelism': 10,
   'compactionSmallFileSize': 100*1024*1024,
   'hoodie.datasource.write.operation': 'bulk_insert'
   
   **Environment Description**
   
   * Running on EMR 6.5.0
   
   * Hudi version : 0.9
   
   * Spark version : 3.1.2
   
   * ZooKeeper version : 3.5.7
   
   * Hive version : 3.1.2
   
   * Hadoop version : 
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   
   **Stacktrace**
   
   ```
   java.lang.NullPointerException
           at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$8(HoodieTimelineArchiveLog.java:225)
           at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
           at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
           at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
           at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
           at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
           at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
           at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
           at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
           at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
           at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
           at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
           at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
           at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
           at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
           at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:122)
           at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:439)
           at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:191)
           at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
           at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:617)
           at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:274)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:169)
           at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
           at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
           at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
           at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   ```
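Since most of the trace above is Java stream and Spark plumbing, it can help to pull out only the Hudi frames when triaging. The following is a minimal sketch, not part of Hudi or Spark: `FRAME_RE`, `frames_with_prefix`, and `SAMPLE` are hypothetical names, and the sample is a shortened copy of the trace above.

```python
import re

# Matches one "at package.Class.method(File.java:123)" frame of a Java stack trace.
FRAME_RE = re.compile(r"^\s*at\s+(?P<method>[\w.$]+)\((?P<location>[^)]*)\)\s*$")


def frames_with_prefix(trace, prefix="org.apache.hudi."):
    """Return (method, location) pairs for frames whose qualified name starts with prefix."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group("method").startswith(prefix):
            frames.append((match.group("method"), match.group("location")))
    return frames


# Shortened copy of the reported trace, used only to exercise the helper.
SAMPLE = """java.lang.NullPointerException
        at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$8(HoodieTimelineArchiveLog.java:225)
        at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
        at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:122)
        at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:439)
"""

for method, location in frames_with_prefix(SAMPLE):
    print(f"{method} ({location})")
```

Read top to bottom, the filtered frames show the exception surfacing in `getInstantsToArchive`, reached from `archiveIfRequired` during `postCommit`, that is, in timeline archival after the commit rather than in the data write itself.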
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1064735715


   Hi @nsivabalan, is there any update on this?





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1030879752


   @yihua: Can you look into this issue of read timeouts with timeline-server-based markers? I guess you were investigating another issue along similar lines.





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1017014348


   Are there any more logs or stack traces? I could not find the actual issue from the stacktrace above.
   Also, I see you have multi-writer enabled. Can you disable it and see whether we get past the issue by triggering writes from a single writer only?
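One way to try the single-writer suggestion is to derive a second options dict from the reported one. The sketch below is only an illustration: `to_single_writer` and the trimmed `reported` dict are hypothetical, and it assumes that `single_writer` and `EAGER` are the non-concurrent values for `hoodie.write.concurrency.mode` and `hoodie.cleaner.policy.failed.writes` in the Hudi version in use, which should be checked against that version's configuration reference.

```python
# Option prefixes that are only relevant when several writers share a table.
MULTI_WRITER_PREFIXES = ("hoodie.write.lock.",)


def to_single_writer(options):
    """Return a copy of options with optimistic concurrency control switched off."""
    single = {
        key: value
        for key, value in options.items()
        if not key.startswith(MULTI_WRITER_PREFIXES)
    }
    # Assumed single-writer values; verify against the Hudi release being used.
    single["hoodie.write.concurrency.mode"] = "single_writer"
    single["hoodie.cleaner.policy.failed.writes"] = "EAGER"
    return single


# Trimmed version of the reported configuration; "my_table" is a placeholder.
reported = {
    "hoodie.table.name": "my_table",
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.zookeeper.lock_key": "my_table",
    "hoodie.write.lock.client.wait_time_ms": 1000,
    "hoodie.write.lock.client.num_retries": 50,
    "hoodie.datasource.write.operation": "bulk_insert",
}

single = to_single_writer(reported)
for key in sorted(single):
    print(f"{key} = {single[key]}")
```

The original dict is left untouched, so the same job can be rerun with either set of options. For the comparison to be meaningful, only one job should be writing to the table while the single-writer options are in effect.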
   
   





[GitHub] [hudi] rkkalluri commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
rkkalluri commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1073440851


   I am able to reproduce this locally on 0.11.0-SNAPSHOT
   
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20220320215909174__commit__INFLIGHT]}
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating remote first table view
   22/03/20 21:59:22 INFO TransactionUtils: Successfully resolved conflicts, if any
   22/03/20 21:59:22 INFO BaseHoodieWriteClient: Committing 20220320215909174 action commit
   22/03/20 21:59:22 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:134
   22/03/20 21:59:22 INFO DAGScheduler: Got job 680 (collect at HoodieSparkEngineContext.java:134) with 1 output partitions
   22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 984 (collect at HoodieSparkEngineContext.java:134)
   22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List()
   22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 984 (MapPartitionsRDD[2117] at flatMap at HoodieSparkEngineContext.java:134), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_848 stored as values in memory (estimated size 99.5 KiB, free 357.5 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_848_piece0 stored as bytes in memory (estimated size 35.1 KiB, free 357.5 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_848_piece0 in memory on rkalluri.attlocal.net:63252 (size: 35.1 KiB, free: 364.0 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 848 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 984 (MapPartitionsRDD[2117] at flatMap at HoodieSparkEngineContext.java:134) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 984.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 984.0 (TID 2266) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4387 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 984.0 (TID 2266)
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 984.0 (TID 2266). 888 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 984.0 (TID 2266) in 22 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 984.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ResultStage 984 (collect at HoodieSparkEngineContext.java:134) finished in 0.041 s
   22/03/20 21:59:22 INFO DAGScheduler: Job 680 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage 984: Stage finished
   22/03/20 21:59:22 INFO DAGScheduler: Job 680 finished: collect at HoodieSparkEngineContext.java:134, took 0.042314 s
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO HoodieTableMetadataUtil: Loading latest file slices for metadata table partition files
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=0
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20220320215909174__commit__INFLIGHT]}
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableMetadataUtil: Updating at 20220320215909174 from Commit/BULK_INSERT. #partitions_updated=2
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO HoodieTableMetadataUtil: Loading latest file slices for metadata table partition files
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
   22/03/20 21:59:22 INFO BaseHoodieClient: Embedded Timeline Server is disabled. Not starting timeline service
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table View
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20220320215909174__commit__INFLIGHT]}
   22/03/20 21:59:22 INFO BaseHoodieWriteClient: Scheduling table service COMPACT
   22/03/20 21:59:22 INFO BaseHoodieWriteClient: Scheduling compaction at instant time :20220320215908164001
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table View
   22/03/20 21:59:22 INFO ScheduleCompactionActionExecutor: Checking if compaction needs to be run on file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215908164__deltacommit__COMPLETED]}
   22/03/20 21:59:22 INFO BaseHoodieWriteClient: Generate a new instant time: 20220320215909174 action: deltacommit
   22/03/20 21:59:22 INFO HoodieHeartbeatClient: Received request to start heartbeat for instant time 20220320215909174
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Creating a new instant [==>20220320215909174__deltacommit__REQUESTED]
   22/03/20 21:59:22 INFO BlockManagerInfo: Removed broadcast_848_piece0 on rkalluri.attlocal.net:63252 in memory (size: 35.1 KiB, free: 364.0 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Removed broadcast_847_piece0 on rkalluri.attlocal.net:63252 in memory (size: 196.6 KiB, free: 364.2 MiB)
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20220320215909174__deltacommit__REQUESTED]}
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table View
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating InMemory based view for basePath file:/tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20220320215909174__deltacommit__REQUESTED]}
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO AsyncCleanerService: The HoodieWriteClient is not configured to auto & async clean. Async clean service will not start.
   22/03/20 21:59:22 INFO AsyncArchiveService: The HoodieWriteClient is not configured to auto & async archive. Async archive service will not start.
   22/03/20 21:59:22 INFO SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:190
   22/03/20 21:59:22 INFO DAGScheduler: Registering RDD 2123 (countByKey at BaseSparkCommitActionExecutor.java:190) as input to shuffle 182
   22/03/20 21:59:22 INFO DAGScheduler: Got job 681 (countByKey at BaseSparkCommitActionExecutor.java:190) with 1 output partitions
   22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 986 (countByKey at BaseSparkCommitActionExecutor.java:190)
   22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 985)
   22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 985)
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ShuffleMapStage 985 (MapPartitionsRDD[2123] at countByKey at BaseSparkCommitActionExecutor.java:190), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_849 stored as values in memory (estimated size 9.5 KiB, free 358.4 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_849_piece0 stored as bytes in memory (estimated size 5.2 KiB, free 358.3 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_849_piece0 in memory on rkalluri.attlocal.net:63252 (size: 5.2 KiB, free: 364.2 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 849 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 985 (MapPartitionsRDD[2123] at countByKey at BaseSparkCommitActionExecutor.java:190) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 985.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 985.0 (TID 2267) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4800 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 985.0 (TID 2267)
   22/03/20 21:59:22 INFO MemoryStore: Block rdd_2121_0 stored as values in memory (estimated size 367.0 B, free 358.3 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added rdd_2121_0 in memory on rkalluri.attlocal.net:63252 (size: 367.0 B, free: 364.2 MiB)
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 985.0 (TID 2267). 1052 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 985.0 (TID 2267) in 5 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 985.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ShuffleMapStage 985 (countByKey at BaseSparkCommitActionExecutor.java:190) finished in 0.008 s
   22/03/20 21:59:22 INFO DAGScheduler: looking for newly runnable stages
   22/03/20 21:59:22 INFO DAGScheduler: running: Set()
   22/03/20 21:59:22 INFO DAGScheduler: waiting: Set(ResultStage 986)
   22/03/20 21:59:22 INFO DAGScheduler: failed: Set()
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 986 (ShuffledRDD[2124] at countByKey at BaseSparkCommitActionExecutor.java:190), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_850 stored as values in memory (estimated size 5.5 KiB, free 358.3 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_850_piece0 stored as bytes in memory (estimated size 3.1 KiB, free 358.3 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_850_piece0 in memory on rkalluri.attlocal.net:63252 (size: 3.1 KiB, free: 364.2 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 850 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 986 (ShuffledRDD[2124] at countByKey at BaseSparkCommitActionExecutor.java:190) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 986.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 986.0 (TID 2268) (rkalluri.attlocal.net, executor driver, partition 0, NODE_LOCAL, 4271 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 986.0 (TID 2268)
   22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Getting 1 (156.0 B) non-empty blocks including 1 (156.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
   22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 986.0 (TID 2268). 1318 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 986.0 (TID 2268) in 4 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 986.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ResultStage 986 (countByKey at BaseSparkCommitActionExecutor.java:190) finished in 0.005 s
   22/03/20 21:59:22 INFO DAGScheduler: Job 681 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage 986: Stage finished
   22/03/20 21:59:22 INFO DAGScheduler: Job 681 finished: countByKey at BaseSparkCommitActionExecutor.java:190, took 0.014181 s
   22/03/20 21:59:22 INFO BaseSparkCommitActionExecutor: Input workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=2}, InputPartitionStat={files=WorkloadStat {numInserts=0, numUpdates=2}}, OutputPartitionStat={}, operationType=UPSERT_PREPPED}
   22/03/20 21:59:22 INFO UpsertPartitioner: AvgRecordSize => 1024
   22/03/20 21:59:22 INFO SparkContext: Starting job: collectAsMap at UpsertPartitioner.java:272
   22/03/20 21:59:22 INFO DAGScheduler: Got job 682 (collectAsMap at UpsertPartitioner.java:272) with 1 output partitions
   22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 987 (collectAsMap at UpsertPartitioner.java:272)
   22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List()
   22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 987 (MapPartitionsRDD[2126] at mapToPair at UpsertPartitioner.java:271), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_851 stored as values in memory (estimated size 328.6 KiB, free 358.0 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_851_piece0 stored as bytes in memory (estimated size 116.9 KiB, free 357.9 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_851_piece0 in memory on rkalluri.attlocal.net:63252 (size: 116.9 KiB, free: 364.1 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 851 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 987 (MapPartitionsRDD[2126] at mapToPair at UpsertPartitioner.java:271) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 987.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 987.0 (TID 2269) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4337 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 987.0 (TID 2269)
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table View
   22/03/20 21:59:22 INFO FileSystemViewManager: Creating InMemory based view for basePath file:/tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 987.0 (TID 2269). 829 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 987.0 (TID 2269) in 19 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 987.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ResultStage 987 (collectAsMap at UpsertPartitioner.java:272) finished in 0.074 s
   22/03/20 21:59:22 INFO DAGScheduler: Job 682 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage 987: Stage finished
   22/03/20 21:59:22 INFO DAGScheduler: Job 682 finished: collectAsMap at UpsertPartitioner.java:272, took 0.074789 s
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO UpsertPartitioner: Total Buckets :1, buckets info => {0=BucketInfo {bucketType=UPDATE, fileIdPrefix=files-0000, partitionPath=files}},
   Partition to insert buckets => {},
   UpdateLocations mapped to buckets =>{files-0000=0}
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Checking for file exists ?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.requested
   22/03/20 21:59:22 INFO FileIOUtils: Created a new file in meta path: file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.inflight
   22/03/20 21:59:22 INFO HoodieActiveTimeline: Create new file for toInstant ?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.inflight
   22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:22 INFO SparkContext: Starting job: collect at BaseSparkUpdateStrategy.java:51
   22/03/20 21:59:22 INFO DAGScheduler: Registering RDD 2129 (distinct at BaseSparkUpdateStrategy.java:51) as input to shuffle 183
   22/03/20 21:59:22 INFO DAGScheduler: Got job 683 (collect at BaseSparkUpdateStrategy.java:51) with 1 output partitions
   22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 989 (collect at BaseSparkUpdateStrategy.java:51)
   22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 988)
   22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 988)
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ShuffleMapStage 988 (MapPartitionsRDD[2129] at distinct at BaseSparkUpdateStrategy.java:51), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_852 stored as values in memory (estimated size 9.5 KiB, free 357.9 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_852_piece0 stored as bytes in memory (estimated size 5.1 KiB, free 357.9 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_852_piece0 in memory on rkalluri.attlocal.net:63252 (size: 5.1 KiB, free: 364.1 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 852 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 988 (MapPartitionsRDD[2129] at distinct at BaseSparkUpdateStrategy.java:51) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 988.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 988.0 (TID 2270) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4800 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 988.0 (TID 2270)
   22/03/20 21:59:22 INFO BlockManager: Found block rdd_2121_0 locally
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 988.0 (TID 2270). 1138 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 988.0 (TID 2270) in 5 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 988.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ShuffleMapStage 988 (distinct at BaseSparkUpdateStrategy.java:51) finished in 0.007 s
   22/03/20 21:59:22 INFO DAGScheduler: looking for newly runnable stages
   22/03/20 21:59:22 INFO DAGScheduler: running: Set()
   22/03/20 21:59:22 INFO DAGScheduler: waiting: Set(ResultStage 989)
   22/03/20 21:59:22 INFO DAGScheduler: failed: Set()
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 989 (MapPartitionsRDD[2131] at distinct at BaseSparkUpdateStrategy.java:51), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_853 stored as values in memory (estimated size 6.3 KiB, free 357.9 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_853_piece0 stored as bytes in memory (estimated size 3.4 KiB, free 357.9 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_853_piece0 in memory on rkalluri.attlocal.net:63252 (size: 3.4 KiB, free: 364.1 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 853 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 989 (MapPartitionsRDD[2131] at distinct at BaseSparkUpdateStrategy.java:51) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 989.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 989.0 (TID 2271) (rkalluri.attlocal.net, executor driver, partition 0, NODE_LOCAL, 4271 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 989.0 (TID 2271)
   22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Getting 1 (117.0 B) non-empty blocks including 1 (117.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
   22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 989.0 (TID 2271). 1249 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 989.0 (TID 2271) in 4 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 989.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ResultStage 989 (collect at BaseSparkUpdateStrategy.java:51) finished in 0.005 s
   22/03/20 21:59:22 INFO DAGScheduler: Job 683 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage 989: Stage finished
   22/03/20 21:59:22 INFO DAGScheduler: Job 683 finished: collect at BaseSparkUpdateStrategy.java:51, took 0.012885 s
   22/03/20 21:59:22 INFO BaseSparkCommitActionExecutor: no validators configured.
   22/03/20 21:59:22 INFO BaseCommitActionExecutor: Auto commit enabled: Committing 20220320215909174
   22/03/20 21:59:22 INFO SparkContext: Starting job: collect at BaseSparkCommitActionExecutor.java:275
   22/03/20 21:59:22 INFO DAGScheduler: Registering RDD 2132 (mapToPair at BaseSparkCommitActionExecutor.java:227) as input to shuffle 184
   22/03/20 21:59:22 INFO DAGScheduler: Got job 684 (collect at BaseSparkCommitActionExecutor.java:275) with 1 output partitions
   22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 991 (collect at BaseSparkCommitActionExecutor.java:275)
   22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 990)
   22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 990)
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ShuffleMapStage 990 (MapPartitionsRDD[2132] at mapToPair at BaseSparkCommitActionExecutor.java:227), which has no missing parents
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_854 stored as values in memory (estimated size 332.8 KiB, free 357.6 MiB)
   22/03/20 21:59:22 INFO MemoryStore: Block broadcast_854_piece0 stored as bytes in memory (estimated size 119.4 KiB, free 357.4 MiB)
   22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_854_piece0 in memory on rkalluri.attlocal.net:63252 (size: 119.4 KiB, free: 364.0 MiB)
   22/03/20 21:59:22 INFO SparkContext: Created broadcast 854 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 990 (MapPartitionsRDD[2132] at mapToPair at BaseSparkCommitActionExecutor.java:227) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 990.0 with 1 tasks resource profile 0
   22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 990.0 (TID 2272) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4800 bytes) taskResourceAssignments Map()
   22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 990.0 (TID 2272)
   22/03/20 21:59:22 INFO BlockManager: Found block rdd_2121_0 locally
   22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 990.0 (TID 2272). 1052 bytes result sent to driver
   22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 990.0 (TID 2272) in 19 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 990.0, whose tasks have all completed, from pool
   22/03/20 21:59:22 INFO DAGScheduler: ShuffleMapStage 990 (mapToPair at BaseSparkCommitActionExecutor.java:227) finished in 0.073 s
   22/03/20 21:59:22 INFO DAGScheduler: looking for newly runnable stages
   22/03/20 21:59:22 INFO DAGScheduler: running: Set()
   22/03/20 21:59:22 INFO DAGScheduler: waiting: Set(ResultStage 991)
   22/03/20 21:59:22 INFO DAGScheduler: failed: Set()
   22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 991 (MapPartitionsRDD[2137] at map at BaseSparkCommitActionExecutor.java:275), which has no missing parents
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_855 stored as values in memory (estimated size 435.7 KiB, free 357.0 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_855_piece0 stored as bytes in memory (estimated size 156.4 KiB, free 356.9 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_855_piece0 in memory on rkalluri.attlocal.net:63252 (size: 156.4 KiB, free: 363.8 MiB)
   22/03/20 21:59:23 INFO SparkContext: Created broadcast 855 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 991 (MapPartitionsRDD[2137] at map at BaseSparkCommitActionExecutor.java:275) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 991.0 with 1 tasks resource profile 0
   22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 991.0 (TID 2273) (rkalluri.attlocal.net, executor driver, partition 0, NODE_LOCAL, 4271 bytes) taskResourceAssignments Map()
   22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 991.0 (TID 2273)
   22/03/20 21:59:23 INFO ShuffleBlockFetcherIterator: Getting 1 (445.0 B) non-empty blocks including 1 (445.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
   22/03/20 21:59:23 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
   22/03/20 21:59:23 INFO BaseSparkDeltaCommitActionExecutor: Merging updates for commit 20220320215909174 for file files-0000
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating in-memory based Table View
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating InMemory based view for basePath file:/tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=0
   22/03/20 21:59:23 INFO DirectWriteMarkers: Creating Marker Path=file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/.temp/20220320215909174/files/files-0000_0-991-2273_20220320215907162001.hfile.marker.APPEND
   22/03/20 21:59:23 INFO DirectWriteMarkers: [direct] Created marker file file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/.temp/20220320215909174/files/files-0000_0-991-2273_20220320215907162001.hfile.marker.APPEND in 23 ms
   22/03/20 21:59:23 INFO HoodieLogFormat$WriterBuilder: Building HoodieLogFormat Writer
   22/03/20 21:59:23 INFO HoodieLogFormat$WriterBuilder: HoodieLogFile on path file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255
   22/03/20 21:59:23 INFO HoodieLogFormatWriter: Append not supported.. Rolling over to HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273', fileLen=-1}
   22/03/20 21:59:23 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:23 INFO CodecPool: Got brand-new compressor [.gz]
   22/03/20 21:59:23 INFO CodecPool: Got brand-new compressor [.gz]
   22/03/20 21:59:23 INFO HoodieAppendHandle: AppendHandle for partitionPath files filePath files/.files-0000_20220320215907162001.log.2_0-991-2273, took 47 ms.
   22/03/20 21:59:23 INFO MemoryStore: Block rdd_2136_0 stored as values in memory (estimated size 339.0 B, free 356.9 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added rdd_2136_0 in memory on rkalluri.attlocal.net:63252 (size: 339.0 B, free: 363.8 MiB)
   22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 991.0 (TID 2273). 1635 bytes result sent to driver
   22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 991.0 (TID 2273) in 72 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 991.0, whose tasks have all completed, from pool
   22/03/20 21:59:23 INFO DAGScheduler: ResultStage 991 (collect at BaseSparkCommitActionExecutor.java:275) finished in 0.145 s
   22/03/20 21:59:23 INFO DAGScheduler: Job 684 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 991: Stage finished
   22/03/20 21:59:23 INFO DAGScheduler: Job 684 finished: collect at BaseSparkCommitActionExecutor.java:275, took 0.219633 s
   22/03/20 21:59:23 INFO CommitUtils: Creating  metadata for UPSERT_PREPPED numWriteStats:1numReplaceFileIds:0
   22/03/20 21:59:23 INFO SparkContext: Starting job: collect at BaseSparkCommitActionExecutor.java:283
   22/03/20 21:59:23 INFO DAGScheduler: Got job 685 (collect at BaseSparkCommitActionExecutor.java:283) with 1 output partitions
   22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 993 (collect at BaseSparkCommitActionExecutor.java:283)
   22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 992)
   22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 993 (MapPartitionsRDD[2138] at map at BaseSparkCommitActionExecutor.java:283), which has no missing parents
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_856 stored as values in memory (estimated size 435.7 KiB, free 356.4 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_856_piece0 stored as bytes in memory (estimated size 156.4 KiB, free 356.3 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_856_piece0 in memory on rkalluri.attlocal.net:63252 (size: 156.4 KiB, free: 363.7 MiB)
   22/03/20 21:59:23 INFO SparkContext: Created broadcast 856 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 993 (MapPartitionsRDD[2138] at map at BaseSparkCommitActionExecutor.java:283) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 993.0 with 1 tasks resource profile 0
   22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 993.0 (TID 2274) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4271 bytes) taskResourceAssignments Map()
   22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 993.0 (TID 2274)
   22/03/20 21:59:23 INFO BlockManager: Found block rdd_2136_0 locally
   22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 993.0 (TID 2274). 1248 bytes result sent to driver
   22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 993.0 (TID 2274) in 20 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 993.0, whose tasks have all completed, from pool
   22/03/20 21:59:23 INFO DAGScheduler: ResultStage 993 (collect at BaseSparkCommitActionExecutor.java:283) finished in 0.092 s
   22/03/20 21:59:23 INFO DAGScheduler: Job 685 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 993: Stage finished
   22/03/20 21:59:23 INFO DAGScheduler: Job 685 finished: collect at BaseSparkCommitActionExecutor.java:283, took 0.093105 s
   22/03/20 21:59:23 INFO BaseSparkCommitActionExecutor: Committing 20220320215909174, action Type deltacommit, operation Type UPSERT_PREPPED
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_851_piece0 on rkalluri.attlocal.net:63252 in memory (size: 116.9 KiB, free: 363.8 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_856_piece0 on rkalluri.attlocal.net:63252 in memory (size: 156.4 KiB, free: 363.9 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_849_piece0 on rkalluri.attlocal.net:63252 in memory (size: 5.2 KiB, free: 363.9 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_854_piece0 on rkalluri.attlocal.net:63252 in memory (size: 119.4 KiB, free: 364.1 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_853_piece0 on rkalluri.attlocal.net:63252 in memory (size: 3.4 KiB, free: 364.1 MiB)
   22/03/20 21:59:23 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:134
   22/03/20 21:59:23 INFO DAGScheduler: Got job 686 (collect at HoodieSparkEngineContext.java:134) with 1 output partitions
   22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 994 (collect at HoodieSparkEngineContext.java:134)
   22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List()
   22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_855_piece0 on rkalluri.attlocal.net:63252 in memory (size: 156.4 KiB, free: 364.2 MiB)
   22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 994 (MapPartitionsRDD[2140] at flatMap at HoodieSparkEngineContext.java:134), which has no missing parents
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_852_piece0 on rkalluri.attlocal.net:63252 in memory (size: 5.1 KiB, free: 364.2 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_850_piece0 on rkalluri.attlocal.net:63252 in memory (size: 3.1 KiB, free: 364.2 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_857 stored as values in memory (estimated size 99.5 KiB, free 358.3 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_857_piece0 stored as bytes in memory (estimated size 35.1 KiB, free 358.2 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_857_piece0 in memory on rkalluri.attlocal.net:63252 (size: 35.1 KiB, free: 364.2 MiB)
   22/03/20 21:59:23 INFO SparkContext: Created broadcast 857 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 994 (MapPartitionsRDD[2140] at flatMap at HoodieSparkEngineContext.java:134) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 994.0 with 1 tasks resource profile 0
   22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 994.0 (TID 2275) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4408 bytes) taskResourceAssignments Map()
   22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 994.0 (TID 2275)
   22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 994.0 (TID 2275). 796 bytes result sent to driver
   22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 994.0 (TID 2275) in 15 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 994.0, whose tasks have all completed, from pool
   22/03/20 21:59:23 INFO DAGScheduler: ResultStage 994 (collect at HoodieSparkEngineContext.java:134) finished in 0.036 s
   22/03/20 21:59:23 INFO DAGScheduler: Job 686 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 994: Stage finished
   22/03/20 21:59:23 INFO DAGScheduler: Job 686 finished: collect at HoodieSparkEngineContext.java:134, took 0.035978 s
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Marking instant complete [==>20220320215909174__deltacommit__INFLIGHT]
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Checking for file exists ?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.inflight
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Create new file for toInstant ?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Completed [==>20220320215909174__deltacommit__INFLIGHT]
   22/03/20 21:59:23 INFO BaseSparkCommitActionExecutor: Committed 20220320215909174
   22/03/20 21:59:23 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:148
   22/03/20 21:59:23 INFO DAGScheduler: Got job 687 (collectAsMap at HoodieSparkEngineContext.java:148) with 1 output partitions
   22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 995 (collectAsMap at HoodieSparkEngineContext.java:148)
   22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List()
   22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 995 (MapPartitionsRDD[2142] at mapToPair at HoodieSparkEngineContext.java:145), which has no missing parents
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_858 stored as values in memory (estimated size 99.7 KiB, free 358.1 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_858_piece0 stored as bytes in memory (estimated size 35.2 KiB, free 358.1 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_858_piece0 in memory on rkalluri.attlocal.net:63252 (size: 35.2 KiB, free: 364.1 MiB)
   22/03/20 21:59:23 INFO SparkContext: Created broadcast 858 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 995 (MapPartitionsRDD[2142] at mapToPair at HoodieSparkEngineContext.java:145) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 995.0 with 1 tasks resource profile 0
   22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 995.0 (TID 2276) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4408 bytes) taskResourceAssignments Map()
   22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 995.0 (TID 2276)
   22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 995.0 (TID 2276). 836 bytes result sent to driver
   22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 995.0 (TID 2276) in 6 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 995.0, whose tasks have all completed, from pool
   22/03/20 21:59:23 INFO DAGScheduler: ResultStage 995 (collectAsMap at HoodieSparkEngineContext.java:148) finished in 0.025 s
   22/03/20 21:59:23 INFO DAGScheduler: Job 687 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 995: Stage finished
   22/03/20 21:59:23 INFO DAGScheduler: Job 687 finished: collectAsMap at HoodieSparkEngineContext.java:148, took 0.026164 s
   22/03/20 21:59:23 INFO FSUtils: Removed directory at file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/.temp/20220320215909174
   22/03/20 21:59:23 INFO HoodieHeartbeatClient: Stopping heartbeat for instant 20220320215909174
   22/03/20 21:59:23 INFO HoodieHeartbeatClient: Stopped heartbeat for instant 20220320215909174
   22/03/20 21:59:23 INFO HeartbeatUtils: Deleted the heartbeat for instant 20220320215909174
   22/03/20 21:59:23 INFO HoodieHeartbeatClient: Deleted heartbeat file for instant 20220320215909174
   22/03/20 21:59:23 INFO SparkContext: Starting job: collect at SparkHoodieBackedTableMetadataWriter.java:154
   22/03/20 21:59:23 INFO DAGScheduler: Got job 688 (collect at SparkHoodieBackedTableMetadataWriter.java:154) with 1 output partitions
   22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 997 (collect at SparkHoodieBackedTableMetadataWriter.java:154)
   22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 996)
   22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 997 (MapPartitionsRDD[2136] at flatMap at BaseSparkCommitActionExecutor.java:175), which has no missing parents
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_859 stored as values in memory (estimated size 435.3 KiB, free 357.7 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_859_piece0 stored as bytes in memory (estimated size 156.3 KiB, free 357.5 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_859_piece0 in memory on rkalluri.attlocal.net:63252 (size: 156.3 KiB, free: 364.0 MiB)
   22/03/20 21:59:23 INFO SparkContext: Created broadcast 859 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 997 (MapPartitionsRDD[2136] at flatMap at BaseSparkCommitActionExecutor.java:175) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 997.0 with 1 tasks resource profile 0
   22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 997.0 (TID 2277) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4271 bytes) taskResourceAssignments Map()
   22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 997.0 (TID 2277)
   22/03/20 21:59:23 INFO BlockManager: Found block rdd_2136_0 locally
   22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 997.0 (TID 2277). 1328 bytes result sent to driver
   22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 997.0 (TID 2277) in 20 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 997.0, whose tasks have all completed, from pool
   22/03/20 21:59:23 INFO DAGScheduler: ResultStage 997 (collect at SparkHoodieBackedTableMetadataWriter.java:154) finished in 0.091 s
   22/03/20 21:59:23 INFO DAGScheduler: Job 688 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 997: Stage finished
   22/03/20 21:59:23 INFO DAGScheduler: Job 688 finished: collect at SparkHoodieBackedTableMetadataWriter.java:154, took 0.091996 s
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating in-memory based Table View
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieTimelineArchiver: No Instants to archive
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Marking instant complete [==>20220320215909174__commit__INFLIGHT]
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Checking for file exists ?file:/tmp/hudi_4635/.hoodie/20220320215909174.inflight
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Create new file for toInstant ?file:/tmp/hudi_4635/.hoodie/20220320215909174.commit
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Completed [==>20220320215909174__commit__INFLIGHT]
   22/03/20 21:59:23 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:148
   22/03/20 21:59:23 INFO DAGScheduler: Got job 689 (collectAsMap at HoodieSparkEngineContext.java:148) with 1 output partitions
   22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 998 (collectAsMap at HoodieSparkEngineContext.java:148)
   22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List()
   22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 998 (MapPartitionsRDD[2144] at mapToPair at HoodieSparkEngineContext.java:145), which has no missing parents
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_860 stored as values in memory (estimated size 99.7 KiB, free 357.4 MiB)
   22/03/20 21:59:23 INFO MemoryStore: Block broadcast_860_piece0 stored as bytes in memory (estimated size 35.2 KiB, free 357.4 MiB)
   22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_860_piece0 in memory on rkalluri.attlocal.net:63252 (size: 35.2 KiB, free: 364.0 MiB)
   22/03/20 21:59:23 INFO SparkContext: Created broadcast 860 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 998 (MapPartitionsRDD[2144] at mapToPair at HoodieSparkEngineContext.java:145) (first 15 tasks are for partitions Vector(0))
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 998.0 with 1 tasks resource profile 0
   22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 998.0 (TID 2278) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4387 bytes) taskResourceAssignments Map()
   22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 998.0 (TID 2278)
   22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 998.0 (TID 2278). 858 bytes result sent to driver
   22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 998.0 (TID 2278) in 7 ms on rkalluri.attlocal.net (executor driver) (1/1)
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 998.0, whose tasks have all completed, from pool
   22/03/20 21:59:23 INFO DAGScheduler: ResultStage 998 (collectAsMap at HoodieSparkEngineContext.java:148) finished in 0.026 s
   22/03/20 21:59:23 INFO DAGScheduler: Job 689 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 998: Stage finished
   22/03/20 21:59:23 INFO DAGScheduler: Job 689 finished: collectAsMap at HoodieSparkEngineContext.java:148, took 0.026346 s
   22/03/20 21:59:23 INFO FSUtils: Removed directory at file:/tmp/hudi_4635/.hoodie/.temp/20220320215909174
   22/03/20 21:59:23 INFO BaseHoodieWriteClient: Start to clean synchronously.
   22/03/20 21:59:23 INFO CleanerUtils: Cleaned failed attempts if any
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote first table view
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote first table view
   22/03/20 21:59:23 INFO BaseHoodieWriteClient: Cleaner started
   22/03/20 21:59:23 INFO BaseHoodieWriteClient: Scheduling cleaning at instant time :20220320215923917
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote first table view
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote view for basePath file:/tmp/hudi_4635. Server=rkalluri.attlocal.net:63594, Timeout=300
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating InMemory based view for basePath file:/tmp/hudi_4635
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/refresh/?basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/compactions/pending/?basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:/tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:/tmp/hudi_4635
   22/03/20 21:59:23 INFO FileSystemViewManager: Creating InMemory based view for basePath file:/tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:23 INFO HoodieTableMetadataUtil: Loading latest merged file slices for metadata table partition files
   22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:23 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=0
   22/03/20 21:59:23 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:23 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:23 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:23 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:23 INFO HoodieBackedTableMetadata: Opened metadata base file from file:/tmp/hudi_4635/.hoodie/metadata/files/files-0000_0-966-2246_20220320215907162001.hfile at instant 20220320215907162001 in 1 ms
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255', fileLen=-1}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block from file file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255 at instant 20220320215908164
   22/03/20 21:59:24 INFO HoodieLogFormatReader: Moving to the next reader for logfile HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273', fileLen=-1}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273', fileLen=-1}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block from file file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273 at instant 20220320215909174
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
   22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO ExternalSpillableMap: Estimated Payload size => 616
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Merging the final data blocks
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
   22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of log files scanned => 2
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: MaxMemoryInBytes allowed for compaction => 1073741824
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in MemoryBasedMap in ExternalSpillableMap => 3
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 1848
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in BitCaskDiskMap in ExternalSpillableMap => 0
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Size of file spilled to disk => 0
   22/03/20 21:59:24 INFO HoodieBackedTableMetadata: Opened 2 metadata log files (dataset instant=20220320215909174, metadata instant=20220320215909174) in 35 ms
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO BaseTableMetadata: Listed partitions from metadata: #partitions=5
   22/03/20 21:59:24 INFO CleanPlanner: Total Partitions to clean : 5, with policy KEEP_LATEST_COMMITS
   22/03/20 21:59:24 INFO CleanPlanner: Using cleanerParallelism: 5
   22/03/20 21:59:24 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:100
   22/03/20 21:59:24 INFO DAGScheduler: Got job 690 (collect at HoodieSparkEngineContext.java:100) with 5 output partitions
   22/03/20 21:59:24 INFO DAGScheduler: Final stage: ResultStage 999 (collect at HoodieSparkEngineContext.java:100)
   22/03/20 21:59:24 INFO DAGScheduler: Parents of final stage: List()
   22/03/20 21:59:24 INFO DAGScheduler: Missing parents: List()
   22/03/20 21:59:24 INFO DAGScheduler: Submitting ResultStage 999 (MapPartitionsRDD[2146] at map at HoodieSparkEngineContext.java:100), which has no missing parents
   22/03/20 21:59:24 INFO MemoryStore: Block broadcast_861 stored as values in memory (estimated size 556.0 KiB, free 356.8 MiB)
   22/03/20 21:59:24 INFO MemoryStore: Block broadcast_861_piece0 stored as bytes in memory (estimated size 196.7 KiB, free 356.7 MiB)
   22/03/20 21:59:24 INFO BlockManagerInfo: Added broadcast_861_piece0 in memory on rkalluri.attlocal.net:63252 (size: 196.7 KiB, free: 363.8 MiB)
   22/03/20 21:59:24 INFO SparkContext: Created broadcast 861 from broadcast at DAGScheduler.scala:1478
   22/03/20 21:59:24 INFO DAGScheduler: Submitting 5 missing tasks from ResultStage 999 (MapPartitionsRDD[2146] at map at HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4))
   22/03/20 21:59:24 INFO TaskSchedulerImpl: Adding task set 999.0 with 5 tasks resource profile 0
   22/03/20 21:59:24 INFO TaskSetManager: Starting task 0.0 in stage 999.0 (TID 2279) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4344 bytes) taskResourceAssignments Map()
   22/03/20 21:59:24 INFO TaskSetManager: Starting task 1.0 in stage 999.0 (TID 2280) (rkalluri.attlocal.net, executor driver, partition 1, PROCESS_LOCAL, 4344 bytes) taskResourceAssignments Map()
   22/03/20 21:59:24 INFO TaskSetManager: Starting task 2.0 in stage 999.0 (TID 2281) (rkalluri.attlocal.net, executor driver, partition 2, PROCESS_LOCAL, 4344 bytes) taskResourceAssignments Map()
   22/03/20 21:59:24 INFO TaskSetManager: Starting task 3.0 in stage 999.0 (TID 2282) (rkalluri.attlocal.net, executor driver, partition 3, PROCESS_LOCAL, 4344 bytes) taskResourceAssignments Map()
   22/03/20 21:59:24 INFO TaskSetManager: Starting task 4.0 in stage 999.0 (TID 2283) (rkalluri.attlocal.net, executor driver, partition 4, PROCESS_LOCAL, 4344 bytes) taskResourceAssignments Map()
   22/03/20 21:59:24 INFO Executor: Running task 2.0 in stage 999.0 (TID 2281)
   22/03/20 21:59:24 INFO Executor: Running task 0.0 in stage 999.0 (TID 2279)
   22/03/20 21:59:24 INFO Executor: Running task 1.0 in stage 999.0 (TID 2280)
   22/03/20 21:59:24 INFO Executor: Running task 4.0 in stage 999.0 (TID 2283)
   22/03/20 21:59:24 INFO Executor: Running task 3.0 in stage 999.0 (TID 2282)
   22/03/20 21:59:24 INFO CleanPlanner: Cleaning HEF/20211215, retaining latest 10 commits.
   22/03/20 21:59:24 INFO CleanPlanner: Cleaning DEF/20211215, retaining latest 10 commits.
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=HEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=DEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (HEF/20211215)
   22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file slices for metadata table partition files
   22/03/20 21:59:24 INFO CleanPlanner: Cleaning GEF/20211215, retaining latest 10 commits.
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=GEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (DEF/20211215)
   22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file slices for metadata table partition files
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (GEF/20211215)
   22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file slices for metadata table partition files
   22/03/20 21:59:24 INFO CleanPlanner: Cleaning EEF/20211215, retaining latest 10 commits.
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=EEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (EEF/20211215)
   22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file slices for metadata table partition files
   22/03/20 21:59:24 INFO CleanPlanner: Cleaning FEF/20211215, retaining latest 10 commits.
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=FEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (FEF/20211215)
   22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file slices for metadata table partition files
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
   22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering operations
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system view for partition (files)
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
   22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO HoodieBackedTableMetadata: Opened metadata base file from file:/tmp/hudi_4635/.hoodie/metadata/files/files-0000_0-966-2246_20220320215907162001.hfile at instant 20220320215907162001 in 1 ms
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
   22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_859_piece0 on rkalluri.attlocal.net:63252 in memory (size: 156.3 KiB, free: 363.9 MiB)
   22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_857_piece0 on rkalluri.attlocal.net:63252 in memory (size: 35.1 KiB, free: 364.0 MiB)
   22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_860_piece0 on rkalluri.attlocal.net:63252 in memory (size: 35.2 KiB, free: 364.0 MiB)
   22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_858_piece0 on rkalluri.attlocal.net:63252 in memory (size: 35.2 KiB, free: 364.0 MiB)
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255', fileLen=-1}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block from file file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255 at instant 20220320215908164
   22/03/20 21:59:24 INFO HoodieLogFormatReader: Moving to the next reader for logfile HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273', fileLen=-1}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273', fileLen=-1}
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block from file file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273 at instant 20220320215909174
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
   22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO ExternalSpillableMap: Estimated Payload size => 616
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Merging the final data blocks
   22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
   22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432, maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95, multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of log files scanned => 2
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: MaxMemoryInBytes allowed for compaction => 1073741824
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in MemoryBasedMap in ExternalSpillableMap => 3
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 1848
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in BitCaskDiskMap in ExternalSpillableMap => 0
   22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Size of file spilled to disk => 0
   22/03/20 21:59:24 INFO HoodieBackedTableMetadata: Opened 2 metadata log files (dataset instant=20220320215909174, metadata instant=20220320215909174) in 36 ms
   22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
   22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from metadata: partition=GEF/20211215, #files=9
   22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from metadata: partition=HEF/20211215, #files=9
   22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from metadata: partition=FEF/20211215, #files=9
   22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from metadata: partition=EEF/20211215, #files=9
   22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from metadata: partition=DEF/20211215, #files=10
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=0, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=0, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=1, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=1, StoreTimeTaken=0
   22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=10, NumFileGroups=10, FileGroupsCreationTime=1, StoreTimeTaken=0
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=GEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=FEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=DEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=EEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request : (http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=HEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
   22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition path:DEF/20211215
   22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition path:GEF/20211215
   22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition path:EEF/20211215
   22/03/20 21:59:24 INFO Executor: Finished task 0.0 in stage 999.0 (TID 2279). 931 bytes result sent to driver
   22/03/20 21:59:24 INFO Executor: Finished task 3.0 in stage 999.0 (TID 2282). 931 bytes result sent to driver
   22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition path:HEF/20211215
   22/03/20 21:59:24 INFO Executor: Finished task 1.0 in stage 999.0 (TID 2280). 931 bytes result sent to driver
   22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition path:FEF/20211215
   22/03/20 21:59:24 INFO Executor: Finished task 4.0 in stage 999.0 (TID 2283). 931 bytes result sent to driver
   22/03/20 21:59:24 INFO TaskSetManager: Finished task 0.0 in stage 999.0 (TID 2279) in 125 ms on rkalluri.attlocal.net (executor driver) (1/5)
   22/03/20 21:59:24 INFO TaskSetManager: Finished task 3.0 in stage 999.0 (TID 2282) in 125 ms on rkalluri.attlocal.net (executor driver) (2/5)
   22/03/20 21:59:24 INFO Executor: Finished task 2.0 in stage 999.0 (TID 2281). 931 bytes result sent to driver
   22/03/20 21:59:24 INFO TaskSetManager: Finished task 1.0 in stage 999.0 (TID 2280) in 125 ms on rkalluri.attlocal.net (executor driver) (3/5)
   22/03/20 21:59:24 INFO TaskSetManager: Finished task 4.0 in stage 999.0 (TID 2283) in 125 ms on rkalluri.attlocal.net (executor driver) (4/5)
   22/03/20 21:59:24 INFO TaskSetManager: Finished task 2.0 in stage 999.0 (TID 2281) in 125 ms on rkalluri.attlocal.net (executor driver) (5/5)
   22/03/20 21:59:24 INFO TaskSchedulerImpl: Removed TaskSet 999.0, whose tasks have all completed, from pool
   22/03/20 21:59:24 INFO DAGScheduler: ResultStage 999 (collect at HoodieSparkEngineContext.java:100) finished in 0.213 s
   22/03/20 21:59:24 INFO DAGScheduler: Job 690 is finished. Cancelling potential speculative or zombie tasks for this job
   22/03/20 21:59:24 INFO TaskSchedulerImpl: Killing all running tasks in stage 999: Stage finished
   22/03/20 21:59:24 INFO DAGScheduler: Job 690 finished: collect at HoodieSparkEngineContext.java:100, took 0.213358 s
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:24 INFO BaseHoodieWriteClient: Start to archive synchronously.
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__commit__COMPLETED]}
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635
   22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/hoodie.properties
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
   22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///tmp/hudi_4635/.hoodie/metadata
   22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20220320215909174__deltacommit__COMPLETED]}
   22/03/20 21:59:24 INFO HoodieTimelineArchiver: Limiting archiving of instants to latest compaction on metadata table at 20220320215907162001
   22/03/20 21:59:24 INFO HoodieHeartbeatClient: Stopping heartbeat for instant 20220320215909174
   22/03/20 21:59:24 INFO HoodieHeartbeatClient: Stopped heartbeat for instant 20220320215909174
   22/03/20 21:59:24 INFO HeartbeatUtils: Deleted the heartbeat for instant 20220320215909174
   22/03/20 21:59:24 INFO HoodieHeartbeatClient: Deleted heartbeat file for instant 20220320215909174
   22/03/20 21:59:24 INFO TransactionManager: Transaction ending with transaction owner Option{val=[==>20220320215909174__commit__INFLIGHT]}
   22/03/20 21:59:24 INFO ZookeeperBasedLockProvider: RELEASING lock atZkBasePath = /hudi, lock key = None
   22/03/20 21:59:24 INFO ZookeeperBasedLockProvider: RELEASED lock atZkBasePath = /hudi, lock key = None
   22/03/20 21:59:24 INFO TransactionManager: Transaction ended with transaction owner Option{val=[==>20220320215909174__commit__INFLIGHT]}
   An error occurred while calling o1843.save.
   : java.lang.NullPointerException
   	at org.apache.hudi.client.HoodieTimelineArchiver.lambda$getInstantsToArchive$10(HoodieTimelineArchiver.java:452)
   	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
   	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
   	at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
   	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
   	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
   	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
   	at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1351)
   	at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
   	at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
   	at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
   	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:147)
   	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:818)
   	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:572)
   	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:477)
   	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:212)
   	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:119)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:667)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:299)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
   	at sun.reflect.GeneratedMethodAccessor224.invoke(Unknown Source)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
   	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
   	at java.lang.Thread.run(Thread.java:745)
   
   22/03/20 21:59:24 INFO SparkContext: Invoking stop() from shutdown hook
   22/03/20 21:59:24 INFO SparkUI: Stopped Spark web UI at http://rkalluri.attlocal.net:4040
   22/03/20 21:59:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   22/03/20 21:59:24 INFO MemoryStore: MemoryStore cleared
   22/03/20 21:59:24 INFO BlockManager: BlockManager stopped
   22/03/20 21:59:24 INFO BlockManagerMaster: BlockManagerMaster stopped
   22/03/20 21:59:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   22/03/20 21:59:24 INFO SparkContext: Successfully stopped SparkContext
   22/03/20 21:59:24 INFO ShutdownHookManager: Shutdown hook called


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1082277719


   We have identified an issue with archival under multi-writer that results in an NPE; the fix is [here](https://github.com/apache/hudi/pull/5138). It would be great if you could give it a try and let us know whether it works.





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1039455327


   Actually, we made a few fixes to multi-writer in 0.10.0, especially around [rolling back unintended commits](https://github.com/apache/hudi/pull/3956).
   So, when you say you are also seeing it with 0.10.0: was the table created in 0.10.0 and you hit this issue after a few commits, or was the table created in an older version and then upgraded to 0.10.0? If it was upgraded, the table was probably already in a bad state that led to the NullPointerException.
   
   Clarifying this would help us investigate the issue.
   





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1073712400


   > Sorry, I could not get time to repro this. I don't have experience with pyspark. I just saved the script to rep.py and tried to run it with spark-submit, but I am running into failures.
   > 
   > ```
   > :: resolution report :: resolve 255ms :: artifacts dl 5ms
   > 	:: modules in use:
   > 	org.apache.hudi#hudi-spark-bundle_2.11;0.10.1 from local-m2-cache in [default]
   > 	org.apache.spark#spark-avro_2.11;2.4.4 from local-m2-cache in [default]
   > 	org.spark-project.spark#unused;1.0.0 from local-m2-cache in [default]
   > 	---------------------------------------------------------------------
   > 	|                  |            modules            ||   artifacts   |
   > 	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   > 	---------------------------------------------------------------------
   > 	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
   > 	---------------------------------------------------------------------
   > :: retrieving :: org.apache.spark#spark-submit-parent-a3077db6-05fc-4539-ab07-7fcdba4e85ba
   > 	confs: [default]
   > 	0 artifacts copied, 3 already retrieved (0kB/5ms)
   > 22/03/19 14:05:28 WARN Utils: Your hostname, Sivabalans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.5 instead (on interface en0)
   > 22/03/19 14:05:28 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
   > 22/03/19 14:05:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   >   File "/tmp/rep.py", line 26
   >     'hoodie.write.lock.zookeeper.lock_key': f"{table_name}",
   >                                                           ^
   > SyntaxError: invalid syntax
   > log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
   > log4j:WARN Please initialize the log4j system properly.
   > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   > ```
   > 
   > Or should I launch pyspark and run through the commands you have given above?
   
   Yes, please launch pyspark and try it.
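
The `SyntaxError` in the quoted output points at the f-string on line 26 of `rep.py`. F-strings require Python 3.6 or newer, so one likely explanation (an assumption, not confirmed in this thread) is that `spark-submit` picked up an older interpreter; the `PYSPARK_PYTHON` environment variable controls which Python binary PySpark uses. Separately, an f-string is evaluated when the dict literal is built, so with `table_name = None` at module level the lock key becomes the string `'None'`, which matches the `lock key = None` lines in the log earlier in this thread. A minimal sketch that sidesteps both, where `build_lock_config` is a hypothetical helper and not part of the original script:

```python
def build_lock_config(table_name):
    """Build the ZooKeeper lock options once the table name is known.

    Hypothetical helper for illustration only; the option keys are the
    ones used in the script from this thread.
    """
    if not table_name:
        raise ValueError("table_name must be set before building the lock config")
    return {
        # str() works on any Python version, unlike an f-string.
        'hoodie.write.lock.zookeeper.lock_key': str(table_name),
        'hoodie.write.lock.zookeeper.base_path': '/hudi',
    }


# A template evaluated too early captures the placeholder value, which is
# what f"{table_name}" does while table_name is still None.
eager_key = "{}".format(None)

# Building the options after argument parsing picks up the real name.
lock_config = build_lock_config("my_table")
```

Calling the helper from `main()` after the arguments are parsed, and merging its result into the shared options, keeps the lock key in step with `hoodie.table.name`.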
   





[GitHub] [hudi] VIKASPATID edited a comment on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID edited a comment on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1020055298


   We tried Hudi 0.10.0, but we are running into an issue with bulk write in a multi-writer setup:
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o240.save.
   : org.apache.hudi.exception.HoodieRemoteException: Failed to delete marker directory s3://xxxxx/tmp/tmp/tmp/tables/deeptick/.hoodie/.temp/20220124051311124
   Read timed out
           at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.deleteMarkerDir(TimelineServerBasedWriteMarkers.java:91)
           at org.apache.hudi.table.marker.WriteMarkers.quietDeleteMarkerDir(WriteMarkers.java:88)
           at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:450)
           at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:197)
           at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
           at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:633)
           at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:284)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
           at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
           at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
           at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
           at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.net.SocketTimeoutException: Read timed out
           at java.net.SocketInputStream.socketRead0(Native Method)
           at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
           at java.net.SocketInputStream.read(SocketInputStream.java:171)
           at java.net.SocketInputStream.read(SocketInputStream.java:141)
           at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
           at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
           at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
           at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
           at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
           at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
           at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
           at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
           at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
           at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
           at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
           at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
           at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
           at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
           at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
           at org.apache.http.client.fluent.Request.execute(Request.java:151)
           at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeRequestToTimelineServer(TimelineServerBasedWriteMarkers.java:177)
           at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.deleteMarkerDir(TimelineServerBasedWriteMarkers.java:88)
           ... 45 more
   ```
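
The trace fails inside `TimelineServerBasedWriteMarkers.deleteMarkerDir`, that is, while asking the timeline server to delete the marker directory. The reproduction script posted later in this thread sets `hoodie.write.markers.type` to `DIRECT`, which makes the writer manage marker files directly on storage instead of going through the timeline server. A minimal sketch of layering that override onto a base option dict follows; `with_direct_markers` is a hypothetical helper, and whether this avoids the timeout on a given setup is an assumption to verify, not a confirmed fix.

```python
def with_direct_markers(base_options):
    """Return a copy of base_options that uses direct write markers.

    Hypothetical helper for illustration; the input dict is left untouched.
    """
    options = dict(base_options)
    options['hoodie.write.markers.type'] = 'DIRECT'
    return options


base = {
    'hoodie.datasource.write.operation': 'bulk_insert',
    'hoodie.bulkinsert.shuffle.parallelism': 10,
}
write_options = with_direct_markers(base)

# Usage, assuming a DataFrame `df` and a `table_path` as in the script:
# df.write.format('org.apache.hudi').options(**write_options).mode('append').save(table_path)
```

Copying the dict keeps the shared base configuration unchanged, so other writers can still opt in or out of the override independently.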





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1049491206


   Hi @nsivabalan,
   Here is code that reproduces the issue:
   <details>
   <summary> pyspark script </summary>
   
   ```
   
   import argparse
   import threading
   import time
   
   from pyspark.context import SparkContext
   from pyspark.sql.functions import col, expr, lit, monotonically_increasing_id, to_date, to_timestamp, when
   from pyspark.sql.session import SparkSession
   from pyspark.sql.types import *
   
   spark = SparkSession.builder.config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer').config('spark.sql.hive.convertMetastoreParquet', 'false').getOrCreate()
   sc = spark.sparkContext
   
   table_name = None
   table_path = None
   
   header = [["A0", "STRING"], ["A1", "STRING"], ["A2", "STRING"], ["A3", "STRING"], ["A4", "STRING"], ["A5", "INTEGER"], ["A6", "INTEGER"], ["A7", "SHORT"], ["A8", "INTEGER"], ["A9", "LONG"], ["A10", "DOUBLE"], ["A11", "INTEGER"], ["A12", "LONG"], ["A13", "DOUBLE"], ["A14", "LONG"], ["A15", "DOUBLE"], ["A16", "DOUBLE"], ["A17", "INTEGER"], ["A18", "SHORT"], ["A19", "DOUBLE"], ["A20", "INTEGER"], ["A21", "SHORT"], ["A22", "DOUBLE"], ["A23", "STRING"], ["A24", "STRING"], ["A25", "INTEGER"], ["A26", "INTEGER"], ["A27", "STRING"], ["A28", "INTEGER"], ["A29", "INTEGER"], ["A30", "STRING"], ["A31", "DOUBLE"], ["A32", "DOUBLE"], ["A33", "STRING"], ["A34", "DOUBLE"], ["A35", "INTEGER"], ["A36", "SHORT"], ["A37", "STRING"], ["A38", "DOUBLE"], ["A39", "STRING"], ["A40", "STRING"], ["A41", "STRING"], ["A42", "STRING"], ["A43", "STRING"], ["A44", "INTEGER"], ["A45", "LONG"], ["A46", "LONG"], ["A47", "LONG"], ["A48", "LONG"], ["A49", "LONG"], ["A50", "LONG"], ["A51", "INTEGER"], ["A52", "INTEGER"], ["A53", "INTEGER"], ["A54", "INTEGER"], ["A55", "INTEGER"], ["A56", "DOUBLE"], ["A57", "DOUBLE"], ["A58", "DOUBLE"], ["A59", "DOUBLE"], ["A60", "LONG"], ["A61", "STRING"], ["A62", "DOUBLE"], ["A63", "STRING"], ["A64", "DOUBLE"], ["A65", "DOUBLE"], ["A66", "LONG"], ["A67", "LONG"]]
   
   common_config = {
       'className' : 'org.apache.hudi',
       'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
       'hoodie.write.lock.provider':'org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider',
       'hoodie.cleaner.policy.failed.writes':'LAZY',
       'hoodie.write.lock.zookeeper.url':'xxxxxxx',
       'hoodie.write.lock.zookeeper.port':'2181',
       'hoodie.write.lock.zookeeper.lock_key': f"{table_name}",
       'hoodie.write.lock.zookeeper.base_path':'/hudi',
       'hoodie.datasource.write.row.writer.enable': 'false',
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
       'hoodie.datasource.write.recordkey.field': 'A1,A9',
       'hoodie.datasource.write.partitionpath.field': 'A2,A5',
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
       'hoodie.datasource.write.precombine.field': "A5",
       'hoodie.datasource.hive_sync.use_jdbc': 'false',
       'hoodie.datasource.hive_sync.enable': 'false',
       'hoodie.compaction.payload.class': 'org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload',
       'hoodie.datasource.hive_sync.table': f"{table_name}",
       'hoodie.datasource.hive_sync.partition_fields': 'A2,A5',
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.copyonwrite.record.size.estimate': 256,
       'hoodie.write.lock.client.wait_time_ms': 1000,
       'hoodie.write.lock.client.num_retries': 50
   }
   
   init_load_config = {
       'hoodie.parquet.max.file.size': 1024*1024*1024,
       'hoodie.bulkinsert.shuffle.parallelism': 10,
       'compactionSmallFileSize': 100*1024*1024,
       'hoodie.datasource.write.operation': 'bulk_insert',
       'hoodie.write.markers.type': "DIRECT"
       #'hoodie.compact.inline': True
       # 'hoodie.datasource.write.insert.drop.duplicates' : 'true'
   }
   
   increamental_config = {
       'hoodie.upsert.shuffle.parallelism': 1,
       'hoodie.insert.shuffle.parallelism': 1,
       'hoodie.cleaner.commits.retained': 1,
       'hoodie.clean.automatic': True
   }
   
   def get_parameters():
       parser = argparse.ArgumentParser(
           description='Usage: --table_path=<path of table> --table_name=<table_name>')
       parser.add_argument('--table_path', help='table_path', required=True)
       parser.add_argument('--table_name', help='table_name', required=True)
       (args, unknown) = parser.parse_known_args()
       return args
   
   def main():
       global table_path
       global table_name
   
       params       = get_parameters()
       table_path   = params.table_path
       table_name   = params.table_name
       common_config['hoodie.table.name'] = table_name
       common_config['hoodie.datasource.hive_sync.table'] = table_name
       common_config['path'] = table_path
       schema = ",".join([ f"{field[0]} {field[1]}" for field in header])
       records = [
               ['A','ABC','DEF','USA','1',20211215,1,2,3,-4,None,None,None,None,None,None,5.19,None,None,0.0,None,None,None,'0',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,1,0,None,8,None,7,6,None,None,None,None,None,None,None,None,None,None,None,None,None]
       ]
       df = spark.createDataFrame(records, schema)
       bulk_insert(df)
       print("Wrote 1 file")
   
       records = [
       ['A','ABC','DEF','USA','1',20211215,1,2,3,-4,None,None,None,None,None,None,5.19,None,None,0.0,None,None,None,'0',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,1,0,None,8,None,7,6,None,None,None,None,None,None,None,None,None,None,None,None,None],
       ['A','ABC','EEF','USA','1',20211215,1,2,3,-4,None,None,None,None,None,None,5.19,None,None,0.0,None,None,None,'0',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,1,0,None,8,None,7,6,None,None,None,None,None,None,None,None,None,None,None,None,None],
       ['A','ABC','FEF','USA','1',20211215,1,2,3,-4,None,None,None,None,None,None,5.19,None,None,0.0,None,None,None,'0',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,1,0,None,8,None,7,6,None,None,None,None,None,None,None,None,None,None,None,None,None],
       ['A','ABC','GEF','USA','1',20211215,1,2,3,-4,None,None,None,None,None,None,5.19,None,None,0.0,None,None,None,'0',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,1,0,None,8,None,7,6,None,None,None,None,None,None,None,None,None,None,None,None,None],
       ['A','ABC','HEF','USA','1',20211215,1,2,3,-4,None,None,None,None,None,None,5.19,None,None,0.0,None,None,None,'0',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,1,0,None,8,None,7,6,None,None,None,None,None,None,None,None,None,None,None,None,None],
       ]
       df = spark.createDataFrame(records, schema)
       for i in range(1,20):
           print(f"Writing file #{i+1}")
           bulk_insert(df)
       print("Job finished")
   
   
   def bulk_insert(input_df):
       isos = input_df.select("A2").distinct().rdd.flatMap(lambda x: x).collect()
       dfs = {iso:input_df.where(input_df.A2 == iso) for iso in isos}
       combinedConf = {**common_config, **init_load_config }
   
       #dfs = {"ALL": input_df}
       print("Total frames: {}".format(len(dfs)))
       print("Running bulk-insert in table {}:{}".format(table_path, table_name))
       cnt = 1
       #executor = ThreadPoolExecutor(len(dfs))
       tlist = list()
       for iso in dfs:
           df = dfs[iso]
           print("Writing dataframe: {}".format(cnt))
           t = BulkWriterThread()
           t.set(df, iso, combinedConf)
           t.start()
           tlist.append(t)
           time.sleep(1)
           print("Wrote")
           cnt += 1
       print("Waiting for finish")
       for t in tlist:
           t.join()
   
       print("Wait finished")
       print("Write completed")
   
   
   class BulkWriterThread(threading.Thread):
       def run(self):
           name = threading.current_thread().name
           print(f"{name}: writing {self.iso} data")
           self.exc = None
           try:
               self.df.write.format('org.apache.hudi').option('hoodie.datasource.write.operation', 'bulk_insert').options(**self.conf).mode('append').save()
               #glueContext.write_dynamic_frame.from_options(frame = DynamicFrame.fromDF(self.df, glueContext, "df"), connection_type = "marketplace.spark", connection_options = self.conf)
               print(f"{name}: {self.iso} data written")
           except Exception as e:
               print(e)
               self.exc = e
   
       def set(self, df, iso, conf):
           self.df = df
           self.iso = iso
           self.conf = conf
   
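       def join(self, timeout=None):
           print("Joining")
           threading.Thread.join(self, timeout)
           print("Joined")
   
           # Re-raise in the calling thread any exception captured in run().
           # Use self.name so the message names the writer thread, not the caller.
           if self.exc:
               print(f"{self.name}: Error in writing {self.iso} data")
               raise self.exc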
   
   
   
   if __name__ == "__main__":
       main()
   ```
   </details>
    
    We are running it like this:
   
    ```
    spark-submit --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars  s3://xxxx/jars/hudi-spark3-bundle_2.12-0.10.0.jar,/usr/lib/spark/external/lib/spark-avro.jar rep.py --table_path s3://xxx/tables/rept5 --table_name=rept5
   ```
   
   Please let me know if it's reproducible on your side or not.
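
For anyone adapting the repro: the hand-rolled thread class above can also be expressed with `concurrent.futures`, which joins every writer before reporting failures. This is only a minimal sketch; `run_writers` and `write_fn` are hypothetical names that do not appear in the script, and in the repro `write_fn` would wrap the `df.write.format('org.apache.hudi')...save()` call. It does not change anything on the Hudi side.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_writers(write_fn, frames, max_workers=None):
    """Call write_fn(key, frame) concurrently for every item in `frames`.

    Returns a dict {key: exception} for the writers that failed, so that one
    failure does not leave the remaining threads unjoined.
    """
    failures = {}
    workers = max_workers or max(1, len(frames))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(write_fn, key, frame): key
                   for key, frame in frames.items()}
        for future in as_completed(futures):
            exc = future.exception()
            if exc is not None:
                failures[futures[future]] = exc
    return failures


# Stand-in writer used only for illustration; it does not touch Spark or Hudi.
def fake_write(key, frame):
    if frame is None:
        raise ValueError(f"no data for {key}")


print(run_writers(fake_write, {"USA": [1, 2], "GBR": None}))
```

Returning the failures instead of raising on the first one makes it easier to see whether every concurrent writer hits the same exception or only some of them.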


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1039455327


   Actually, we made a few fixes to multi-writer in 0.10.0, especially around [rolling back unintended commits](https://github.com/apache/hudi/pull/3956). 
   So, when you say you are also seeing it with 0.10.0: was the table created in 0.10.0, and you hit this issue after a few commits? Or was the table created in an older version and then upgraded to 0.10.0? If it was upgraded, the table was probably already in a bad state, which led to the NullPointerException. 
   
   If you can clarify this, it could help investigate the issue. 
   





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1020335689


   <details>
   <summary>
   Content of the .hoodie folder where bulk write is failing (with Hudi 0.9.0)
   </summary>
   
   ```
                              PRE .aux/
   2022-01-14 06:26:20          0 .aux_$folder$
   2022-01-18 09:12:20     115150 .commits_.archive.10_1-0-1
   2022-01-18 09:15:52     118166 .commits_.archive.11_1-0-1
   2022-01-18 11:35:24     118196 .commits_.archive.12_1-0-1
   2022-01-18 13:14:09     115304 .commits_.archive.13_1-0-1
   2022-01-18 13:18:03     118104 .commits_.archive.14_1-0-1
   2022-01-20 07:29:18     129672 .commits_.archive.15_1-0-1
   2022-01-20 07:48:27     123601 .commits_.archive.16_1-0-1
   2022-01-20 10:30:33     101851 .commits_.archive.17_1-0-1
   2022-01-20 10:33:37     104875 .commits_.archive.18_1-0-1
   2022-01-20 12:00:17     100167 .commits_.archive.19_1-0-1
   2022-01-14 13:03:09     116982 .commits_.archive.1_1-0-1
   2022-01-21 12:53:14     131089 .commits_.archive.20_1-0-1
   2022-01-14 19:11:43     200942 .commits_.archive.2_1-0-1
   2022-01-15 14:57:46     119663 .commits_.archive.3_1-0-1
   2022-01-15 15:03:58     118443 .commits_.archive.4_1-0-1
   2022-01-18 04:47:11     120103 .commits_.archive.5_1-0-1
   2022-01-18 06:33:57     118235 .commits_.archive.6_1-0-1
   2022-01-18 07:33:46     118155 .commits_.archive.7_1-0-1
   2022-01-18 08:47:54     113318 .commits_.archive.8_1-0-1
   2022-01-18 08:54:16     124461 .commits_.archive.9_1-0-1
   2022-01-14 06:26:42          0 .heartbeat_$folder$
   2022-01-14 06:26:20          0 .temp_$folder$
   2022-01-17 12:43:50       5499 20220117124338.rollback
   2022-01-17 12:43:50          0 20220117124338.rollback.inflight
   2022-01-17 12:52:39       5530 20220117125235.rollback
   2022-01-17 12:52:39          0 20220117125235.rollback.inflight
   2022-01-17 12:53:50       5530 20220117125347.rollback
   2022-01-17 12:53:50          0 20220117125347.rollback.inflight
   2022-01-17 12:54:02       5530 20220117125357.rollback
   2022-01-17 12:54:02          0 20220117125357.rollback.inflight
   2022-01-17 12:54:34       5530 20220117125430.rollback
   2022-01-17 12:54:34          0 20220117125430.rollback.inflight
   2022-01-17 12:54:54       5550 20220117125451.rollback
   2022-01-17 12:54:54          0 20220117125451.rollback.inflight
   2022-01-17 12:55:14       5540 20220117125511.rollback
   2022-01-17 12:55:14          0 20220117125511.rollback.inflight
   2022-01-17 12:56:01       5550 20220117125558.rollback
   2022-01-17 12:56:01          0 20220117125558.rollback.inflight
   2022-01-17 12:56:28       5540 20220117125625.rollback
   2022-01-17 12:56:28          0 20220117125625.rollback.inflight
   2022-01-17 12:56:31       5550 20220117125628.rollback
   2022-01-17 12:56:31          0 20220117125628.rollback.inflight
   2022-01-17 12:56:54       5540 20220117125651.rollback
   2022-01-17 12:56:54          0 20220117125651.rollback.inflight
   2022-01-17 12:59:08       5551 20220117125833.rollback
   2022-01-17 12:59:08          0 20220117125833.rollback.inflight
   2022-01-17 12:58:50       5540 20220117125845.rollback
   2022-01-17 12:58:50          0 20220117125845.rollback.inflight
   2022-01-17 12:59:10       5540 20220117125904.rollback
   2022-01-17 12:59:10          0 20220117125904.rollback.inflight
   2022-01-17 12:59:13       5550 20220117125909.rollback
   2022-01-17 12:59:13          0 20220117125909.rollback.inflight
   2022-01-17 12:59:24       5550 20220117125921.rollback
   2022-01-17 12:59:24          0 20220117125921.rollback.inflight
   2022-01-17 12:59:34       5550 20220117125931.rollback
   2022-01-17 12:59:33          0 20220117125931.rollback.inflight
   2022-01-17 13:02:27       5540 20220117130222.rollback
   2022-01-17 13:02:26          0 20220117130222.rollback.inflight
   2022-01-17 13:04:56       5540 20220117130452.rollback
   2022-01-17 13:04:56          0 20220117130452.rollback.inflight
   2022-01-18 05:31:55       4197 20220118053147.rollback
   2022-01-18 05:31:55          0 20220118053147.rollback.inflight
   2022-01-18 05:40:20       4197 20220118054011.rollback
   2022-01-18 05:40:20          0 20220118054011.rollback.inflight
   2022-01-18 05:42:16       4197 20220118054208.rollback
   2022-01-18 05:42:16          0 20220118054208.rollback.inflight
   2022-01-18 09:15:59       5550 20220118091556.rollback
   2022-01-18 09:15:59          0 20220118091556.rollback.inflight
   2022-01-18 11:35:32       5550 20220118113528.rollback
   2022-01-18 11:35:32          0 20220118113528.rollback.inflight
   2022-01-20 10:36:11      11228 20220120103556.commit
   2022-01-20 10:35:59          0 20220120103556.commit.requested
   2022-01-20 10:36:01          0 20220120103556.inflight
   2022-01-20 10:36:30      11254 20220120103623.commit
   2022-01-20 10:36:24          0 20220120103623.commit.requested
   2022-01-20 10:36:25          0 20220120103623.inflight
   2022-01-20 10:36:46      11254 20220120103638.commit
   2022-01-20 10:36:40          0 20220120103638.commit.requested
   2022-01-20 10:36:40          0 20220120103638.inflight
   2022-01-20 10:37:01      11254 20220120103654.commit
   2022-01-20 10:36:55          0 20220120103654.commit.requested
   2022-01-20 10:36:56          0 20220120103654.inflight
   2022-01-20 10:37:18      11254 20220120103710.commit
   2022-01-20 10:37:11          0 20220120103710.commit.requested
   2022-01-20 10:37:12          0 20220120103710.inflight
   2022-01-20 12:00:09      31701 20220120105501.commit
   2022-01-20 10:55:05          0 20220120105501.commit.requested
   2022-01-20 10:55:06          0 20220120105501.inflight
   2022-01-20 14:26:02      31821 20220120120442.commit
   2022-01-20 12:04:44          0 20220120120442.commit.requested
   2022-01-20 12:04:45          0 20220120120442.inflight
   2022-01-20 15:47:44      31803 20220120142932.commit
   2022-01-20 14:29:34          0 20220120142932.commit.requested
   2022-01-20 14:29:34          0 20220120142932.inflight
   2022-01-21 07:07:32       6568 20220121070720.commit
   2022-01-21 07:07:24          0 20220121070720.commit.requested
   2022-01-21 07:07:25          0 20220121070720.inflight
   2022-01-21 08:23:22       6568 20220121082311.commit
   2022-01-21 08:23:14          0 20220121082311.commit.requested
   2022-01-21 08:23:15          0 20220121082311.inflight
   2022-01-21 08:23:39       6573 20220121082334.commit
   2022-01-21 08:23:36          0 20220121082334.commit.requested
   2022-01-21 08:23:36          0 20220121082334.inflight
   2022-01-21 08:23:53       6574 20220121082347.commit
   2022-01-21 08:23:49          0 20220121082347.commit.requested
   2022-01-21 08:23:49          0 20220121082347.inflight
   2022-01-21 10:00:51      30887 20220121082810.commit
   2022-01-21 08:28:12          0 20220121082810.commit.requested
   2022-01-21 08:28:13          0 20220121082810.inflight
   2022-01-21 10:58:33      30862 20220121100330.commit
   2022-01-21 10:03:31          0 20220121100330.commit.requested
   2022-01-21 10:03:32          0 20220121100330.inflight
   2022-01-21 12:37:24       6572 20220121123710.commit
   2022-01-21 12:37:14          0 20220121123710.commit.requested
   2022-01-21 12:37:15          0 20220121123710.inflight
   2022-01-21 12:37:45       6572 20220121123711.commit
   2022-01-21 12:37:14          0 20220121123711.commit.requested
   2022-01-21 12:37:15          0 20220121123711.inflight
   2022-01-21 12:37:36       6572 20220121123712.commit
   2022-01-21 12:37:14          0 20220121123712.commit.requested
   2022-01-21 12:37:15          0 20220121123712.inflight
   2022-01-21 12:37:54       6572 20220121123713.commit
   2022-01-21 12:37:15          0 20220121123713.commit.requested
   2022-01-21 12:37:15          0 20220121123713.inflight
   2022-01-21 12:38:03       6572 20220121123714.commit
   2022-01-21 12:37:15          0 20220121123714.commit.requested
   2022-01-21 12:37:16          0 20220121123714.inflight
   2022-01-21 12:38:12       6572 20220121123715.commit
   2022-01-21 12:37:16          0 20220121123715.commit.requested
   2022-01-21 12:37:17          0 20220121123715.inflight
   2022-01-21 12:53:30       6571 20220121125249.commit
   2022-01-21 12:52:53          0 20220121125249.commit.requested
   2022-01-21 12:52:54          0 20220121125249.inflight
   2022-01-21 12:53:02       6572 20220121125250.commit
   2022-01-21 12:52:53          0 20220121125250.commit.requested
   2022-01-21 12:52:54          0 20220121125250.inflight
   2022-01-21 12:53:28       6572 20220121125251.commit
   2022-01-21 12:52:53          0 20220121125251.commit.requested
   2022-01-21 12:52:54          0 20220121125251.inflight
   2022-01-21 12:53:31       6572 20220121125252.commit
   2022-01-21 12:52:53          0 20220121125252.commit.requested
   2022-01-21 12:52:54          0 20220121125252.inflight
   2022-01-21 12:53:35       6572 20220121125253.commit
   2022-01-21 12:52:55          0 20220121125253.commit.requested
   2022-01-21 12:52:55          0 20220121125253.inflight
   2022-01-21 12:53:33       6571 20220121125254.commit
   2022-01-21 12:52:55          0 20220121125254.commit.requested
   2022-01-21 12:52:56          0 20220121125254.inflight
   2022-01-14 06:26:19          0 archived_$folder$
   2022-01-14 06:26:21        493 hoodie.properties
   ```
   </details>
   <details>
   <summary>
   Content of the second table's .hoodie folder (with Hudi 0.10.0)
   </summary>
   
   ```
                              PRE .aux/
   2022-01-24 04:54:01          0 .aux_$folder$
   2022-01-24 04:54:12          0 .heartbeat_$folder$
   2022-01-24 04:54:00          0 .temp_$folder$
   2022-01-24 04:54:11       6580 20220124045358431.commit
   2022-01-24 04:54:03          0 20220124045358431.commit.requested
   2022-01-24 04:54:04          0 20220124045358431.inflight
   2022-01-24 04:58:12       6584 20220124045800035.commit
   2022-01-24 04:58:02          0 20220124045800035.commit.requested
   2022-01-24 04:58:03          0 20220124045800035.inflight
   2022-01-24 04:58:16       6582 20220124045800759.commit
   2022-01-24 04:58:02          0 20220124045800759.commit.requested
   2022-01-24 04:58:03          0 20220124045800759.inflight
   2022-01-24 04:58:19       6584 20220124045801710.commit
   2022-01-24 04:58:03          0 20220124045801710.commit.requested
   2022-01-24 04:58:04          0 20220124045801710.inflight
   2022-01-24 04:58:18       6584 20220124045802657.commit
   2022-01-24 04:58:04          0 20220124045802657.commit.requested
   2022-01-24 04:58:04          0 20220124045802657.inflight
   2022-01-24 04:58:20       6584 20220124045803717.commit
   2022-01-24 04:58:05          0 20220124045803717.commit.requested
   2022-01-24 04:58:05          0 20220124045803717.inflight
   2022-01-24 04:58:22       6584 20220124045804602.commit
   2022-01-24 04:58:05          0 20220124045804602.commit.requested
   2022-01-24 04:58:06          0 20220124045804602.inflight
   2022-01-24 10:32:18      14889 20220124051309563.commit
   2022-01-24 05:13:12          0 20220124051309563.commit.requested
   2022-01-24 05:13:13          0 20220124051309563.inflight
   2022-01-24 09:43:13      14733 20220124051310390.commit
   2022-01-24 05:13:12          0 20220124051310390.commit.requested
   2022-01-24 05:13:13          0 20220124051310390.inflight
   2022-01-24 10:25:13      14920 20220124051311124.commit
   2022-01-24 05:13:12          0 20220124051311124.commit.requested
   2022-01-24 05:13:13          0 20220124051311124.inflight
   2022-01-24 09:43:24      14872 20220124051312057.commit
   2022-01-24 05:13:13          0 20220124051312057.commit.requested
   2022-01-24 05:13:14          0 20220124051312057.inflight
   2022-01-24 09:33:06      14807 20220124051313047.commit
   2022-01-24 05:13:14          0 20220124051313047.commit.requested
   2022-01-24 05:13:15          0 20220124051313047.inflight
   2022-01-24 09:28:05      14803 20220124051314035.commit
   2022-01-24 05:13:15          0 20220124051314035.commit.requested
   2022-01-24 05:13:16          0 20220124051314035.inflight
   2022-01-24 09:43:40      14850 20220124051315081.commit
   2022-01-24 05:13:16          0 20220124051315081.commit.requested
   2022-01-24 05:13:16          0 20220124051315081.inflight
   2022-01-24 09:43:35      14870 20220124051316033.commit
   2022-01-24 05:13:17          0 20220124051316033.commit.requested
   2022-01-24 05:13:17          0 20220124051316033.inflight
   2022-01-24 09:43:06      14838 20220124051317035.commit
   2022-01-24 05:13:18          0 20220124051317035.commit.requested
   2022-01-24 05:13:18          0 20220124051317035.inflight
   2022-01-24 09:43:18      14834 20220124051318026.commit
   2022-01-24 05:13:19          0 20220124051318026.commit.requested
   2022-01-24 05:13:19          0 20220124051318026.inflight
   2022-01-24 09:46:48      14874 20220124051319038.commit
   2022-01-24 05:13:20          0 20220124051319038.commit.requested
   2022-01-24 05:13:20          0 20220124051319038.inflight
   2022-01-24 09:38:07      14820 20220124051320082.commit
   2022-01-24 05:13:21          0 20220124051320082.commit.requested
   2022-01-24 05:13:21          0 20220124051320082.inflight
   2022-01-24 10:50:49      14888 20220124051321058.commit
   2022-01-24 05:13:22          0 20220124051321058.commit.requested
   2022-01-24 05:13:22          0 20220124051321058.inflight
   2022-01-24 10:31:26      14888 20220124051322102.commit
   2022-01-24 05:13:23          0 20220124051322102.commit.requested
   2022-01-24 05:13:23          0 20220124051322102.inflight
   2022-01-24 10:54:44      17633 20220124051323094.commit
   2022-01-24 05:13:24          0 20220124051323094.commit.requested
   2022-01-24 05:13:24          0 20220124051323094.inflight
   2022-01-24 09:43:29      14854 20220124051324052.commit
   2022-01-24 05:13:25          0 20220124051324052.commit.requested
   2022-01-24 05:13:25          0 20220124051324052.inflight
   2022-01-24 10:31:40      14920 20220124051325067.commit
   2022-01-24 05:13:26          0 20220124051325067.commit.requested
   2022-01-24 05:13:26          0 20220124051325067.inflight
   2022-01-24 09:46:55      14721 20220124051326055.commit
   2022-01-24 05:13:27          0 20220124051326055.commit.requested
   2022-01-24 05:13:27          0 20220124051326055.inflight
   2022-01-24 10:37:18      14920 20220124051327065.commit
   2022-01-24 05:13:28          0 20220124051327065.commit.requested
   2022-01-24 05:13:28          0 20220124051327065.inflight
   2022-01-24 04:54:00          0 archived_$folder$
   2022-01-24 04:54:02        550 hoodie.properties
   ```
   </details>
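
The two listings differ in one visible way: the first table's instants are 14-digit timestamps, while the second table's are 17-digit, which appears consistent with 0.10.0 using millisecond-granularity instant times. A rough sketch for checking a listing like the ones above is shown here. The helper names are hypothetical, the row format is assumed to match the listings pasted in this comment, and the "completed" rule is an assumption based only on the file names visible there.

```python
import re
from collections import defaultdict

# One listing row: "<date> <time> <size> <instant>.<suffix>", where <instant> is
# a 14-digit (seconds) or 17-digit (milliseconds) timestamp.
ROW = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+\d+\s+(\d{14}|\d{17})\.(\S+)\s*$"
)


def summarize_timeline(listing):
    """Map each instant timestamp to the set of file suffixes seen for it."""
    instants = defaultdict(set)
    for line in listing.splitlines():
        match = ROW.match(line)
        if match:
            instants[match.group(1)].add(match.group(2))
    return dict(instants)


def timestamp_widths(instants):
    """Distinct timestamp lengths; more than one suggests mixed writers."""
    return {len(ts) for ts in instants}


def pending_instants(instants):
    """Instants with no completed file.

    Assumption: a completed file has a single-word suffix other than
    'inflight' (for example 'commit' or 'rollback').
    """
    def completed(suffix):
        return "." not in suffix and suffix != "inflight"

    return sorted(ts for ts, suffixes in instants.items()
                  if not any(completed(s) for s in suffixes))


# Rows copied from both listings above, mixed here only for illustration.
sample = """\
2022-01-20 10:36:11      11228 20220120103556.commit
2022-01-20 10:35:59          0 20220120103556.commit.requested
2022-01-20 10:36:01          0 20220120103556.inflight
2022-01-24 04:54:11       6580 20220124045358431.commit
2022-01-14 06:26:21        493 hoodie.properties
"""
instants = summarize_timeline(sample)
print(timestamp_widths(instants))
print(pending_instants(instants))
```

Run against a full listing, a single timestamp width would suggest the table was only ever written by one version line, which may help answer the created-versus-upgraded question.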
   





[GitHub] [hudi] VIKASPATID removed a comment on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID removed a comment on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1073711184


   > 
   
   yeah, please launch pyspark and try.





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1073711184


   > 
   
   yeah, please launch pyspark and try.





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1018159576


   Bulk write runs without any failures with a single writer, but we want to write a bunch of files, so we need multi-writer to reduce the total write time. Is there anything we are missing for multi-writer, or any way to fix this?
   That's all we have in the stack trace.





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1033325939


   @nsivabalan I tried with "hoodie.write.markers.type" = "DIRECT". That stopped the "Failed to delete marker directory" error with Hudi 0.10.0, but the write failed again with the same error we were getting with Hudi 0.9.0.
   
   <details>
   <summary> stack trace</summary>
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o243.save.
   : java.lang.NullPointerException
           at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$8(HoodieTimelineArchiveLog.java:226)
           at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
           at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
           at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
           at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
           at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
           at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
           at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
           at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
           at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
           at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
           at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
           at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
           at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
           at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
           at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:124)
           at org.apache.hudi.client.AbstractHoodieWriteClient.archive(AbstractHoodieWriteClient.java:760)
           at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:453)
           at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:197)
           at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
           at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:633)
           at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:284)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
           at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
           at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
           at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
           at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   ```
   </details>
   





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1030880076


   @VIKASPATID : Can you switch to direct-style markers ("hoodie.write.markers.type" = "DIRECT") for now to unblock yourself while Ethan investigates in the meantime? 
   





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1020055298


   We tried Hudi 0.10.0, but we are running into an issue with bulk write and multi-writer:
   py4j.protocol.Py4JJavaError: An error occurred while calling o240.save.
   : org.apache.hudi.exception.HoodieRemoteException: Failed to delete marker directory s3://xxxxx/tmp/tmp/tmp/tables/deeptick/.hoodie/.temp/20220124051311124
   Read timed out
           at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.deleteMarkerDir(TimelineServerBasedWriteMarkers.java:91)
           at org.apache.hudi.table.marker.WriteMarkers.quietDeleteMarkerDir(WriteMarkers.java:88)
           at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:450)
           at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:197)
           at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
           at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:633)
           at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:284)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
           at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
           at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
           at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
           at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
           at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
           at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
           at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
           at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.net.SocketTimeoutException: Read timed out
           at java.net.SocketInputStream.socketRead0(Native Method)
           at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
           at java.net.SocketInputStream.read(SocketInputStream.java:171)
           at java.net.SocketInputStream.read(SocketInputStream.java:141)
           at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
           at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
           at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
           at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
           at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
           at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
           at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
           at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
           at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
           at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
           at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
           at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
           at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
           at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
           at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
           at org.apache.http.client.fluent.Request.execute(Request.java:151)
           at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeRequestToTimelineServer(TimelineServerBasedWriteMarkers.java:177)
           at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.deleteMarkerDir(TimelineServerBasedWriteMarkers.java:88)
           ... 45 more
   





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1018531352


   Got it. So, without setting any multi-writer configs, are things working fine?
   Also, can you post the contents of the .hoodie folder?
   
   Let's try a few things to see if we can unblock you, and we can go from there.
   - Try switching to "insert" or "upsert".
   - Try removing all multi-writer configs and see what happens.
   
   By the way, multi-writer had a few bugs in 0.8.0, and we fixed a few of them in 0.10.0. So I would recommend giving 0.10.0 a try if you are really looking to enable multi-writer.
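
The two experiments suggested above could be sketched as a small helper that derives single-writer options from the original dict. Which keys count as "multi-writer configs" is an assumption here: the sketch drops the concurrency mode, the failed-writes cleaner policy, and every key under `hoodie.write.lock.`, all taken from the configuration posted in the issue. The helper name `single_writer_options` is hypothetical.

```python
# Keys from the reported configuration that are assumed to be multi-writer related.
MULTI_WRITER_KEYS = {
    'hoodie.write.concurrency.mode',
    'hoodie.cleaner.policy.failed.writes',
}
LOCK_PREFIX = 'hoodie.write.lock.'


def single_writer_options(options, operation='insert'):
    """Drop multi-writer settings and switch the write operation."""
    cleaned = {
        key: value
        for key, value in options.items()
        if key not in MULTI_WRITER_KEYS and not key.startswith(LOCK_PREFIX)
    }
    cleaned['hoodie.datasource.write.operation'] = operation
    return cleaned


# Abbreviated copy of the options from the issue; 'deeptick' is a placeholder.
reported = {
    'hoodie.table.name': 'deeptick',
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.cleaner.policy.failed.writes': 'LAZY',
    'hoodie.write.lock.zookeeper.lock_key': 'deeptick',
    'hoodie.write.lock.client.num_retries': 50,
    'hoodie.datasource.write.operation': 'bulk_insert',
}

trial = single_writer_options(reported, operation='upsert')
```

Running the job once with `trial` and once with the original options would show whether the failure depends on the multi-writer settings or on `bulk_insert` itself.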
   
   





[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1041063573


   Hi @nsivabalan, it's a new table created with 0.10.0.





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1047399253


   @VIKASPATID: Is it possible to give us reproducible steps? That would help us triage it faster.





[GitHub] [hudi] nsivabalan commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1073112112


   Sorry, I could not find time to reproduce this. I don't have experience with PySpark. I just saved the script to rep.py and tried to execute it with spark-submit, but I am running into failures.
   ```
   :: resolution report :: resolve 255ms :: artifacts dl 5ms
   	:: modules in use:
   	org.apache.hudi#hudi-spark-bundle_2.11;0.10.1 from local-m2-cache in [default]
   	org.apache.spark#spark-avro_2.11;2.4.4 from local-m2-cache in [default]
   	org.spark-project.spark#unused;1.0.0 from local-m2-cache in [default]
   	---------------------------------------------------------------------
   	|                  |            modules            ||   artifacts   |
   	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   	---------------------------------------------------------------------
   	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
   	---------------------------------------------------------------------
   :: retrieving :: org.apache.spark#spark-submit-parent-a3077db6-05fc-4539-ab07-7fcdba4e85ba
   	confs: [default]
   	0 artifacts copied, 3 already retrieved (0kB/5ms)
   22/03/19 14:05:28 WARN Utils: Your hostname, Sivabalans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.1.5 instead (on interface en0)
   22/03/19 14:05:28 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
   22/03/19 14:05:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     File "/tmp/rep.py", line 26
       'hoodie.write.lock.zookeeper.lock_key': f"{table_name}",
                                                             ^
   SyntaxError: invalid syntax
   log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
   log4j:WARN Please initialize the log4j system properly.
   log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   ```
   
   Or should I launch pyspark and run through the commands you have given above?
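
The `SyntaxError` points at the f-string on line 26 of rep.py. F-strings require Python 3.6 or later, so one plausible explanation (an assumption, not confirmed in this thread) is that spark-submit picked up an older interpreter. A minimal sketch of the same option written without f-strings, using a placeholder table name:

```python
import sys

table_name = 'deeptick'  # placeholder; the script defines its own table_name

# str.format gives the same result as f"{table_name}" and also parses on
# interpreters older than Python 3.6.
lock_options = {
    'hoodie.write.lock.zookeeper.lock_key': '{}'.format(table_name),
}

# Printing the interpreter version helps confirm which Python the job runs on.
print('Python %d.%d' % (sys.version_info[0], sys.version_info[1]))
```

Alternatively, pointing the `PYSPARK_PYTHON` environment variable at a Python 3.6+ binary before calling spark-submit should let the original script parse unchanged.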
   
   
   

